Yukatan data model 0.6 Yukatan data model Yukatan data model 0.8

Yukatan data model 0.7

The version 0.6 of the Yukatan data model took the first step towards MIME conformance by specifying the character set conventions used by the Yukatan database. The next steps on this road are:

This version of the Yukatan data model addresses the first of these concerns, while the rest are addressed in later versions.

MIME entities

The RFC 2045 (MIME Part One: Format of Internet Message Bodies), defines entities as follows (section "2.4. Entity"):

The term "entity", refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity.

An entity is a combination of header fields and content. The definition resembles the RFC 2822 definition of email messages, that has been used as the basis of the message data model since version 0.2 of the Yukatan data model. In fact almost all of the Yukatan database structures can easily be adapted to refer to entities instead of messages. A pre-MIME email message can easily be represented as a single-entity MIME message without losing any information.

Entity trees

The difficulty with MIME entities comes from the fact that entities can be nested in a tree structure. A top-level email message consists of one or more entities that may even include other email messages as nested entities.

Tree structures have traditionally been difficult to express efficiently in relational databases. A better way to model the entity tree is to express messages as flattened lists of entities and use an additional field for the tree structure. This solution can be implemented as follows:

CREATE TABLE message (
        msgno           INTEGER PRIMARY KEY
);
CREATE TABLE entity (
        msgno           INTEGER NOT NULL
                        REFERENCES message(msgno)
                        ON UPDATE CASCADE ON DELETE CASCADE,

        entno           INTEGER NOT NULL
                        CHECK (entno >= 0),
        PRIMARY KEY (msgno, entno),

        entparentno     INTEGER,
        CHECK (entparentno IS NULL OR entno > entparentno)
        FOREIGN KEY (msgno, entparentno)
                        REFERENCES entity(msgno, entno)
                        ON UPDATE CASCADE ON DELETE CASCADE
);

Top-level messages are identified by a single msgno number. The entities of a message are stored in the entity relation with a running entity number entno, that starts from zero and is incremented for each entity. A single entity is uniquely identified by the combination of the msgno and entno numbers.

The root entity of a message can be retrieved with:

SELECT * FROM entity WHERE msgno=? AND entno=0

All the entities of a message can be retrieved with:

SELECT * FROM entity WHERE msgno=? ORDER BY entno

The entparentno attribute is used to express the entity tree structure. The CHECK and FOREIGN KEY constraints guarantee that the entparentno attribute of the root entity (entno=0) is NULL, and that all other entities of a message are linked to parent entities with smaller entity numbers. By induction this guarantees that the entities of a message are organized in a clean tree structure, whose root is the entity with entno=0.

Entity header fields and contents

Each entity has its own set of header fields and the top-level header fields of a message are associated with the root entity. Therefore we can (and should) link each header field to an entity record instead of the top-level message.

This change is quite straightforward, we just need add the entity number into each one of the many header field relations included in the Yukatan data model. The change affects all relations with the following structure:

CREATE TABLE ___field (
        msgno           INTEGER NOT NULL
                        REFERENCES message
                        ON UPDATE CASCADE ON DELETE CASCADE,
        ___no           INTEGER NOT NULL
                        CHECK (___no >= 0),
        ...
        PRIMARY KEY (msgno, ___no)
);

The updated structure is:

CREATE TABLE ___field (
        msgno           INTEGER NOT NULL,
        entno           INTEGER NOT NULL,
        FOREIGN KEY (msgno, entno)
                        REFERENCES entity(msgno, entno)
                        ON UPDATE CASCADE ON DELETE CASCADE,
        ___no           INTEGER NOT NULL
                        CHECK (___no >= 0),
        ...
        PRIMARY KEY (msgno, entno, ___no)
);

Also the possible text body of the entity and the parsed date, sender, message ID, and subject header values should be linked to entities instead of the top-level messages. To achieve this, the attributes msgdate, msgsendername, msgsenderaddress, msgmessageid, msgsubject, and msgbody are moved to the entity relation and the "msg" prefix is changed to "ent".

The only "msg" attributes remaining in the message relation are the message serial number msgno and the message source msgsource. The msgtype and msgtimestamp attributes are renamed to envtype and envtimestamp to reflect the fact that they are part of the envelope data and not an integral part of the actual message.

SQL schema

The full SQL schema of the Yukatan data model 0.7 is included as the attached SQL schema file.

The schema files are from now onlicensed under the GPL license. You are free to use and extend the schema as long as you make your changes freely available under the GPL.

The license places no restrictions on external programs that read or write the data in a Yukatan database. Such programs are "independed and separate works" as defined in section 2 of the GPL.

Additional notes

The msgno attribute of the message relation is no longer defined using the SERIAL pseudo-type. Instead a msgno_sequence sequence is explicitly defined to clarify the source of the default message serial numbers.

SourceForge.net Logo