This version of the Yukatan data model addresses the first of these concerns, while the rest are addressed in later versions.
The RFC 2045 (MIME Part One: Format of Internet Message Bodies), defines entities as follows (section "2.4. Entity"):
The term "entity", refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity.
An entity is a combination of header fields and content. The definition resembles the RFC 2822 definition of email messages, that has been used as the basis of the message data model since version 0.2 of the Yukatan data model. In fact almost all of the Yukatan database structures can easily be adapted to refer to entities instead of messages. A pre-MIME email message can easily be represented as a single-entity MIME message without losing any information.
The difficulty with MIME entities comes from the fact that entities can be nested in a tree structure. A top-level email message consists of one or more entities that may even include other email messages as nested entities.
Tree structures have traditionally been difficult to express efficiently in relational databases. A better way to model the entity tree is to express messages as flattened lists of entities and use an additional field for the tree structure. This solution can be implemented as follows:
CREATE TABLE message ( msgno INTEGER PRIMARY KEY ); CREATE TABLE entity ( msgno INTEGER NOT NULL REFERENCES message(msgno) ON UPDATE CASCADE ON DELETE CASCADE, entno INTEGER NOT NULL CHECK (entno >= 0), PRIMARY KEY (msgno, entno), entparentno INTEGER, CHECK (entparentno IS NULL OR entno > entparentno) FOREIGN KEY (msgno, entparentno) REFERENCES entity(msgno, entno) ON UPDATE CASCADE ON DELETE CASCADE );
Top-level messages are identified by a single msgno number. The entities of a message are stored in the entity relation with a running entity number entno, that starts from zero and is incremented for each entity. A single entity is uniquely identified by the combination of the msgno and entno numbers.
The root entity of a message can be retrieved with:
SELECT * FROM entity WHERE msgno=? AND entno=0
All the entities of a message can be retrieved with:
SELECT * FROM entity WHERE msgno=? ORDER BY entno
The entparentno attribute is used to express the entity tree structure. The CHECK and FOREIGN KEY constraints guarantee that the entparentno attribute of the root entity (entno=0) is NULL, and that all other entities of a message are linked to parent entities with smaller entity numbers. By induction this guarantees that the entities of a message are organized in a clean tree structure, whose root is the entity with entno=0.
Each entity has its own set of header fields and the top-level header fields of a message are associated with the root entity. Therefore we can (and should) link each header field to an entity record instead of the top-level message.
This change is quite straightforward, we just need add the entity number into each one of the many header field relations included in the Yukatan data model. The change affects all relations with the following structure:
CREATE TABLE ___field ( msgno INTEGER NOT NULL REFERENCES message ON UPDATE CASCADE ON DELETE CASCADE, ___no INTEGER NOT NULL CHECK (___no >= 0), ... PRIMARY KEY (msgno, ___no) );
The updated structure is:
CREATE TABLE ___field ( msgno INTEGER NOT NULL, entno INTEGER NOT NULL, FOREIGN KEY (msgno, entno) REFERENCES entity(msgno, entno) ON UPDATE CASCADE ON DELETE CASCADE, ___no INTEGER NOT NULL CHECK (___no >= 0), ... PRIMARY KEY (msgno, entno, ___no) );
Also the possible text body of the entity and the parsed date, sender, message ID, and subject header values should be linked to entities instead of the top-level messages. To achieve this, the attributes msgdate, msgsendername, msgsenderaddress, msgmessageid, msgsubject, and msgbody are moved to the entity relation and the "msg" prefix is changed to "ent".
The only "msg" attributes remaining in the message relation are the message serial number msgno and the message source msgsource. The msgtype and msgtimestamp attributes are renamed to envtype and envtimestamp to reflect the fact that they are part of the envelope data and not an integral part of the actual message.
The full SQL schema of the Yukatan data model 0.7 is included as the attached SQL schema file.
The schema files are from now onlicensed under the GPL license. You are free to use and extend the schema as long as you make your changes freely available under the GPL.
The license places no restrictions on external programs that read or write the data in a Yukatan database. Such programs are "independed and separate works" as defined in section 2 of the GPL.
The msgno attribute of the message relation is no longer defined using the SERIAL pseudo-type. Instead a msgno_sequence sequence is explicitly defined to clarify the source of the default message serial numbers.