After solving the main problem of representing MIME entity trees in version 0.7 of the Yukatan data model, we can now start adding more details to our bare-bones MIME model. This version adds support for MIME content types, content identifiers, and content descriptions.
One of the most important parts of the MIME standard is the concept of media types. Media types and the Content-Type header were first introduced in RFC 1049 and later generalized in the MIME specification RFC 2045, section 5, which says:
The purpose of the Content-Type field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner. The value in this field is called a media type.
Media types consist of a top-level type identifier, a subtype identifier, and an optional set of named parameter values. The top-level type identifier ("text", "image", "multipart", etc.) determines the general type of the entity content, and the subtype identifier ("plain", "jpeg", "digest", etc.) specifies the actual type of the entity content. The parameters are mostly related to low-level issues like character sets and multipart boundaries that are handled before the message is stored in the Yukatan database. For now we are only interested in the top-level and subtype identifiers.
The type identifiers are stored as two attributes of the entity relation:
    CREATE TABLE entity (
        ...
        enttypemajor CHARACTER VARYING DEFAULT 'text' NOT NULL
            CHECK (LOWER(enttypemajor) = enttypemajor),
        enttypeminor CHARACTER VARYING DEFAULT 'plain' NOT NULL
            CHECK (LOWER(enttypeminor) = enttypeminor),
        ...
    );
The type identifiers are constrained to be NOT NULL because every entity body always has a content type. Additionally, since the type identifiers are case-insensitive, they must always be normalized to lower case to make them easier to handle. The default type "text/plain" should be used if the Content-Type header is not present. Note also that the values of the type attributes are not constrained to a predefined selection of type identifiers; it is the task of the database clients to assign meaning to the type identifiers stored in the database.
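The conventional combined "type/subtype" notation is not used for storage because the top-level type identifier is useful as a separate value, for example when a client needs to select all image entities regardless of their subtype.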
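A client that imports messages has to produce the two normalized identifiers. The following is a minimal sketch in Python, assuming the importer works on messages parsed with the standard library email package; the function name entity_type is hypothetical and not part of the Yukatan model. The package's get_content_type() already lower-cases the value, drops the parameters, and falls back to a default type when the header is missing or malformed.

```python
from email import message_from_string
from email.message import Message


def entity_type(entity: Message) -> tuple[str, str]:
    """Return a hypothetical (enttypemajor, enttypeminor) pair for one entity.

    get_content_maintype() and get_content_subtype() are derived from
    get_content_type(), which lower-cases the value (matching the
    CHECK (LOWER(x) = x) constraints), ignores parameters such as charset
    or boundary, and falls back to the default type ("text/plain", or
    "message/rfc822" inside multipart/digest) if the header is missing
    or malformed.
    """
    return entity.get_content_maintype(), entity.get_content_subtype()


if __name__ == "__main__":
    part = message_from_string('Content-Type: Image/JPEG; name="shuttle.jpg"\n\n...')
    print(entity_type(part))  # ('image', 'jpeg')
    print(entity_type(message_from_string("Subject: hi\n\nhello")))  # ('text', 'plain')
```

Because the major type comes back as its own value, a client can test `major == "image"` directly instead of parsing a combined "type/subtype" string.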
So far the Yukatan data model has only been able to store textual entity bodies. Now that we have added support for storing the media type, we should also make it possible to store the data of binary media types. To achieve this we add a new entity attribute entdata for storing the binary contents of non-text entities. The previous entbody attribute is also renamed to enttext to better reflect that it holds textual content.
    CREATE TABLE entity (
        ...
        enttext TEXT,
        entdata BYTEA
    );
The contents and semantics of the attributes are determined by which attributes are NULL:
enttext IS NOT NULL AND entdata IS NULL
enttext IS NULL AND entdata IS NOT NULL
enttext IS NOT NULL AND entdata IS NOT NULL
enttext IS NULL AND entdata IS NULL
While the different combinations have fairly standard relationships with the various media types, we still won't set explicit table constraints to govern these relationships. The set of media types is open-ended, and future standards might define new media types that would contradict such constraints, so it is better to leave the interpretation of the data to the client programs.
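As an illustration of that client-side interpretation, here is a minimal Python sketch of how an importer might decide which of the two attributes to fill. It assumes that "text" entities are stored as decoded Unicode in enttext, that other leaf entities keep their raw bytes in entdata, and that multipart containers leave both attributes NULL; these are assumptions of this sketch, not rules of the data model, and body_columns is a hypothetical name.

```python
from email import message_from_bytes
from email.message import Message
from typing import Optional, Tuple


def body_columns(entity: Message) -> Tuple[Optional[str], Optional[bytes]]:
    """Return hypothetical (enttext, entdata) values for one MIME entity."""
    if entity.is_multipart():
        # Assumption: a container has no body of its own; its content is
        # stored in the child entities of the entity tree.
        return None, None
    # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable).
    data = entity.get_payload(decode=True)
    if entity.get_content_maintype() == "text":
        charset = entity.get_content_charset("us-ascii")
        try:
            return data.decode(charset, errors="replace"), None
        except LookupError:
            # Unknown charset label: fall back to a lossless byte-to-char mapping.
            return data.decode("latin-1"), None
    return None, data


if __name__ == "__main__":
    raw = b"Content-Type: image/png\nContent-Transfer-Encoding: base64\n\niVBORw==\n"
    print(body_columns(message_from_bytes(raw)))  # (None, b'\x89PNG')
```

A different client could just as well keep both the decoded text and the original bytes of a text entity, which is exactly the kind of choice the schema deliberately leaves open.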
The MIME standard defines the content identifier as an entity-level identifier to be used like the message identifier defined in RFC 822. The content identifier and the Content-ID header field are defined in RFC 2045, section 7.
The content identifiers are stored in the Yukatan database just like the previously defined message identifiers:
    CREATE TABLE entity (
        ...
        entmessageid CHARACTER VARYING,
        entcontentid CHARACTER VARYING,
        ...
    );
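A common use of content identifiers is resolving "cid:" URLs (RFC 2392), for example an HTML body referring to an inline image. The Python sketch below assumes the stored entcontentid value is the identifier without its surrounding angle brackets; that normalization, and both function names, are hypothetical choices for illustration.

```python
from email import message_from_string
from email.message import Message
from typing import Optional
from urllib.parse import unquote


def content_id(entity: Message) -> Optional[str]:
    """Return a hypothetical entcontentid value: the Content-ID without <>."""
    value = entity.get("Content-ID")
    if value is None:
        return None
    value = str(value).strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1].strip()
    return value or None


def content_id_from_cid_url(url: str) -> str:
    """Map an RFC 2392 "cid:" URL (e.g. from <img src=...>) to a Content-ID."""
    if not url.lower().startswith("cid:"):
        raise ValueError("not a cid: URL: " + url)
    # RFC 2392: drop the "cid:" prefix and undo the %-encoding; no <> in URLs.
    return unquote(url[4:])


if __name__ == "__main__":
    part = message_from_string("Content-ID: <shuttle.1@example.com>\n\n...")
    print(content_id(part))  # shuttle.1@example.com
    print(content_id_from_cid_url("cid:shuttle.1%40example.com"))  # shuttle.1@example.com
```

With this normalization a client could resolve a reference with a simple equality lookup on entcontentid.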
As a final step to fully support RFC 2045 we will add support for the Content-Description header field defined in section 8 of the RFC:
The ability to associate some descriptive information with a given body is often desirable. For example, it may be useful to mark an "image" body as "a picture of the Space Shuttle Endeavor." Such text may be placed in the Content-Description header field. This header field is always optional.
The optional description is stored as an RFC 2047-decoded Unicode string in the entdescription attribute:
    CREATE TABLE entity (
        ...
        entdescription CHARACTER VARYING,
        ...
    );
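Header fields such as Content-Description may contain RFC 2047 encoded-words like "=?iso-8859-1?q?...?=", which have to be decoded before the value is stored. A minimal Python sketch using the standard library's decode_header() and make_header(); the function name entity_description is hypothetical.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.message import Message
from typing import Optional


def entity_description(entity: Message) -> Optional[str]:
    """Return a hypothetical entdescription value: RFC 2047-decoded text."""
    raw = entity.get("Content-Description")
    if raw is None:
        return None
    # decode_header() splits the value into (chunk, charset) pairs, decoding
    # any encoded-words; make_header() joins them into one Unicode string.
    return str(make_header(decode_header(str(raw))))


if __name__ == "__main__":
    part = message_from_string("Content-Description: =?iso-8859-1?q?Caf=E9_menu?=\n\nx")
    print(entity_description(part))  # Café menu
```

Plain ASCII descriptions pass through unchanged, so the same function can be applied to every entity.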
The full SQL schema of the Yukatan data model 0.8 is included as the attached SQL schema file.
The only changes since version 0.7 are the added attributes of the entity relation. The next version of the Yukatan data model will add detailed information related to the handling of file attachments.