2.2 Structured textual encoding

Long-term preservation of information can be viewed as a transmission protocol. The sending computer is the system that creates and (perhaps) initially stores the information. The receiving computer is the future system, as yet unbuilt, that will display the record. A transmission protocol cannot work unless the sending and receiving hosts agree precisely on the encoding of the information that passes between them. This can be difficult enough to achieve when the two systems can be tested against each other, but when electronic information is archived the 'receiving' system has not yet been constructed at the time the record is 'transmitted' by the 'sending' system. Thus a well-designed long-term record format has three highly desirable characteristics: it is simple, self-describing, and self-documenting.
The requirement for a simple, self-describing, and self-documenting encoding suggests a textual encoding. However, there are two problems with a purely textual encoding of a record.

The first problem is efficiency. For example, binary encoding of a 24-bit RGB image requires 3 octets for each pixel. A simple textual encoding would require a minimum of 6 octets (e.g. "0,0,0;") and a maximum of 12 octets (e.g. "255,255,255;") per pixel, that is, between 200% and 400% of the size of the binary RGB data (a space overhead of 100% to 300%). In addition to the space overhead, both parsing and generating the textual encoding are normally more expensive than parsing and generating the equivalent binary encoding. It is often preferable to use binary encoding simply for efficiency.

The second problem is complexity. Many types of data are inherently complex. Describing a printed page, for example, requires describing the position of every character on the page together with characteristics of each character such as weight, orientation, and skew. It would be possible to develop a textual encoding to describe a page, but doing so requires specialist knowledge to ensure that the encoding is suitable. It is far preferable to use existing standards for complex data, even if they use complex binary encodings.

It is possible to include binary encodings within an archived object. The key is to:
In summary, a good design for a long-term electronic record format will be based on a simple textual encoding that 'marks up' the data to indicate its extent, syntactic meaning, semantic meaning, and relationship to other data in the record. The use of binary encodings for specific elements in the record is acceptable when this allows the use of specialist standards, provided the use of these standards is well documented within the record.
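The design summarised above can be illustrated with a minimal Python sketch. It assumes XML as the textual markup and base64 as the way of carrying binary octets inside text. Neither choice, nor the element and attribute names (`Record`, `Title`, `Image`, `format`, `transfer-encoding`, `pixel-count`), is taken from the VERS standard; they are hypothetical and used for illustration only. The sketch also reproduces the RGB size comparison from the efficiency discussion.

```python
import base64
import xml.etree.ElementTree as ET


def text_encode_pixels(pixels):
    """Encode (r, g, b) tuples as plain text, e.g. "0,0,0;255,255,255;"."""
    return "".join(f"{r},{g},{b};" for r, g, b in pixels)


def build_record(title, pixels):
    """Wrap binary pixel data in a textual, marked-up record.

    All element and attribute names are hypothetical, not VERS names.
    """
    record = ET.Element("Record")
    ET.SubElement(record, "Title").text = title
    image = ET.SubElement(
        record,
        "Image",
        {
            # Document the embedded binary encoding inside the record itself.
            "format": "raw 24-bit RGB, 3 octets per pixel, row-major",
            "transfer-encoding": "base64",
            "pixel-count": str(len(pixels)),
        },
    )
    raw = bytes(channel for pixel in pixels for channel in pixel)
    image.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(record, encoding="unicode")


def read_pixels(record_xml):
    """Recover the pixel tuples by decoding the documented binary element."""
    image = ET.fromstring(record_xml).find("Image")
    raw = base64.b64decode(image.text)
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]


pixels = [(0, 0, 0), (255, 255, 255)]
binary_size = 3 * len(pixels)                # 3 octets per pixel
text_size = len(text_encode_pixels(pixels))  # 6 to 12 octets per pixel
record_xml = build_record("Sample image", pixels)
```

In this sketch the markup delimits each piece of data and names it, while the attributes on the hypothetical `Image` element record which binary encoding was used, so that a future reader has what it needs to decode the content.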