1.0 Introduction
1.1 The principle of self sufficiency
1.2 Requirements for a Long Term Electronic Record Format

There are three basic principles that need to be adopted in using a long term format for the purposes of archiving electronic records. These are:
1.1 The principle of self sufficiency

To minimise the probability of losing a record, it is necessary to minimise the dependency of the record on systems, other data, or documentation. The ideal record is self sufficient.

The rationale behind this principle is simple: dependency increases the points of failure. If access to a record depends on a system, for example, then the loss of that system means the loss of the record.

To cite an example, a ‘large’ collection of records requires some means of allowing users to find the records they are interested in. One method is to provide an index. But what happens if this index is lost? The records may still exist, but if the index cannot be recreated their contents may be inaccessible. Self sufficiency requires that a record include a copy of its indexing information. If the external index is destroyed, it should be possible to rebuild it using the information stored in the records themselves.

No record can be completely self sufficient, because that would require storing all the supporting documentation for the long term record with the record itself. However, if a specification or standard is so widely published that a copy can reasonably be expected to be found in a public library (or the electronic equivalent) for the foreseeable future, it is sufficient to reference the specification or standard in the record. As a corollary, only widely published specifications or standards should be used within a record; otherwise, the standard or specification would need to be included in each record.

In summary, a good design for an archivable record is record centric. It minimises the dependency of the record on systems, outside data, and documentation. Very well known information can be included by reference to reduce overhead.
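The index example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` class, its field names, and the `rebuild_index` helper are hypothetical and are not defined by this Standard. The point is that each record carries its own indexing information, so a lost external index can be recreated from the records alone.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical self sufficient record: content plus its own index terms."""
    record_id: str
    content: str
    index_terms: list = field(default_factory=list)


def rebuild_index(records):
    """Recreate an external index using only information stored in the records."""
    index = {}
    for record in records:
        for term in record.index_terms:
            index.setdefault(term, []).append(record.record_id)
    return index


records = [
    Record("R1", "Minutes of the planning committee", ["minutes", "planning"]),
    Record("R2", "Planning permit application", ["planning", "permit"]),
]

# Simulate the loss of the external index, then rebuild it from the records.
external_index = None
external_index = rebuild_index(records)
print(external_index["planning"])  # ['R1', 'R2']
```

If the records held only their content, losing `external_index` would leave no way to recover which terms pointed at which record.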
1.2 Requirements for a Long Term Electronic Record Format

Long term preservation of information can be viewed as a transmission protocol. The sending computer is the system that creates and (perhaps) initially stores the information. The receiving computer is the, as yet unbuilt, system in the future that will display the record.

A transmission protocol cannot work unless the sending and receiving hosts agree exactly on the encoding of the information that passes between them. This can be difficult enough to achieve when the two systems can be tested against each other, but in the case of archiving electronic information the ‘receiving’ system has not yet been constructed when the record is ‘transmitted’ by the ‘sending’ system. Thus a well designed long term record format has three highly desirable characteristics: it is simple, self describing, and self documenting.
The requirement for a simple, self describing, and self documenting encoding suggests a textual encoding. However, there are two problems with a pure textual encoding of a record.

The first problem is efficiency. For example, binary encoding of a 24 bit RGB image requires 3 octets for each pixel. A simple textual encoding would require a minimum of 6 octets (e.g. "0,0,0;") and a maximum of 12 octets (e.g. "255,255,255;") per pixel, that is, between 200% and 400% of the size of the binary RGB data (a space overhead of 100% to 300%). In addition to the space overhead, parsing and generating a textual encoding is normally more expensive than parsing and generating the equivalent binary encoding. It is often preferable to use a binary encoding simply for efficiency.

The second problem is complexity. Many types of data are inherently complex. Describing a printed page, for example, requires describing the position of every character on the page together with characteristics of the character such as weight, orientation, and skew. It would be possible to develop a textual encoding to describe a page, but this requires specialist knowledge to ensure that the textual encoding is suitable. It is far preferable to use existing standards for complex data, even if they use a complex binary encoding.

It is possible to include binary encodings within an archived object. The key is to:
In summary, a good design for a long term electronic record format will be based on a simple textual encoding that ‘marks up’ the data to indicate its extent, syntactic meaning, semantic meaning, and relationship to other data in the record. The use of binary encodings for specific elements in the record is acceptable when this allows the use of specialist standards, provided the use of these standards is well documented within the record.
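The trade-off between textual and binary encodings can be made concrete with a short Python sketch. It compares the size of the "r,g,b;" textual form with 3 octets per pixel in binary, and then shows one possible way of carrying a binary element inside a textual record while documenting its encoding. The element names (`Element`, `Encoding`, `Data`) and the choice of Base64 to carry the octets as text are assumptions made for this illustration, not requirements stated in this section.

```python
import base64
import struct

# Three sample 24 bit RGB pixels (illustrative values only).
pixels = [(0, 0, 0), (255, 255, 255), (18, 52, 86)]


def encode_binary(pixels):
    """Pack each 24 bit RGB pixel into exactly 3 octets."""
    return b"".join(struct.pack("BBB", r, g, b) for r, g, b in pixels)


def encode_text(pixels):
    """Encode each pixel as decimal text in the form 'r,g,b;'."""
    return "".join(f"{r},{g},{b};" for r, g, b in pixels).encode("ascii")


def wrap_binary(data, standard):
    """Mark up a binary element in a textual record, naming its encoding.

    The element names and the use of Base64 are assumptions for this sketch.
    """
    payload = base64.b64encode(data).decode("ascii")
    return (
        f"<Element><Encoding>{standard}; Base64</Encoding>"
        f"<Data>{payload}</Data></Element>"
    )


binary = encode_binary(pixels)
text = encode_text(pixels)
print(len(binary), len(text))  # 9 27

# The markup states the extent of the data and which encoding it uses.
print(wrap_binary(binary, "raw 24 bit RGB"))
```

For these three pixels the textual form is three times the size of the binary form, inside the range of two to four times given by the 6 and 12 octet extremes. Base64 adds overhead of its own (roughly one third), which is the price paid here for keeping the enclosing record textual.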
In many applications, the archived information is useless, or loses value, unless it can be demonstrated who created the information, when it was created, and that it has not been subsequently altered. Where this integrity requirement is imposed, it significantly complicates the design of an archived record.

This Standard advocates the use of digital signatures to demonstrate the integrity of the record (see 2.2.4 and appendices two and three). A digital signature is a cryptographic technique that generates a signature which depends on both the entity signing the object and the contents of the object.

Up to two signatures can be used to protect the VERS Encapsulated Object from forgery. A record is normally signed separately by the creator of the record and by the system itself. These two signatures protect the record from forgery by any one party acting alone. The creator’s signature ensures that a forgery cannot be perpetrated by a system administrator or by a third party. The system’s signature ensures that the creator cannot forge the record after the event.

Each signer requires two keys: a private key, which must be kept secret and held by only one user, and a public key, which is published so as to be accessible to all users of the security system. A mathematical algorithm uses these keys to establish, to a very high degree of confidence, that the data is authentic. A key is simply a very large number; in some algorithms it is derived from large prime numbers. For a given algorithm, the greater the key length, the higher the security of the system. Common key lengths are 40 bits, 128 bits, and 1024 bits.

Digital signatures have the advantage that the record carries its own integrity check and consequently the integrity of the system that holds the records has less relevance.
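The two-signature arrangement can be illustrated with a toy Python sketch. It uses textbook RSA built on small, publicly known primes and no padding, so it is not secure and is not the signature scheme specified by this Standard; the function names and the sample record are hypothetical. It shows only the mechanics: each party signs with a private key, anyone can verify with the matching public key, and a change to the record invalidates both signatures.

```python
import hashlib


def make_keypair(p, q, e=65537):
    """Build a toy RSA key pair from two primes (illustrative, not secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent; needs Python 3.8 or later
    return (n, e), (n, d)  # (public key, private key)


def digest(data, n):
    """Hash the record and reduce the hash so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, private_key):
    """Sign the record's digest with the private key."""
    n, d = private_key
    return pow(digest(data, n), d, n)


def verify(data, signature, public_key):
    """Check a signature using only the public key."""
    n, e = public_key
    return pow(signature, e, n) == digest(data, n)


# Well known Mersenne primes stand in for secret primes in this sketch.
creator_pub, creator_priv = make_keypair(2**61 - 1, 2**89 - 1)
system_pub, system_priv = make_keypair(2**107 - 1, 2**127 - 1)

record = b"Minutes of the planning committee, approved."
creator_sig = sign(record, creator_priv)  # signed by the record's creator
system_sig = sign(record, system_priv)    # signed by the recordkeeping system

print(verify(record, creator_sig, creator_pub))  # True
print(verify(record, system_sig, system_pub))    # True

# Any later alteration of the record invalidates both signatures.
tampered = record + b" (amended)"
print(verify(tampered, creator_sig, creator_pub))  # False
print(verify(tampered, system_sig, system_pub))    # False
```

Because forging the record would require both private keys, neither the creator nor the system administrator can alter it alone without detection. A production system would use a vetted cryptographic library and a standard padded signature scheme rather than this sketch.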