Victorian Electronic Records Strategy - Forever Digital

6.2 Re-implementing rendering software

There are often several possible file formats that will preserve the desired characteristics. In order to select amongst these alternative formats, PROV prefers the format that is easiest to re-implement.

The worst-case preservation scenario is one in which software to render a record must be re-implemented from scratch. To re-implement rendering software, a future developer will require:

  • the ability to identify the format. This information is contained in metadata within the VEO, specifically in the M128 File Encoding and M131 Rendering Text elements. (Further information about these elements is contained in PROS 99/007 Specification 2: VERS Metadata Scheme and its associated Advice.)
  • the ability to obtain a copy of the specification of the format. Normally, we would expect this specification to be in the form of a published document available from a library (or archive).
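Reading the format identification out of a VEO is the first thing a future developer would do. The Python sketch below assumes the VEO is read as XML and uses a greatly simplified layout: the tag names `Encoding`, `FileEncoding` and `RenderingText`, the sample values, and the function `format_identification` are all hypothetical placeholders, not the element names or content defined in PROS 99/007.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified VEO fragment. The tag names are illustrative
# placeholders standing in for M128 File Encoding and M131 Rendering Text;
# they are NOT the element names defined in PROS 99/007.
SAMPLE_VEO = """
<VEO>
  <Document>
    <Encoding>
      <FileEncoding>text/plain; charset=UTF-8</FileEncoding>
      <RenderingText>Plain text; see The Unicode Standard.</RenderingText>
    </Encoding>
  </Document>
</VEO>
"""

def format_identification(veo_xml: str) -> list[dict[str, str]]:
    """Collect the format-identification metadata for every encoding in the VEO."""
    root = ET.fromstring(veo_xml.strip())
    results = []
    for enc in root.iter("Encoding"):
        results.append({
            # findtext returns None if the child is missing, "" if it is empty.
            "file_encoding": (enc.findtext("FileEncoding") or "").strip(),
            "rendering_text": (enc.findtext("RenderingText") or "").strip(),
        })
    return results
```

The returned strings are what the developer would then use to locate the published specification, as described in the second requirement above.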

6.2.1 Simple formats

The ideal format is one that is simple enough to be described in a short piece of text that can be included in the M128 File Encoding element within the VEO. Very few formats are this simple.
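To illustrate how short such a description can be, the sketch below pairs a two-sentence description of a very simple plain-text format with a renderer written using nothing but that description. Both the wording of `FORMAT_DESCRIPTION` and the function `render_simple_text` are hypothetical examples, not text taken from PROS 99/007.

```python
# Hypothetical description of the kind that might fit in M128 File Encoding.
FORMAT_DESCRIPTION = (
    "Each byte holds one US-ASCII character code in the range 0-127. "
    "Lines are separated by the two-byte sequence CR LF (decimal 13, 10)."
)

def render_simple_text(data: bytes) -> list[str]:
    """Return the lines of the record, using only the rules in FORMAT_DESCRIPTION."""
    for offset, byte in enumerate(data):
        if byte > 127:
            raise ValueError(f"byte {byte} at offset {offset} is outside US-ASCII")
    # US-ASCII character codes coincide with Unicode code points 0-127.
    text = "".join(chr(byte) for byte in data)
    return text.split("\r\n")
```

For example, `render_simple_text(b"First line\r\nSecond line")` yields the two lines `"First line"` and `"Second line"`. Nothing beyond the description was needed to write the renderer, which is what makes such a format easy to re-implement.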

6.2.2 Published formats

For records that are too complex to describe in a short piece of text, the preferred format is one that has been formally specified and published. The VEO must include a reference to the published specification in either the M128 File Encoding element or the M131 Rendering Text element. An archive can build up a library of the specifications that it uses, or can rely on accessing the specifications through legal deposit libraries.

Almost all records are complex enough to require a format with an external, published specification. Consider a record that contains just text in several languages. To render the text it is necessary to convert the bit stream in the file into a sequence of character numbers. It is then necessary to map each character number to a glyph (the character image displayed on the screen). The Unicode standard [Unicode] defines thousands of characters, illustrated by representative glyphs, and has several mechanisms for converting character numbers into bit streams. A specification this complex clearly could not be summarised in the M128 File Encoding element; instead, this element will contain a reference to the Unicode standard.
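The first step above, turning a bit stream into character numbers, is the kind of code a future developer could rebuild from the published standard alone. The following is a minimal sketch assuming the text is stored in UTF-8, one of the Unicode encoding forms; the function name `utf8_to_code_points` is hypothetical, and the decoder is a strict, illustrative one written from the UTF-8 bit patterns rather than a production implementation. The second step, mapping character numbers to glyphs, depends on fonts and is not shown.

```python
def utf8_to_code_points(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode code points (character numbers)."""
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte determines the sequence length and its payload bits.
        if b0 < 0x80:
            cp, length, min_cp = b0, 1, 0
        elif 0xC0 <= b0 < 0xE0:
            cp, length, min_cp = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 < 0xF0:
            cp, length, min_cp = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 < 0xF8:
            cp, length, min_cp = b0 & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, surrogates and values beyond U+10FFFF.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

Because Python ships its own UTF-8 decoder, a re-implementation like this can be checked against `bytes.decode("utf-8")`, a small instance of validating one implementation against an independent one.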

Most electronic records are a great deal more complex than a simple text file. For example, consider a document such as this Advice. Although most of the content is text, the characters have formatting applied (e.g. colour, font, and weight). The characters are combined to form higher-level formatting units (e.g. paragraphs) which have their own formatting applied. Some paragraphs have particular characteristics (e.g. they are numbered, form indexes, or are headers). Finally, the document contains objects that are not textual, such as images. These images may have their own specifications (e.g. JPEG, TIFF, GIF). The result is that any specification rich enough to cater for all the features of current electronic documents is very complex, and will often contain references to other specifications.
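A renderer for such a compound document must recognise each embedded object and hand it to code written from that object's own specification. The sketch below shows only the recognition step, using the leading byte signatures defined in the JPEG, TIFF and GIF specifications; the list is deliberately partial, the function name `identify_embedded_image` is hypothetical, and such a check would be used alongside, not instead of, the format metadata recorded in the VEO.

```python
from typing import Optional

# Leading byte signatures from the respective format specifications:
# JPEG starts with the SOI marker (FF D8) followed by another marker (FF);
# TIFF starts with a byte-order mark ("II" or "MM") and the number 42;
# GIF starts with the header "GIF87a" or "GIF89a".
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]

def identify_embedded_image(data: bytes) -> Optional[str]:
    """Return the name of the image format whose signature matches, or None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```

Each recognised name points to a further specification that the future developer would also need to obtain, which is why the reference chain in complex formats matters.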

Where there is a choice of several suitable formats, the following criteria are used to select between them:

  • Widely used formats. Many formats are developed but are not widely adopted. Widely adopted formats are preferred for long-term preservation as it is far more likely that software will continue to be available to render the format.
  • Non-proprietary formats. Non-proprietary formats (e.g. international standards) are preferred over proprietary formats. One advantage of non-proprietary formats is that the specification is independent of a particular vendor, which makes it much harder for a vendor to add features that rely on undocumented extensions to the format.
  • Independent implementations. Formats that have several independently written implementations are preferred over formats where there is only one implementation. Independent implementations help ensure that vendors accurately implement the specification. It should be noted that significant ‘re-badging’ occurs in the computer industry and this can make it difficult to determine how many independent implementations there actually are. In a given situation it may appear that there are several independent implementations, but, in fact, all the implementations may use the same code licensed from the original developer.
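The three criteria can be applied mechanically once the facts about each candidate are recorded. The sketch below is one hypothetical way to do so: the `CandidateFormat` record, the helper functions, and the ordering (most criteria met first, ties broken by the number of independent implementations) are assumptions for illustration, since this Advice does not prescribe weights. To allow for re-badging, each implementation is recorded with its underlying code base, and only distinct code bases are counted.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateFormat:
    name: str
    widely_used: bool
    non_proprietary: bool
    # Each entry is (product name, underlying code base). Re-badged products
    # share a code base and so do not count as independent implementations.
    implementations: list[tuple[str, str]] = field(default_factory=list)

def independent_implementations(fmt: CandidateFormat) -> int:
    """Count distinct code bases rather than distinct product names."""
    return len({code_base for _, code_base in fmt.implementations})

def criteria_met(fmt: CandidateFormat) -> int:
    """Number of the three selection criteria that the format satisfies."""
    return (
        int(fmt.widely_used)
        + int(fmt.non_proprietary)
        + int(independent_implementations(fmt) >= 2)
    )

def rank_formats(candidates: list[CandidateFormat]) -> list[CandidateFormat]:
    # Assumed ordering: most criteria met first, ties broken by the number
    # of independent implementations.
    return sorted(
        candidates,
        key=lambda f: (criteria_met(f), independent_implementations(f)),
        reverse=True,
    )
```

A format sold under three product names that all license one engine counts as a single implementation here, matching the caution about re-badging above.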


Department for Victorian Communities; Public Record Office Victoria