Victorian Electronic Records Strategy
 



4.1 Program obsolescence

4.1.1 The challenge

The key challenge to long-term preservation is preserving the ability to render the information contained in a digital file.

This challenge arises because the information in a digital file is simply a sequence of binary data: 'ones' and 'zeros'. Unlike writing, the sequence has no inherent meaning: a sequence could be a report, an image, a musical work, a database, a computer program or anything else. To render the information it is necessary for a program (or application) to interpret the binary data.
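As a minimal illustration (the byte values and the three interpretations are chosen for this sketch and are not taken from VERS), the same four bytes can be read as text, as a number, or as pixel intensities. Nothing in the data itself says which reading is intended:

```python
import struct

# Four arbitrary bytes; nothing in the data says what they mean.
data = bytes([0x56, 0x45, 0x52, 0x53])


def as_text(raw: bytes) -> str:
    """Interpret the bytes as ASCII characters."""
    return raw.decode("ascii")


def as_integer(raw: bytes) -> int:
    """Interpret the bytes as one 32-bit big-endian unsigned integer."""
    return struct.unpack(">I", raw)[0]


def as_grey_pixels(raw: bytes) -> list:
    """Interpret each byte as the brightness (0-255) of one greyscale pixel."""
    return list(raw)


print(as_text(data))         # the bytes read as a word
print(as_integer(data))      # the same bytes read as a single number
print(as_grey_pixels(data))  # the same bytes read as four pixel values
```

Each function plays the role of a different program: the data is identical, and only the interpretation changes.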

Most computer users are already familiar with this problem, as they would have had the experience of being emailed an attachment that they cannot open because they do not have the necessary program installed on their computer. In order to display the attachment it is necessary to:

  • work out what the format is
  • identify an appropriate program
  • either install the program on the computer or get the sender to resend the attachment in a form the receiver can render.
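The steps above can be sketched in code. The sketch assumes that a format can be recognised from its leading bytes (its 'magic number'), which holds for the formats listed but not for every format. The `RENDERERS` table and the program names in it are hypothetical placeholders for whatever is installed on a particular computer:

```python
from typing import Optional

# Leading-byte signatures ("magic numbers") for a few well-known formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]

# Hypothetical table of programs installed locally, keyed by format name.
RENDERERS = {"PDF": "pdf-viewer", "GIF": "image-viewer", "JPEG": "image-viewer"}


def identify_format(header: bytes) -> Optional[str]:
    """Step 1: work out what the format is from the first bytes of the file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def plan_rendering(header: bytes) -> str:
    """Steps 2 and 3: find a program, or fall back to install/resend."""
    fmt = identify_format(header)
    if fmt is None:
        return "unknown format: ask the sender to resend in another form"
    program = RENDERERS.get(fmt)
    if program is None:
        return fmt + ": no program installed; install one or ask for a resend"
    return fmt + ": open with " + program


print(plan_rendering(b"%PDF-1.4\n..."))
print(plan_rendering(b"MM\x00*\x00\x00\x00\x08"))
print(plan_rendering(b"\x00\x01\x02\x03"))
```

For an archival record the final fallback, asking the sender to resend, is usually unavailable, which is why the first two steps carry so much weight.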

These basic steps are also necessary to render an electronic record. Any solution to the preservation of electronic records must allow the identification of the format and the acquisition of an appropriate program to display it. A major difference with electronic records is that, over time, neither the creator of the record nor the software that was used to create it is likely to remain available. Consequently, it is not always possible to resave an aged archival record in another format.

The unfortunate problem with obtaining a program to display a particular format is that programs are inherently fragile. They depend for their correct operation on a complicated computer infrastructure. This infrastructure includes the hardware of the computer, the operating system, supporting tools such as compilers or interpreters, software libraries, and even the organisation of computer files in the file system. If any of this infrastructure is changed, a program may cease to function, which, in turn, will make the records rendered by that program inaccessible. Change in infrastructure is inevitable as computer technology develops.

There is also a commercial aspect to the ability to render the information contained in a digital file. It is normally necessary to purchase the programs used to render the information contained in a digital file, and, as the infrastructure changes, to purchase upgrades or new versions of the programs. This is a continual cost to providing access to the digital information. Further, upgrades can only be obtained whilst the vendor continues to support the program. Over 100 years or more, it is reasonable to assume that support for most current programs will cease, either by the vendor going out of business or due to a commercial decision by the vendor to cease supporting the product.

One final issue with program obsolescence is the accuracy of rendering the information. Programs interpret the digital data. If the interpretation changes or is incorrect, the rendering of the information will change. Again, most users will have received an email attachment that is displayed incorrectly because they are using a different program from the one the sender used to create the attachment. Alternatively, the sender and receiver may be running different versions of the same program, and the two versions do not render the attachment in the same way.
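A small, familiar case of inaccurate rendering is text decoded with the wrong character encoding. The sketch below is illustrative only: the sample word and the choice of encodings are assumptions made for the example. The stored bytes never change, yet the two 'programs' display different text:

```python
# The sender's program stores the word using the UTF-8 encoding.
original = "café"
stored_bytes = original.encode("utf-8")

# A program that interprets the bytes as UTF-8 renders the word correctly.
rendered_correctly = stored_bytes.decode("utf-8")

# A program that assumes Latin-1 renders every byte as its own character,
# so the two bytes of the accented letter appear as two wrong characters.
rendered_wrongly = stored_bytes.decode("latin-1")

print(rendered_correctly)
print(rendered_wrongly)
print(rendered_correctly == rendered_wrongly)
```

No error is reported in the second case. The program renders something, just not what the creator saw, which is why accuracy has to be checked and cannot be assumed.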

4.1.2 VERS approach

The approach taken in VERS to solve the problem of application obsolescence is the conversion of the record content to a long-term preservation format. The long-term preservation format is chosen to minimise (ideally to avoid) the problem of application obsolescence. The value - or otherwise - of this approach depends on the selection of an appropriate preservation format. The long-term preservation formats accepted by PROV are listed in PROS 99/007 Specification 4: VERS Long-term Preservation Formats.

Conversion to a well chosen long-term preservation format reduces or avoids the problem of application obsolescence by allowing the rendering program to be re-implemented from scratch, if necessary, or allowing the record to be subsequently converted to a replacement format. We refer to this approach as a 'data centric' approach; the focus is on the format of the data. It is to be contrasted with an 'application centric' approach, where the focus is on preserving the applications (programs) that access the data.

Ideally, the long-term preservation format allows a rendering program to be re-implemented from scratch in the future. To allow this it is necessary for the data format to be accurately specified.

In choosing appropriate formats, VERS uses the following criteria:

  • Simple format. The ideal preservation formats are those that are sufficiently simple that it is possible to include a complete specification of the format with each record. Such a description would have to be short: no more than a hundred words or so. Very few formats are this simple, but an example might be a scientific data file which consists of a table of integers.
  • Published formats. The more common situation occurs where a data format is defined by one or more published specifications. There are many such formats. The simplest example is a plain text file where the format is defined by a specification such as Unicode (ISO 10646), which defines the characters, the character numbers, and the encoding of the character numbers in the data file. More complex examples of published specifications include the standard image formats such as GIF, TIFF and JPEG. Some published specifications are very complex, including page description formats such as Adobe's PDF.
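To make the 'simple format' criterion concrete, the sketch below shows a hypothetical record that carries its own short format description, together with a parser written from that description alone. The `FORMAT:` and `DATA` markers, the comma separator, and the sample values are all inventions for this example and are not part of any VERS specification:

```python
# Hypothetical record: a short plain-language format description
# followed by the table of integers that it describes.
RECORD = """\
FORMAT: Lines after the line 'DATA' form a table. Each line is one row.
FORMAT: Values in a row are base-10 integers separated by single commas.
DATA
1998,12,340
1999,15,412
2000,9,388
"""


def parse_table(record: str) -> list:
    """Parser re-implemented 'from scratch' using only the FORMAT lines."""
    lines = record.splitlines()
    first_row = lines.index("DATA") + 1
    return [
        [int(value) for value in line.split(",")]
        for line in lines[first_row:]
        if line  # ignore any trailing empty line
    ]


for row in parse_table(RECORD):
    print(row)
```

The whole specification fits in two sentences, so a future reader with no access to the original software could still write an equivalent parser. Published formats such as PDF allow the same re-implementation in principle, but from a far longer specification held outside the record.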

Some of these formats are formal de jure Standards published by standards bodies (e.g. JPEG, ISO 10646), while others are de facto standards (e.g. GIF, TIFF, and PDF), which may be proprietary formats. The important feature of all of these formats is that the specification is published, is available, and will continue to be available for the indefinite future. A conservative archive should, of course, obtain reference copies of the specifications for the data formats it accepts.

Formal de jure standards are nevertheless preferred as long-term preservation formats, because it is more likely that vendors will implement them accurately. The problem with proprietary formats, particularly those where only one or two implementations exist, is that the vendor that owns the format may 'cheat' and either not implement the format accurately, or add undocumented features.

There are often several suitable published formats which may be chosen as a long-term preservation format. In this case, consideration should be given to which characteristics of the record it is important to preserve over a long period of time. For example, PROV has judged that a key characteristic of a record that must be preserved is the appearance of the record as the original creator saw it. This led to the selection of PDF as a long-term preservation format over an XML format, as PDF can ensure a far more accurate representation of that appearance.

Where there is no suitable published format, VERS recommends choosing a widely used industry standard format. Perhaps the best example of such a format is Microsoft Word in the word processing arena.

When adopting an industry standard format, a different strategy for long-term preservation must be used. The strategy is to ultimately convert the records from the industry standard format. The organisation holding the records must monitor the availability of software that can render or convert the format, and when the format is becoming obsolete undertake the conversion.

The advantage of adopting an industry standard format is that an archive can harness economics to its benefit. A very widely used industry standard format is unlikely to become obsolete rapidly. Any new program that competes with the industry leader has to be able to read or convert the data formats used by the industry leader; otherwise the new competitor will be unable to enter the market. As a result, there are likely to be several options for conversion, allowing an archive to minimise cost and maximise the accuracy of the conversion.

Published formats are preferred to unpublished industry standard formats, for two reasons:

  • There may be only a short window of opportunity for conversion before an obsolete format becomes unreadable. An archive must monitor the obsolescence of the formats and fund conversion before this window closes.
  • The conversion is dependent on externally sourced products and may not be sufficiently accurate for archival purposes.

Converting a record to a long-term preservation format is comparable to digitising or microfilming paper records, and an agency or archive must ensure the accuracy of the conversion. For example, there are many methods of converting to PDF, but some of them can produce inaccurate representations of the record. The mechanisms used in microfilming or digitising (e.g. statistical sampling of the conversion process) can be used to ensure the accuracy of digital conversion.
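Statistical sampling of a conversion batch might be sketched as follows. This is an illustration under stated assumptions and not a PROV procedure: the record identifiers, the sample size of 50, and the `is_accurate` check (standing in for whatever comparison an agency uses, such as a person comparing the converted record with the original) are all hypothetical:

```python
import random


def sample_for_review(record_ids, sample_size, seed=None):
    """Pick a simple random sample (without replacement) of converted records."""
    ids = list(record_ids)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(ids, min(sample_size, len(ids)))


def observed_error_rate(sampled_ids, is_accurate):
    """Fraction of sampled records whose conversion failed the accuracy check."""
    sampled = list(sampled_ids)
    if not sampled:
        return 0.0
    failures = sum(1 for record_id in sampled if not is_accurate(record_id))
    return failures / len(sampled)


# Hypothetical batch of 1000 converted records, with 50 drawn for checking.
batch = ["rec-%04d" % number for number in range(1000)]
to_check = sample_for_review(batch, 50, seed=1)


# Placeholder check for the example: pretend records ending in '7' were
# converted inaccurately. A real check would compare each converted record
# with its original.
def placeholder_check(record_id):
    return not record_id.endswith("7")


print(len(to_check))
print(observed_error_rate(to_check, placeholder_check))
```

The observed error rate in the sample is only an estimate of the rate in the whole batch. The sample size needed for a given level of confidence is a separate statistical question that this sketch does not address.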

The timing of the conversion has a bearing on the accuracy of conversion. Where the record is converted some time after it is created, the conversion accuracy may be limited. This may occur, for example, if the conversion program is upgraded and the upgrade changes the results of the conversion. In selecting a conversion process, it is worth considering whether the process is used for day-to-day business activities, as this vastly improves the conversion accuracy. For example, PROV has found that the most accurate conversion tool for PDF is Adobe's Distiller. A major reason for this is that the tool is based on the standard printing functions in the application producing the PDF. Since printing is a business-critical function, the distillation has a very sound basis for conversion.
