4.2 Encoding Metadata

An Encoding is a physical representation of a Document; it is equivalent to a file on a computer. A Document may have several Encodings; for example, a report may be represented as a Word file, a PDF file, and an RTF file. Many types of Document can be represented in many ways. For example, a colour picture could be saved in (at least) the following formats: Photoshop PSD, Amiga IFF, BMP, Photoshop EPS, FlashPix, JPEG, PCX, Photoshop PDF, PICT, Pixar, PNG, Raw, Scitex, Targa, and TIFF (not all of these would be suitable long-term preservation formats).

Figure 22. Subelements of the Encoding Metadata element.

The subelements of Encoding are:

  • File Encoding (M128). This subelement contains a textual description of the process of generating the data in the Encoding. Typically, this would be a description of the file format used to represent the document and any subsequent encoding of that format (e.g. conversion to Base64). This subelement complements the Document Source (M125) subelement. The contents of Document Source (M125) describe how the document was created up to the point where it was saved. File Encoding (M128) continues this description by describing the format in which the document was saved.

    Standard values for this subelement for the specified long-term preservation formats are given in PROS 99/007 Specification 4: VERS Long Term Preservation Formats.

  • Source File Identifier (M129). This subelement contains the name of the computer file that contained the document data when the record was created. The last element of this name (i.e. the filename without any file extension or folder path which might be present) can be used to name the contents of the document data when it is extracted.
  • File Rendering (M130). This subelement is used to describe how to render (display) the encoding. It has two subelements: Rendering Text (M131) is intended to be used by people, and Rendering Keywords (M132) by software.
  • Rendering Text (M131). This subelement is the inverse of File Encoding (M128). It contains a textual description of the process that must be used to extract the information in the Encoding and display (render) it. In practice, this normally just reverses the steps described in the File Encoding element, so it is often sufficient to include in the Rendering Text element an instruction to read the contents of the File Encoding element.
  • Rendering Keywords (M132). This subelement is a machine-processible equivalent of the Rendering Text element. It contains instructions that can be used by a program when extracting and displaying the Encoding. This allows the Encoding to be automatically extracted and displayed.

    The contents of the Rendering Keywords subelement are a list of file types. The extracting program uses each file type to identify a suitable program to open a file of that type and to turn it into a file of the following type. The specification allows files to be described using the standard three-letter extensions used in Windows systems (e.g. 'pdf' for PDF files), or MIME types.

    For example the value 'b64 pdf' instructs the extracting program that the encoding is represented in the VEO as a 'b64' (Base64) file. The program identifies an application that can open (decode) 'b64' files. The resulting file is a 'pdf' (PDF) file, and a second application is identified to open files of this type.

    The ability to use the contents of the Rendering Keywords subelement depends on retaining the link between a file format and the application able to process it. This link is not expected to survive for long periods, but while it does survive the functionality is useful.

    Over the longer term, the contents of this subelement can be used to identify all of the Documents in a particular Encoding. This can be used, for example, in identifying Encodings that need to be migrated.
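
The chain of file types described above can be sketched in Python. This is a minimal illustration and not part of the VERS specification: the function names and the DECODERS table are hypothetical, and the choices to accept either spaces or semicolons as separators and to ignore a leading dot are assumptions made for the example. Only the Base64 layer is unwrapped; the final file type is returned so that a caller could select an application to open it.

```python
import base64
import re

# Hypothetical table mapping a wrapper file type to a function that decodes it.
DECODERS = {"b64": base64.b64decode}


def parse_rendering_keywords(value):
    """Split a Rendering Keywords value into a list of lower-case file types."""
    # Assumption: file types are separated by whitespace and/or semicolons.
    tokens = re.split(r"[;\s]+", value.strip())
    # Drop empty tokens, surrounding quotes, and a leading dot ('.pdf' -> 'pdf').
    return [t.strip("'\"").lstrip(".").lower() for t in tokens if t]


def unwrap(data, file_types):
    """Decode each outer layer in turn; return the payload and its file type."""
    if not file_types:
        raise ValueError("no file types given")
    # Every type except the last names a wrapper that must be decoded.
    for file_type in file_types[:-1]:
        decoder = DECODERS.get(file_type)
        if decoder is None:
            raise LookupError(f"no decoder registered for {file_type!r}")
        data = decoder(data)
    # The last type describes the payload itself (e.g. 'pdf').
    return data, file_types[-1]


encoded = base64.b64encode(b"%PDF-1.4 sample")
payload, final_type = unwrap(encoded, parse_rendering_keywords("b64 pdf"))
print(final_type, payload[:8])
```

The same parsed list could also be used to select Encodings in a given format, for example when deciding which ones need to be migrated.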

An example of a minimal set of Encoding Metadata follows:

<vers:EncodingMetadata>
  <vers:FileEncoding>
    <vers:Text>

	The content of the DocumentData element is a PDF file. The file conforms to
	'PDF Reference', third edition, Adobe Portable Document Format, Version 1.4,
	Adobe Systems Incorporated, Addison Wesley, 2001, ISBN 0-201-75839-3
	(http://partners.adobe.com/asn/developer/acrosdk/docs/filefmtspecs/PDFReference.pdf
	visited 7 January 2003) as modified in the 'Errata for PDF Reference, third
	edition' (http://partners.adobe.com/asn/developer/acrosdk/docs/PDF14errata.txt
	visited 7 January 2003). It may contain digital signatures defined by PDF
	Public-key Digital Signature and Encryption Specification, Version 3.2, Jim
	Pravetz, 12 September 2001, Adobe Systems Incorporated
	(http://partners.adobe.com/asn/developer/pdfs/tn/ppk_pdfspec.pdf visited
	28 March 2003) and the appearance of the digital signature in a PDF document
	is defined in Digital Signature Appearances for Public-Key Interoperability,
	Adobe Systems Incorporated, September 2001
	(http://partners.adobe.com/asn/developer/pdfs/tn/PPKAppearances.pdf visited
	28 March 2003). The file has been encoded using Base64 which is defined in
	IETF RFC 2045 "Multipurpose Internet Mail Extensions (MIME) Part One:
	Format of Internet Message Bodies", Section 6.8
	"Base64 Content-Transfer-Encoding".

    </vers:Text>
  </vers:FileEncoding>
  <vers:SourceFileIdentifier>
    P:\Presentations\PublicAccountsCtee\VERSIntegrity.pdf
  </vers:SourceFileIdentifier>
  <vers:FileRendering>
    <vers:RenderingText>
      <vers:Text>See the vers:FileEncoding element</vers:Text>
    </vers:RenderingText>
    <vers:RenderingKeywords>
      .b64; .pdf
    </vers:RenderingKeywords>
  </vers:FileRendering>
</vers:EncodingMetadata>
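
As a rough illustration of how a program might read these subelements, the following Python sketch parses a shortened copy of the example with the standard library. Several things here are assumptions rather than part of the specification: the namespace URI is a placeholder (the fragment above does not declare one), the function and dictionary key names are hypothetical, the File Encoding text is abbreviated, and the Source File Identifier is treated as a Windows-style path because that is what the example contains.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath

# Placeholder namespace URI; the real VERS namespace is not given in the fragment.
VERS_NS = "urn:example:vers"

SAMPLE = r"""<vers:EncodingMetadata xmlns:vers="urn:example:vers">
  <vers:FileEncoding>
    <vers:Text>The content is a PDF file encoded using Base64.</vers:Text>
  </vers:FileEncoding>
  <vers:SourceFileIdentifier>
    P:\Presentations\PublicAccountsCtee\VERSIntegrity.pdf
  </vers:SourceFileIdentifier>
  <vers:FileRendering>
    <vers:RenderingText>
      <vers:Text>See the vers:FileEncoding element</vers:Text>
    </vers:RenderingText>
    <vers:RenderingKeywords>
      .b64; .pdf
    </vers:RenderingKeywords>
  </vers:FileRendering>
</vers:EncodingMetadata>"""


def read_encoding_metadata(xml_text):
    """Return the Encoding Metadata subelements as a plain dictionary."""
    ns = {"vers": VERS_NS}
    root = ET.fromstring(xml_text)

    def text_of(path):
        # Return the stripped text of the first matching element, or None.
        node = root.find(path, ns)
        return node.text.strip() if node is not None and node.text else None

    source = text_of("vers:SourceFileIdentifier")
    return {
        "file_encoding": text_of("vers:FileEncoding/vers:Text"),
        "source_file": source,
        # Last element of the name, without folder path or file extension.
        "base_name": PureWindowsPath(source).stem if source else None,
        "rendering_text": text_of("vers:FileRendering/vers:RenderingText/vers:Text"),
        "rendering_keywords": text_of("vers:FileRendering/vers:RenderingKeywords"),
    }


metadata = read_encoding_metadata(SAMPLE)
for key, value in metadata.items():
    print(f"{key}: {value}")
```

The base_name value follows the description of Source File Identifier (M129): it could be used to name the document data when it is extracted, with the file type taken from the Rendering Keywords rather than from the original extension.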
