Victorian Electronic Records Strategy - Forever Digital logo
 



7.4 JPEG

JPEG files are very widely used to represent continuous tone images (e.g. photographs and greyscale images) because the compression algorithm is very effective on such images. JPEG images are particularly widespread on the internet. JPEG should not be used to compress images with sharp edges, as the compression algorithm often produces compression artefacts known as ‘ringing’. These artefacts take the form of fuzzy echo lines parallel to the sharp edge. A second compression artefact, usually called ‘blocking’, is that the picture appears tiled with small rectangular blocks that blur the fine detail of the image (the algorithm processes the image in blocks of 8 × 8 samples). If these artefacts are not acceptable, TIFF or JPEG 2000 should be used.

JPEG is defined by an international standard [ISO10918]. Note that identical standards (except for the identifier) are issued by ISO and the ITU; the ITU version is ITU-T T.81.

Technically, the JPEG standard supports both lossless and lossy compression. With lossless compression, when an image is compressed and then decompressed the resulting image is identical to the original image (i.e. no information is lost). With lossy compression, the decompressed image will be different from the original image. In practice, almost no use is made of the lossless mode in JPEG and almost all JPEG images use lossy compression. (Note that there is also a JPEG-LS standard (ISO 14495-1 or ITU-T T.87) for lossless compression, which is a completely separate standard that is not supported by VERS.)

The use of lossy compression is a contentious issue for long term preservation of images. Ideally, lossless compression should be used for archival master images. This is because:

  • The image preserved should be the highest quality possible.
  • Future preservation actions on a lossy compressed image may reduce the quality of the image even further. For example, if the JPEG image format became obsolete and it was necessary to migrate the image to a new format, it would be necessary to decompress the JPEG image and recompress it using the new format. The decompressed image will have lost information when compared with the original image, and the new compression may well lose additional information. It is even theoretically possible that the interaction between the two lossy compression algorithms could produce extremely poor results.
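The concern about repeated lossy compression can be illustrated with a toy model. The sketch below is not JPEG: it assumes that each ‘format’ simply rounds sample values to the nearest multiple of a step size, as a stand-in for the quantisation step of a lossy codec. The function names, sample values and step sizes are hypothetical and chosen only for illustration.

```python
def quantise(value: int, step: int) -> int:
    """Round a non-negative sample to the nearest multiple of `step` (the lossy part)."""
    return step * ((value + step // 2) // step)


def total_error(a, b) -> int:
    """Sum of absolute differences between two equally long sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical sample values standing in for an original image.
original = [3, 14, 27, 45, 58, 91, 120, 200]

first_pass = [quantise(v, 10) for v in original]   # hypothetical old format, step 10
migrated = [quantise(v, 8) for v in first_pass]    # migrated to a hypothetical new format, step 8
direct = [quantise(v, 8) for v in original]        # new format applied to the original instead

print("old format only:        ", total_error(original, first_pass))
print("old format, then new:   ", total_error(original, migrated))
print("new format from original:", total_error(original, direct))
```

In this model, applying the same step size a second time changes nothing; the additional loss appears when the second step size differs from the first, which is the migration case described above.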

In practice, it is possible to make too much of this argument for the following reasons:

  • For many images accurate colour rendition is not essential. For example, documents in an archive are often written on coloured stock, or are printed using simple printing processes. Precise colour control is consequently not necessary. With such documents the compression algorithm should be chosen to preserve the detail of the image and to give the feel of the original.
  • JPEG is a well defined standard. While it may become obsolete, it is very unlikely that an archive will be unable to re-implement the decompression software when required. Consequently, it is unlikely that JPEG images will need to be migrated to another format. If it ever was required to migrate to another format, advances in compression techniques should mean that a lossless compression format could be selected to avoid the second loss of quality.

For these reasons, we strongly recommend the following:

  • For images where colour fidelity is important, use a format with lossless compression (e.g. TIFF or JPEG 2000).
  • For images that have sharp edges, or where it is desired to retain the fine detail, use a lossless compression algorithm or JPEG 2000.
  • For images where a slight loss of colour fidelity is acceptable, we consider JPEG to be an acceptable format. This includes images where the paper stock is coloured, the ink is coloured, or the printing process is too simple to ensure accurate colour rendition.
  • For images that are already compressed using JPEG when received, retain the existing compression. There is no benefit in decompressing a JPEG image and then recompressing it using a lossless compression algorithm, as the information has already been lost. In fact, recompressing using a lossless algorithm may simply hide the fact that information has been lost by a previous JPEG compression.

The basic JPEG standard defines several profiles. All are accepted by VERS:

  • Baseline. This is a sequential DCT (lossy) process with scans with 1 to 4 components, each of which uses 8 bit samples. It uses Huffman coding with 2 AC and 2 DC tables.
  • Extended. This extends the Baseline profile. It allows sequential or progressive encoding, 8 or 12 bit samples, and either arithmetic or Huffman encoding with 4 AC and 4 DC tables.
  • Lossless. This is a lossless process.
  • Hierarchical. This uses multiple frames, each of which can use either the extended or the lossless processes.
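The process used by a frame is identified by its start-of-frame (SOF) marker. As a rough sketch, the mapping below groups the SOF marker codes of the JPEG standard under the four profile names used in this section. The grouping and the function name `profile_for_sof` are illustrative assumptions, not part of VERS or of the standard.

```python
# Second byte of the SOF marker (the marker itself is FF followed by this byte).
# 0xC4 (DHT), 0xC8 (reserved) and 0xCC (DAC) are not SOF markers.
SOF_PROFILES = {
    0xC0: "Baseline",   # baseline sequential DCT, Huffman coding
    0xC1: "Extended",   # extended sequential DCT, Huffman coding
    0xC2: "Extended",   # progressive DCT, Huffman coding
    0xC3: "Lossless",   # lossless, Huffman coding
    0xC9: "Extended",   # extended sequential DCT, arithmetic coding
    0xCA: "Extended",   # progressive DCT, arithmetic coding
    0xCB: "Lossless",   # lossless, arithmetic coding
}

# Differential frames only occur inside hierarchical images.
DIFFERENTIAL_SOF = {0xC5, 0xC6, 0xC7, 0xCD, 0xCE, 0xCF}


def profile_for_sof(marker: int) -> str:
    """Map an SOF marker code to one of the profile names used in this section."""
    if marker in DIFFERENTIAL_SOF:
        return "Hierarchical"
    if marker in SOF_PROFILES:
        return SOF_PROFILES[marker]
    raise ValueError(f"0x{marker:02X} is not a start-of-frame marker")


print(profile_for_sof(0xC0))
print(profile_for_sof(0xC2))
```

The first frame of a hierarchical image uses a non-differential SOF marker, so a complete check would also look for the DHP marker (FF DE) that introduces a hierarchical image; that is left out of this sketch.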

The JPEG standard defines an interchange format. This format specifies the bitstream used to encode an image. The standard also defines an ‘abbreviated format for compressed image data’, which is not accepted as a long term preservation format by VERS. Images using this abbreviated format do not include some of the tables required to decompress the image (instead, it is assumed that they are built into the decoder). Consequently, this format is not appropriate for long term preservation. In practice, it is unlikely that any normally generated JPEG image would use this mode as it could not be decoded by all viewers.
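One rough way to spot a file that may be in the abbreviated format is to check whether quantisation tables (DQT, marker FF DB) and Huffman tables (DHT, marker FF C4) appear before the first scan (SOS, marker FF DA). The sketch below assumes a well-formed marker sequence; the function names and the synthetic byte strings are hypothetical and used only for illustration.

```python
# Markers that stand alone and carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def markers_before_first_scan(data: bytes) -> list:
    """Return the marker codes of all segments between SOI and the first SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    found = []
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE:
            pos += 2
            continue
        if marker == 0xDA:            # start of scan: stop looking
            break
        if pos + 4 > len(data):
            raise ValueError("truncated marker segment")
        found.append(marker)
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + length             # the length field counts its own two bytes
    return found


def tables_present(data: bytes) -> dict:
    """Report whether DQT and DHT segments precede the first scan."""
    found = markers_before_first_scan(data)
    return {"DQT": 0xDB in found, "DHT": 0xC4 in found}


def segment(marker: int, payload: bytes) -> bytes:
    """Build a marker segment with a dummy payload (for the demonstration only)."""
    return b"\xff" + bytes([marker]) + (len(payload) + 2).to_bytes(2, "big") + payload


# Synthetic streams with placeholder payloads; these are not decodable images.
full = (b"\xff\xd8" + segment(0xDB, bytes(65)) + segment(0xC0, bytes(15))
        + segment(0xC4, bytes(29)) + segment(0xDA, bytes(10)))
abbreviated = b"\xff\xd8" + segment(0xC0, bytes(15)) + segment(0xDA, bytes(10))

print(tables_present(full))
print(tables_present(abbreviated))
```

This is a heuristic only: arithmetic coded images do not use Huffman tables and lossless frames do not use quantisation tables, so a missing table is a prompt for closer inspection rather than proof that the file is abbreviated.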

JFIF (JPEG File Interchange Format) [JFIF] defines a further profile of JPEG. This profile is not included in the international standard, but was defined by Eric Hamilton of C-Cube Microsystems. It is both a restriction and an extension of the standard interchange format. Many, if not most, JPEG files are actually JFIF files. JFIF strongly recommends use of the Baseline JPEG profile and a YCbCr colour space. It uses an application specific extension (the APP0 marker segment) to include the version, units, X pixel density, Y pixel density, and a thumbnail. Standard decoders will be able to decompress JFIF files and will ignore the application specific extensions. To check whether a particular file is JFIF encoded, confirm that the file commences with the hex string ‘FF D8 FF E0 xx xx 4A 46 49 46 00’ (where each ‘xx’ may be any byte value; these two bytes hold the segment length, and ‘4A 46 49 46 00’ is the ASCII string ‘JFIF’ followed by a zero byte).
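The JFIF check can be sketched in a few lines. A JFIF file begins with the SOI marker (FF D8), an APP0 marker (FF E0), a two byte segment length, and the identifier ‘JFIF’ followed by a zero byte (4A 46 49 46 00). The function name `looks_like_jfif` and the sample bytes are assumptions for illustration.

```python
def looks_like_jfif(header: bytes) -> bool:
    """Return True if the bytes start with SOI, APP0 and the 'JFIF' identifier."""
    return (
        len(header) >= 11
        and header[0:2] == b"\xff\xd8"     # SOI marker
        and header[2:4] == b"\xff\xe0"     # APP0 marker
        # header[4:6] is the APP0 segment length and may hold any value
        and header[6:11] == b"JFIF\x00"    # identifier: 4A 46 49 46 00
    )


# First eleven bytes of a typical JFIF file (APP0 length 0x0010).
sample = bytes.fromhex("ffd8ffe000104a46494600")
print(looks_like_jfif(sample))
```

To test a file on disk, read its first eleven bytes in binary mode, for example with `open(path, "rb").read(11)`, and pass them to the function.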

There are a number of other encodings of JPEG. These include SPIFF, EXIF, and JTIP. These have not been investigated due to their rarity and are not accepted as long term preservation formats for VERS.
