‘Content’ is understood as all the original materials and the digital files that are extracted from them. The creation of content is the basis of every digitization project. Its specifications depend on the aims of the project, which may be the online presentation of images and/or text, but also the preservation of fragile library and archive materials.

Depending on the source materials (image or text), the content that has been created may consist of:

  • one master file;
  • one or more derivative files;
  • machine-readable text files of the page;
  • metadata (descriptive, structural and technical).

Image files

A distinction is made between two types of image files: master files and derivative files. The master files are the basis for all subsequent manipulations. Frequently used file formats for masters are TIFF 6.0, TIFF LZW, JPEG quality 10, JPEG2000 and PNG. Derivative files are required for presentation on the internet and, in the case of text files, as an intermediate for improving OCR results. JPEG and searchable PDF are the most commonly used formats for derivatives.

The quality of the master files is determined by the degree to which they are true representations of the originals. Specifications such as bit depth, resolution, file format and compression are essential for evaluating that quality. Quality managers systematically check the images and, together with the supplier that produces the master files, provide input for the optimal fine-tuning of hardware and software.
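As an illustration of such a systematic check, the following is a minimal sketch that compares the technical characteristics of a delivered master file with a target specification. The class `ImageSpec`, the function `check_master` and all values shown are hypothetical and chosen only for this example; they do not describe the actual requirements or tooling of any project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpec:
    """Technical characteristics of one image file (illustrative only)."""
    bit_depth: int        # bits per colour channel
    resolution_ppi: int   # pixels per inch
    file_format: str      # e.g. "JPEG2000"
    compression: str      # e.g. "lossless"


def check_master(measured: ImageSpec, target: ImageSpec) -> list[str]:
    """Return a list of deviations of the measured file from the target."""
    problems = []
    if measured.bit_depth < target.bit_depth:
        problems.append(
            f"bit depth {measured.bit_depth} is below target {target.bit_depth}")
    if measured.resolution_ppi < target.resolution_ppi:
        problems.append(
            f"resolution {measured.resolution_ppi} ppi is below target "
            f"{target.resolution_ppi} ppi")
    if measured.file_format != target.file_format:
        problems.append(
            f"file format {measured.file_format} differs from {target.file_format}")
    if measured.compression != target.compression:
        problems.append(
            f"compression {measured.compression} differs from {target.compression}")
    return problems


# Hypothetical target and delivered file, for illustration only.
TARGET = ImageSpec(bit_depth=8, resolution_ppi=300,
                   file_format="JPEG2000", compression="lossless")
delivered = ImageSpec(bit_depth=8, resolution_ppi=200,
                      file_format="JPEG2000", compression="lossless")

for problem in check_master(delivered, TARGET):
    print(problem)
```

An empty list means the file meets the target on these four characteristics. Whether it is a true representation of the original still requires visual and colorimetric inspection, which this sketch does not cover.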

The Koninklijke Bibliotheek has researched file formats in relation to the storage of master files (see the report, PDF). Because the required storage capacity keeps growing, JPEG2000 has been chosen as the preferred file format for master files.
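To give a sense of why storage capacity becomes a concern, the uncompressed size of a single scan can be estimated from its dimensions, resolution and bit depth. The sketch below assumes an A4 page (210 × 297 mm) scanned at 300 ppi in 24-bit colour; these figures are assumptions made for the example and are not taken from the report.

```python
MM_PER_INCH = 25.4


def pixels(length_mm: float, ppi: int) -> int:
    """Convert a physical length to a pixel count at the given resolution."""
    return round(length_mm / MM_PER_INCH * ppi)


def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int, bits_per_channel: int) -> int:
    """Size of the raw pixel data, ignoring headers and compression."""
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example: A4 page, 300 ppi, three colour channels of 8 bits each.
width = pixels(210, 300)
height = pixels(297, 300)
size = uncompressed_bytes(width, height, channels=3, bits_per_channel=8)

print(width, height, size)  # 2480 3508 26099520
```

Under these assumptions one page takes about 26 MB before compression, so a collection of millions of pages quickly reaches a scale where the choice of file format and compression matters.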

Metadata

The descriptive and structural metadata serve the search-and-retrieval functionality on the website. Descriptive metadata contain bibliographical data such as author, title or date of publication. Structural metadata provide information on the structure of the file, such as numbering, pages, paragraphs, indexes and tables of contents. They also record the relationships between the materials, such as a chapter in a book or an image from a specific document. Technical metadata describe the technical characteristics of the master files, such as the scanner used, the resolution, the bit depth, the colour and the source of illumination. A number of standard formats are used for storing metadata.
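The three kinds of metadata can be pictured as separate parts of one record per digitized item. The following is a minimal sketch of such a record; the class names, field names and example values are hypothetical and do not correspond to any particular metadata standard or to the formats actually used.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class DescriptiveMetadata:
    """Bibliographical data used for search and retrieval."""
    author: str
    title: str
    publication_date: str


@dataclass
class StructuralMetadata:
    """Internal structure of the item and its relation to other materials."""
    page_labels: list[str] = field(default_factory=list)
    part_of: Optional[str] = None   # e.g. identifier of the containing book


@dataclass
class TechnicalMetadata:
    """Technical characteristics of the master file."""
    scanner: str
    resolution_ppi: int
    bit_depth: int
    colour: str
    illumination: str


@dataclass
class ContentRecord:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    technical: TechnicalMetadata


# Hypothetical example values, for illustration only.
record = ContentRecord(
    descriptive=DescriptiveMetadata("Unknown author", "Example title", "1850"),
    structural=StructuralMetadata(page_labels=["i", "ii", "1", "2"],
                                  part_of="example-book-001"),
    technical=TechnicalMetadata("example scanner", 300, 8, "RGB", "LED"),
)

# A nested dictionary is a convenient intermediate form before the record
# is serialised to whichever standard format the project uses.
as_dict = asdict(record)
print(as_dict["technical"]["resolution_ppi"])
```

Keeping the three parts separate mirrors their different purposes: the descriptive and structural parts feed the website, while the technical part documents how the master file was produced.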