The Koninklijke Bibliotheek (KB) carries out extensive research to ensure that the millions of electronic publications stored in its e-Depot remain accessible in the long term. For instance, projects have been started to study emulation and migration as strategies for permanent access. Because information on the characteristics of file formats is vital for this type of research, KB has also begun research into file formats themselves. A file format defines how the information in a computer file is encoded. PDF or MS Word for publications, TIFF or JPEG for images, MPEG or AVI for video, AutoCAD’s DWG for technical drawings and MS Access or MySQL for databases are just a few of the many formats in use.
Sound decisions on which permanent access strategy to follow, or which file formats to prefer, require comprehensive knowledge of file formats. Is it best to save publications as PDF, TIFF, MS Word or another file format? If PDF is the format of choice, which version is best in the long term? Do certain options, such as the ability to protect files with a password, affect long-term preservation, or can they be used without risk? These are the kinds of questions KB wants to answer.
Based on structured tests, research into software specifications and results from fellow institutions, KB intends to publish recommendations on the long-term preservation of, and access to, various file formats. In addition, the project will examine how accessible each format is, analyse alternative viewers and carry out a risk analysis. The overall goal of KB’s file format research is to safeguard the long-term preservation of, and access to, the files in the e-Depot.
Results
The project will have the following results:
- Documentation on the technical preservation metadata about the files in the e-Depot (the information that is required to interpret a file in the future);
- Documentation on file formats, per format and per version, in which the effects of certain file characteristics on long-term preservation are described;
- A risk analysis for every type of format in the e-Depot, with advice on the steps that need to be undertaken (i.e. migration, emulation or any other strategy) to ensure permanent access;
- Guidelines for suppliers to the e-Depot, specifying which options and properties should or should not be used (see Publications & links: Guidelines);
- Where necessary, research into the various writers and viewers available for certain file formats;
- Finally, a study of a selection of tools that extract preservation metadata from files, such as JHOVE and DROID. These tools can report properties such as file size, compression method and the fonts used. Based on the outcome, KB will decide whether to implement one of these tools in the e-Depot workflow.
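To give an idea of how format identification tools work, the sketch below checks a file's leading "magic" bytes against a small signature table and reports the format, the PDF version (if any) and the file size. It is a minimal illustration under stated assumptions: the SIGNATURES table and the identify function are hypothetical, cover only PDF, TIFF and JPEG, and are not the API or signature base of DROID or JHOVE. DROID relies on the much larger PRONOM signature registry, and JHOVE performs far deeper validation.

```python
import os

# Hypothetical, minimal signature table: well-known leading "magic" bytes.
# Real tools such as DROID use the PRONOM registry, which covers far more formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(path):
    """Return a small dict of illustrative preservation metadata for a file."""
    with open(path, "rb") as f:
        header = f.read(16)  # the first bytes are enough for these signatures

    fmt = "unknown"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            fmt = name
            break

    version = None
    if fmt == "PDF":
        # A PDF header looks like b"%PDF-1.4"; the version follows the dash.
        version = header[5:8].decode("ascii", errors="replace")

    return {"format": fmt, "version": version, "size_bytes": os.path.getsize(path)}
```

For example, identify("report.pdf") on a PDF 1.4 file would return its format as "PDF" and its version as "1.4". Knowing the exact version matters for questions such as which PDF version is best suited to long-term preservation.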
All documentation, recommendations and guidelines will be published on this website.
The project was carried out from October 2005 to December 2006.