Statement and next steps

Friday, 20 October 2006, The Hague, The Netherlands. Organised by the National Library of the Netherlands and the Nationaal Archief of the Netherlands.

Introduction

A digital document, record, or artwork exists only in encoded form, which must be “rendered” before humans can perceive it. In practice, this rendering must be performed by software that understands the encoded format of the artifact in question. Yet such software, like the formats it renders, quickly becomes obsolete and unusable. Even if the software is saved, the hardware platforms on which it originally ran quickly become obsolete as well. It is therefore challenging to preserve digital artifacts in a way that will enable their future use.

Several strategies have been proposed for the preservation of digital artifacts (or at least their core contents), including “migration” (i.e., conversion) into new formats, translation into long-lived standard formats or formal representations, and preservation of obsolete hardware and software in computer museums in order to render obsolete artifacts (see [1, 2, 3, 4] for detailed discussions of these strategies). However, most such strategies share a serious flaw: they convert the original digital artifact into some other format, often repeatedly over time, thereby introducing a risk of losing information (the computer museum approach is an exception, but it is highly impractical in the long run). In contrast, traditional preservation of literature, records, and artwork attempts to preserve artifacts in their original forms, often at great cost, in recognition of the fact that future generations may have a broad and often unanticipated need to return to these originals for scholarly, legal, or aesthetic purposes. In addition, unlike traditional artifacts, many digital artifacts possess complex interactive and executable behavior that can be retained only by preserving them in their original, executable forms.

One approach that addresses the issue of preserving digital artifacts in their original forms is to use software to emulate the hardware platforms on which the artifacts' rendering software originally ran. This approach [5, 6] uses the well-understood computer science technique of software emulation of hardware to extend the life of an artifact's original hardware platform indefinitely, by recreating that platform virtually in software that can then be saved and run on future computers. To preserve a digital artifact using this approach, the artifact itself, its original rendering software, and the operating system and software environment required by that software are all saved. To run the preserved software and view the original artifact on a future computer, a software emulator is used that effectively turns the future computer into the obsolete one. Because every element that must be saved under this strategy is software, all of them can be preserved indefinitely by copying their bit streams to new digital storage media without any loss of quality. Note that although each distinct obsolete computer system would require its own emulator program, each such emulator would need to be written only once, which would require far less effort than repeatedly migrating and converting every individual digital artifact throughout the ages.
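
At its core, a software emulator repeatedly fetches an instruction of the obsolete machine from emulated memory, decodes it, and applies its effect to an emulated processor state. The Python sketch below illustrates that fetch-decode-execute loop for a hypothetical 8-bit accumulator machine. The machine, its opcodes, and the sample program are all invented for illustration; a real emulator would also have to reproduce the original machine's input/output devices, interrupts, and timing.

```python
# Hypothetical "obsolete" machine: 8-bit accumulator, 256 bytes of memory.
# Opcodes are invented for this sketch:
#   0x00     HLT    stop
#   0x01 n   LDA n  load the immediate value n into the accumulator
#   0x02 a   ADD a  add memory[a] to the accumulator (mod 256)
#   0x03     OUT    send the accumulator to the output device
#   0x04 a   JNZ a  jump to address a if the accumulator is non-zero
HLT, LDA, ADD, OUT, JNZ = range(5)


class ToyMachine:
    def __init__(self, image: bytes) -> None:
        self.mem = bytearray(256)        # emulated main memory
        self.mem[: len(image)] = image   # load the saved program image
        self.acc = 0                     # emulated accumulator register
        self.pc = 0                      # emulated program counter
        self.output = []                 # stands in for the output device
        self.halted = False

    def fetch(self) -> int:
        """Read the byte at the program counter and advance it."""
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) % 256
        return byte

    def step(self) -> None:
        """Fetch, decode, and execute one instruction."""
        op = self.fetch()
        if op == HLT:
            self.halted = True
        elif op == LDA:
            self.acc = self.fetch()
        elif op == ADD:
            self.acc = (self.acc + self.mem[self.fetch()]) % 256
        elif op == OUT:
            self.output.append(self.acc)
        elif op == JNZ:
            target = self.fetch()
            if self.acc != 0:
                self.pc = target
        else:
            raise ValueError(f"illegal opcode {op:#04x}")

    def run(self, max_steps: int = 10_000) -> list:
        """Execute until HLT, guarding against runaway programs."""
        for _ in range(max_steps):
            if self.halted:
                return self.output
            self.step()
        raise RuntimeError("step limit exceeded")


# Invented program: count down from 3, emitting each value.
program = bytearray(256)
program[0:8] = bytes([LDA, 3, OUT, ADD, 0x20, JNZ, 2, HLT])
program[0x20] = 0xFF  # adding 255 (mod 256) decrements the accumulator

print(ToyMachine(bytes(program)).run())  # [3, 2, 1]
```

The saved program image is only a sequence of bytes, so it can be copied to new media indefinitely; only the emulator itself has to run on each new host.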

To ensure that emulators for obsolete computers can be run on a wide range of future computers, these emulators should be specified or coded in some long-lived form. One attractive way of doing this is to write emulator programs to run on a virtual machine platform designed specifically for running emulators on future computers. Such an “Emulation Virtual Machine” (EVM) [7, 8] would ideally be easy to implement on any computer, thereby ensuring that all emulators could be run on any future machine.
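
The key property of an EVM is layering: an emulator calls only the small, stable set of services the EVM defines, so moving every emulator to a new computer means reimplementing only that service set. The Python sketch below illustrates this layering rather than a real EVM, which would be a complete virtual machine platform. The `EVM` interface, the upper-case-only terminal being emulated, and both host classes are hypothetical.

```python
import sys
from typing import Optional, Protocol


class EVM(Protocol):
    """Hypothetical, deliberately small set of services an EVM might offer.
    Emulators call only these methods; host-specific details live behind them."""

    def read_input(self) -> Optional[int]: ...

    def write_display(self, code: int) -> None: ...


class ToyTerminalEmulator:
    """Hypothetical emulator of an obsolete terminal that could show only
    upper-case letters: it echoes each input byte, folding ASCII a-z to A-Z."""

    def __init__(self, evm: EVM) -> None:
        self.evm = evm

    def run(self) -> None:
        while (code := self.evm.read_input()) is not None:
            if ord("a") <= code <= ord("z"):
                code -= 32  # ASCII lower case sits 32 code points above upper case
            self.evm.write_display(code)


class RecordingHost:
    """One hypothetical EVM implementation: scripted keystrokes, display kept in memory."""

    def __init__(self, keystrokes: bytes) -> None:
        self._pending = list(keystrokes)
        self.screen = bytearray()

    def read_input(self) -> Optional[int]:
        return self._pending.pop(0) if self._pending else None

    def write_display(self, code: int) -> None:
        self.screen.append(code)


class StdoutHost(RecordingHost):
    """A second hypothetical EVM implementation, standing in for a different
    future computer: same scripted input, but the display is standard output."""

    def write_display(self, code: int) -> None:
        sys.stdout.write(chr(code))


host = RecordingHost(b"hello, 1985")
ToyTerminalEmulator(host).run()
print(host.screen.decode("ascii"))  # HELLO, 1985

# The emulator class is reused unchanged on the second host.
ToyTerminalEmulator(StdoutHost(b"same emulator\n")).run()
```

Only the host classes would need to be rewritten for a new computer; the emulator code, like the artifacts it renders, is saved once and never changed.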

Statement

Emulation is a viable preservation strategy that has a number of unique advantages:

  • It preserves and permits access to each digital artifact in its original form and format; it may be the only viable approach to preserving digital artifacts that have significant executable and/or interactive behavior.
  • It can preserve digital artifacts of any form or format by saving the original software environments that were used to render those artifacts. A single emulator can preserve artifacts in a vast range of arbitrary formats without the need to understand those formats, and it can preserve huge corpora without ever requiring conversion or any other processing of individual artifacts.
  • It enables the future generation of surrogate versions of digital artifacts directly from their original forms, thereby avoiding the cumulative corruption that would result from generating each such future surrogate from the previous one.
  • If all emulators are written to run on a stable, thoroughly specified “emulation virtual machine” (EVM) platform, and that virtual machine can be implemented on any future computer, then all emulators can be run indefinitely.
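
The point about cumulative corruption can be made concrete with a small numerical sketch. Each “conversion” below is a hypothetical lossy format change, modelled as snapping integer sample values to the nearest multiple of a step size; this stand-in is chosen only for illustration and does not model any real format. Deriving each new surrogate from the previous one compounds the rounding, whereas deriving the final surrogate directly from the original incurs only the loss of that single conversion.

```python
def convert(samples, step):
    """Hypothetical lossy conversion: snap each sample to the nearest multiple
    of step (exact ties go to the even quotient, as Python's round does)."""
    return [step * round(s / step) for s in samples]


def total_error(original, surrogate):
    """Sum of absolute differences between the original and a surrogate."""
    return sum(abs(a - b) for a, b in zip(original, surrogate))


original = list(range(100))  # stand-in for the original artifact's content
format_steps = [3, 4, 5]     # hypothetical successive target formats

# Migration chain: each new surrogate is generated from the previous one.
chained = original
for step in format_steps:
    chained = convert(chained, step)

# Emulation-style: the final surrogate is generated directly from the original.
direct = convert(original, format_steps[-1])

print("chained error:", total_error(original, chained))
print("direct error:", total_error(original, direct))
```

Because the direct conversion always picks the nearest representable value, its error can never exceed that of the chain. Here the chain is strictly worse: the sample 13, for instance, ends up as 10 via the chain (13 to 12 to 12 to 10) but as 15 when converted directly.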

Next Steps

In order to develop a practical, off-the-shelf preservation strategy based on emulation, a number of additional steps are required, including:

  • Create and demonstrate example emulators suitable for long-term preservation.
  • Develop fidelity criteria for each behavioral dimension of digital artifacts (e.g., display, sound, timing) and develop validation test suites that evaluate these criteria and verify that the logical behavior of an emulator matches that of its target computer.
  • Research and develop device-independent input/output mechanisms to allow unmodified programs to behave and interact appropriately with users on future computer platforms.
  • Develop methods for capturing and preserving contextual information describing the logical, physical, organizational, and social environments in which digital artifacts were originally used, as well as documentation describing how they were used and what they were used for.
  • Develop methods for describing, managing, and automatically interpreting information about the versions and configurations of software and hardware needed to render digital artifacts under emulation.
  • Define and develop a long-lived emulation environment to enable emulators to be run indefinitely. This environment could take the form of an emulation virtual machine (EVM) platform, or it could be implemented as a long-lived programming language together with a stable set of library facilities. This environment should:

    - Enable the use of old digital artifacts by running their original software under emulation on unforeseen future computers;
    - Provide automatic configuration of emulators, software environments, and applications to render old digital artifacts;
    - Provide documentation, active user help, and/or automatic reinterpretation of old interaction modes into future equivalents, to help future users work with old digital artifacts under emulation;
    - Provide mechanisms to facilitate (or, ideally, automate) the future generation of surrogate versions of digital artifacts directly from their original forms.

  • Develop network-based services for providing remote access to old digital objects via emulation, without requiring remote users to load and run an emulation environment on their local systems.
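
Describing the software and hardware versions needed to render an artifact, and configuring emulators and software environments automatically from that description, could start from something as simple as a registry of dependency records. The Python sketch below walks such records from an artifact's format down to a hardware platform and reports which emulator to start and in what order to install the saved software. Every component name, version, and mapping in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Component:
    """One saved piece of software or one hardware platform (hypothetical record)."""
    name: str
    version: str
    requires: Optional[str]  # registry key of what this runs on; None for hardware


# Hypothetical registry of preserved components; all names are invented.
REGISTRY: Dict[str, Component] = {
    "wordproc-2.0": Component("WordProc", "2.0", requires="diskos-3.1"),
    "diskos-3.1": Component("DiskOS", "3.1", requires="micro80"),
    "micro80": Component("Micro-80", "rev B", requires=None),
}

# Which saved application renders which artifact format (hypothetical).
RENDERERS: Dict[str, str] = {"wordproc-document-v2": "wordproc-2.0"}

# Which preserved emulator reproduces which hardware platform (hypothetical).
EMULATORS: Dict[str, str] = {"micro80": "micro80-emulator"}


def rendering_stack(artifact_format: str) -> List[str]:
    """Follow 'requires' links from the rendering application down to hardware."""
    if artifact_format not in RENDERERS:
        raise LookupError(f"no saved renderer recorded for {artifact_format!r}")
    stack: List[str] = []
    key: Optional[str] = RENDERERS[artifact_format]
    while key is not None:
        if key in stack:
            raise ValueError(f"dependency cycle at {key!r}")
        if key not in REGISTRY:
            raise LookupError(f"component {key!r} has not been preserved")
        stack.append(key)
        key = REGISTRY[key].requires
    return stack


def configure(artifact_format: str) -> Dict[str, object]:
    """Pick the emulator for the bottom of the stack and an install order for the rest."""
    stack = rendering_stack(artifact_format)
    hardware = stack[-1]
    return {
        "emulator": EMULATORS[hardware],
        # Layer closest to the hardware first, rendering application last.
        "install_order": list(reversed(stack[:-1])),
    }


print(configure("wordproc-document-v2"))
```

A real registry would also need to record alternative versions, optional components, and emulator settings, but even this simple chain of records is enough to let software, rather than a future user, decide what to load.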

References

  1. http://www.panix.com/~jeffr/Prof/digilong.html
  2. http://besser.tsoa.nyu.edu/howard/longevity
  3. http://www.leeds.ac.uk/cedars/DigPres.htm
  4. Michelson, A., and J. Rothenberg, "Scholarly Communication and Information Technology: Exploring the Impact of Changes in the Research Process on Archives", The American Archivist, Vol. 55, No. 2, 1992, pp. 236-315, ISSN 0360-9081
  5. Rothenberg, J., "Ensuring the Longevity of Digital Documents", Scientific American, Vol. 272, No. 1, January 1995, pp. 42-47
  6. Rothenberg, J., "Using Emulation to Preserve Digital Documents", Koninklijke Bibliotheek, July 2000, ISBN 906259145-0
  7. Lorie, R., "Long-Term Preservation of Digital Information", Proceedings of the First Joint Conference on Digital Libraries, ACM/IEEE, June 2001

More information on EEM 2006 can be requested from Jeffrey van der Hoeven.