The Koninklijke Bibliotheek/National Library of the Netherlands (KB) has developed a specific workflow for archiving electronic publications. Elements of this workflow are: accept and pre-process; generate and resolve identifiers; search and retrieve publications; and identify, authenticate and authorise users. The technical heart of the e-Depot system is IBM’s DIAS (Digital Information and Archiving System).

Workflow

Processing and storing the digital content is called loading. Publications reach the KB either on tape, DVD or CD-ROM (for processing back files) or by FTP (current material). In both cases, publications ready for ingest end up in an electronic post office, where they are checked and validated for compliance with previously agreed technical specifications. If errors are found, the content is passed to a database for error recovery (BER). Inspection of this database is currently the only manual effort involved in the process. If the content passes validation, it is combined with the metadata into so-called Publisher Submission Packages (PSPs); these PSPs are then processed by an application within DIAS called the Batch Builder. Other DIAS applications include the Content Manager and the Tivoli Storage Manager.

The Batch Builder ingests the material (both the content and the metadata), converts the bibliographical descriptions supplied by the publisher into the KB’s internal metadata format, and adds a unique identifier, the so-called National Bibliographical Number (NBN). After conversion, the content itself is stored in the e-Depot, while the metadata are stored in the KB’s bibliographical database.
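
The loading steps described above can be illustrated with a minimal Python sketch. It is not DIAS code: the class names, the required metadata fields, the validation rules and the `example-nbn-` identifier format are all assumptions made for illustration, and plain in-memory containers stand in for the BER, the e-Depot and the bibliographical database.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Submission:
    """One publication as delivered by a publisher (hypothetical structure)."""
    filename: str
    content: bytes
    metadata: dict


@dataclass
class Depot:
    """In-memory stand-ins for the stores named in the text."""
    error_db: list = field(default_factory=list)       # stands in for the BER
    content_store: dict = field(default_factory=dict)  # stands in for the e-Depot
    catalogue: dict = field(default_factory=dict)      # stands in for the bibliographical database
    _ids: count = field(default_factory=lambda: count(1))

    def validate(self, sub: Submission) -> list:
        """Return a list of problems; the actual agreed specifications are not public here."""
        errors = []
        if not sub.content:
            errors.append("empty content")
        for key in ("title", "publisher"):  # assumed required fields
            if not sub.metadata.get(key):
                errors.append(f"missing metadata field: {key}")
        return errors

    def load(self, sub: Submission):
        """Validate, then either record the errors or store content and metadata."""
        errors = self.validate(sub)
        if errors:
            # Failed submissions wait here for manual inspection.
            self.error_db.append((sub.filename, errors))
            return None
        # Assumed identifier format, not the real NBN syntax.
        identifier = f"example-nbn-{next(self._ids):06d}"
        self.content_store[identifier] = sub.content
        self.catalogue[identifier] = dict(sub.metadata)
        return identifier
```

Calling `Depot().load(...)` with a complete submission returns the new identifier, while a submission with empty content or missing fields returns `None` and leaves an entry in `error_db`. Content and metadata are kept in separate stores under the same identifier, mirroring the split between the e-Depot and the bibliographical database.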

Automatic loading

Large quantities of electronic publications are sometimes loaded fully automatically, with hundreds of thousands or even millions of articles loaded in batch processes. This type of loading is only possible when the publisher supplies extensive and well-specified metadata along with the publications. These metadata are subsequently converted into the KB’s preferred format (Dublin Core in XML, extended by a number of fields that facilitate hierarchical browsing). By using the publisher’s metadata, the KB bypasses an important labour-intensive task.
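
As a rough illustration of such a conversion, the following Python sketch maps a publisher record onto Dublin Core elements in XML. The input field names, the `FIELD_MAP` table and the `record` wrapper element are hypothetical; only the Dublin Core element names and namespace are standard. The KB's additional fields for hierarchical browsing are not described in this text and are therefore not modelled.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical publisher field names mapped to Dublin Core elements.
FIELD_MAP = {
    "article_title": "title",
    "author": "creator",
    "publisher_name": "publisher",
    "pub_date": "date",
}


def to_dublin_core(record: dict) -> str:
    """Serialise a publisher record as Dublin Core XML (illustrative only)."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for source_field, dc_name in FIELD_MAP.items():
        value = record.get(source_field)
        if value is None:
            continue
        # Repeatable fields (e.g. several authors) become repeated elements.
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            element.text = str(item)
    return ET.tostring(root, encoding="unicode")
```

A record with two authors yields two `dc:creator` elements, and fields absent from the publisher record are simply omitted. A table-driven mapping of this kind is one way the per-publisher differences could be kept out of the conversion code itself.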

The DIAS solution

The DIAS solution provides a flexible and scalable open deposit library solution for storing and retrieving massive amounts of electronic documents and multimedia files. It conforms to the ISO Reference Model for an Open Archival Information System (OAIS) and supports physical and logical digital preservation. The DIAS solution allows manual as well as automated ingest of digital information (assets) into the system. Once an asset has been stored successfully, it is maintained and preserved. Preservation functionality will be enhanced in future DIAS versions to generate signals when stored assets must be converted or migrated to ensure their availability.

Facts & Figures