Automatic metadata generation at the KB
What exactly is automatic metadata generation? And why does it matter? All the books, newspapers and magazines that come into the KB will appear in the KB catalogue. For each work, we write down the name of the author, date of publication, publisher, format, and so on. Automation is meant to optimise the KB's cataloguing processes and make them future-proof.
What is automatic metadata generation?
Automatic metadata generation uses Artificial Intelligence (AI) to assign metadata to new works in our collection automatically. Cataloguing is currently still done manually by a specialised team of cataloguers, and it is labour-intensive: even a simple title description can consist of 20 to 30 different fields. Complex titles require even more.
Over the years, the KB has worked on several projects to explore the possibilities of automatic metadata generation with AI. One of these is Demosaurus, which used the Finnish tool Annif to assign thesaurus links for authors and keywords. More recently, the KB ran a pilot to help with the cataloguing of donated physical books from the retro collection: the Retrotool.
The Retrotool
Based on photos of the title page and colophon, the Retrotool automatically generates many of the required fields. This saves cataloguers time and effort, as they only need to check that the fields are correct and make additions where necessary.
This tool:
- Works with a document camera, OCR (Optical Character Recognition) and an LLM (Large Language Model) to extract information, such as the title, author and publisher, from photographs of the title page and colophon.
- Looks directly into our catalogue to identify duplicates quickly and effectively.
- Can produce a basic title description with minimal effort from the cataloguer, which the cataloguer then checks and finalises.
- Does not act independently. The final responsibility always lies with the cataloguers. They ensure that the correct description goes into the catalogue.
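The flow described in this list can be illustrated with a minimal Python sketch. It is not the Retrotool's actual code: every name in it (DraftRecord, extract_fields, find_duplicates, draft_description) is hypothetical, the LLM step is replaced by a simple parser for "Label: value" lines, and the catalogue is an in-memory list with made-up records. It only shows the order of the steps: extract fields from OCR text, look for possible duplicates, and hand an unapproved draft to the cataloguer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DraftRecord:
    """A draft title description that still needs a cataloguer's check."""
    title: str
    author: str
    publisher: str
    possible_duplicates: list = field(default_factory=list)
    approved: bool = False  # only a cataloguer should ever set this to True


def extract_fields(ocr_text: str) -> dict:
    """Hypothetical stand-in for the LLM step: parse 'Label: value' lines."""
    fields = {}
    for line in ocr_text.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            fields[label.strip().lower()] = value.strip()
    return fields


def normalise(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def find_duplicates(fields: dict, catalogue: list) -> list:
    """Return ids of catalogue records with the same normalised title and author."""
    key = (normalise(fields.get("title", "")), normalise(fields.get("author", "")))
    return [
        record["id"]
        for record in catalogue
        if (normalise(record["title"]), normalise(record["author"])) == key
    ]


def draft_description(ocr_text: str, catalogue: list) -> DraftRecord:
    """Build an unapproved draft; the cataloguer checks and finalises it."""
    fields = extract_fields(ocr_text)
    return DraftRecord(
        title=fields.get("title", ""),
        author=fields.get("author", ""),
        publisher=fields.get("publisher", ""),
        possible_duplicates=find_duplicates(fields, catalogue),
    )


# Made-up sample data, for illustration only.
catalogue = [{"id": "rec-001", "title": "An Example Title", "author": "A. Author"}]
ocr_text = "Title: An  Example Title\nAuthor: A. AUTHOR\nPublisher: Example Press"
draft = draft_description(ocr_text, catalogue)
print(draft)
```

The draft is created with approved set to False on purpose: in this sketch, as in the description above, the tool proposes a description and the cataloguer remains responsible for what enters the catalogue.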
Who are we developing automatic metadata generation for?
Automatic metadata generation helps the KB cataloguers. Currently, they still assign metadata by hand. Automatic metadata generation will support them and make describing titles less labour-intensive. The work of cataloguers ultimately benefits our readers and researchers: good title descriptions ensure that they find the works they are looking for. Moreover, the time savings from this kind of tooling keep the KB's collection as up-to-date as possible.
Artificial intelligence and the KB
Artificial intelligence is developing rapidly and becoming increasingly important in the library world and in the humanities. Besides the many possibilities it offers, AI also brings challenges. Naturally, any AI solution we develop must handle our data carefully. The KB is closely monitoring developments and examining the responsible use of AI.
Automatic metadata generation and the KB mission
Through automatic metadata generation, we aim to make our collections easier to search and find for readers and researchers. This is how we contribute to a smarter, more creative and more skilled Netherlands.
Who does the KB collaborate with?
The Retrotool was developed on behalf of the KB by an external party specialising in AI. The developers worked closely with our cataloguers to tailor the tool to their needs and requirements. The descriptions it produces must also comply with the KB's quality standards.
What are the future plans?
The Retrotool pilot was successfully completed in the autumn of 2024. The tool will be actively used to process donations in the near future. We are also building on the experience gained with Demosaurus and the Retrotool to prepare for a structural, widely deployable solution in the course of 2025. For more information about the Retrotool and the follow-up process, please contact Marie Buesink, innovation coordinator, at @email