As well as sustainably preserving our collections, we want to give our customers optimal access to them. This involves improving the quality of digital content (so that a computer can interpret it), creating metadata, and enriching and distributing content.
How do we improve the quality of our digital content?
With the help of optical character and layout recognition (OCR and OLR), computers are capable of reading scans of newspaper articles. We are now focusing on deep-learning techniques to improve our OCR, enabling computers to be trained to recognise and correct their own mistakes.
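To make the idea of correcting OCR mistakes concrete, here is a minimal sketch. It does not use deep learning: it swaps in a much simpler technique, matching each token against a small word list with Python's standard `difflib` module. The lexicon, the function names and the similarity cutoff are illustrative assumptions, not part of our actual pipeline.

```python
import difflib

# Hypothetical mini-lexicon; a real system would use a large historical word list.
LEXICON = ["newspaper", "article", "library", "collection"]


def correct_token(token, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon word if it is similar enough, else the token unchanged."""
    lowered = token.lower()
    if lowered in lexicon:
        return token
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text):
    """Correct a whitespace-separated string token by token."""
    return " ".join(correct_token(token) for token in text.split())


print(correct_text("the newspapcr artiele"))  # prints "the newspaper article"
```

A lookup like this only repairs words that are already in the list and ignores context, which is exactly the limitation that trained models are meant to overcome.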
How do we create (semi-)automatic metadata for smarter and richer content descriptions?
Thanks to techniques from artificial intelligence, such as language technology and image recognition, our computers are increasingly capable of interpreting texts. They are now able to identify characters and genres in a publication and add this information to the publication's metadata. This data has great potential: in the future, it will help our customers find the content that interests them more effectively. Currently, computers still recognise and interpret texts based on instructions from humans. The next step, however, will involve computers doing this themselves and starting to give recommendations. In addition, we can use image recognition to identify people, objects and subjects in images, although this is more difficult with poor-quality scans because their resolution is low.
How can we provide smarter access to our content with the help of citizen scientists?
Major advances have been made by using members of the public (citizen scientists) to train computers. But given the sheer volume of our data, we need to use and develop even smarter methods: computers must be given a nudge to enable them to take on human tasks. Handwriting recognition, for example, is still done by people. We have large numbers of good-quality digital manuscripts and numerous experts, including specialists in the palaeography of medieval manuscripts. How can we deploy experts and the public to enable even smarter access to these kinds of sources?
How can we enrich our collections, connect them to each other and present them together to our customers?
New developments in the semantic web, linked data and knowledge bases such as Wikipedia offer the opportunity to identify and enrich entities (e.g. people, locations and buildings), enabling users to find all the information they are looking for as quickly as possible. This can be done by adding further layers of information to the user's query and thus channelling the user's question. By connecting the data from various collections, we aim to offer our users an answer that is as complete as possible.
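As a rough illustration of how shared entity identifiers can connect collections, the sketch below builds an index from entity URIs to the records that mention them. The records, the `ex:` identifiers, the field names and the function names are all hypothetical; a production setup would more likely use a triple store and established vocabularies.

```python
from collections import defaultdict

# Hypothetical records from two collections, each tagged with entity identifiers.
NEWSPAPERS = [
    {"id": "news-1", "title": "Opening of the new station", "entities": ["ex:Amsterdam_Centraal"]},
    {"id": "news-2", "title": "Harbour report", "entities": ["ex:Rotterdam"]},
]
BOOKS = [
    {"id": "book-1", "title": "Railway architecture", "entities": ["ex:Amsterdam_Centraal", "ex:Rotterdam"]},
]


def build_entity_index(*collections):
    """Map each entity identifier to the ids of all records that mention it."""
    index = defaultdict(list)
    for records in collections:
        for record in records:
            for uri in record["entities"]:
                index[uri].append(record["id"])
    return dict(index)


def records_for(index, uri):
    """Return the ids of records linked to an entity, across all collections."""
    return index.get(uri, [])


index = build_entity_index(NEWSPAPERS, BOOKS)
print(records_for(index, "ex:Amsterdam_Centraal"))  # prints ['news-1', 'book-1']
```

Because both collections refer to the same identifier, a single lookup returns material from each of them, which is the basic mechanism behind presenting connected collections together.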
Which platforms can be used to distribute digital content?
Which media and platforms will be used to communicate in twenty years' time? What developments can we see on the horizon, and how can we anticipate them in order to offer our content on the consumer platforms of the future?
Find out more
- Meijers, E., de Valk, S. & Helmus, W. (2018): A distributed network of digital heritage information
- Van Veen, T., Lonij, J. & Faber, W.J. (2016): Linking Named Entities in Dutch Historical Newspapers
- Van Hooland, S. & Verborgh, R. (2015): Linked Data for Libraries, Archives and Museums: How to clean, link and publish your metadata