Delpher newspapers
What will you find in our Delpher newspapers dataset? And how do you use the dataset? Delpher newspapers features almost 2 million digitised newspapers, dating from 1618 until 1995. Some of the files can be freely accessed and downloaded.
What is in Delpher newspapers?
Would you like to know which countries featured in the news in the 18th century, and which did not? Or which products were advertised in Dutch newspapers during World War II? Delpher newspapers is a great resource for research into historical newspapers. But this dataset is also useful for broader history projects. It gives a peek behind the scenes of what was going on in society.
The collection comprises almost 2 million newspapers dating from 1618 until 1995. They are editions from every year of publication of the main Dutch national newspapers, such as De Telegraaf, De Volkskrant and Het Parool, supplemented with a selection of regional and colonial newspapers. The dataset consists of scans of the printed pages, with OCR and word coordinates. There is a searchable PDF for every newspaper, and descriptive and structural metadata are available. Newspapers are added on a regular basis.
You can also search all of the newspapers in this collection via Delpher.
How is the information presented?
The following files are available for every newspaper edition:
- descriptive metadata (Dublin Core in XML)
- structural metadata (MPEG21-DIDL)
- document (PDF)
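As an illustration of working with the descriptive metadata, the following minimal Python sketch collects the Dublin Core elements from an XML record. The sample record and the function name dublin_core_fields are hypothetical; the only thing assumed about the real files is that they use the standard Dublin Core element namespace, and the actual wrapper structure may differ.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; real Delpher records may wrap these elements differently.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example Courant</dc:title>
  <dc:date>1850-01-02</dc:date>
  <dc:language>nl</dc:language>
</record>"""


def dublin_core_fields(xml_text):
    """Collect every Dublin Core element into a dict of lists, keyed by local name."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


print(dublin_core_fields(SAMPLE))
```

Values are kept in lists because Dublin Core allows any element to be repeated within one record.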
The following files are available for every page that has been scanned:
- the image (JPEG 2000)
- the text (OCR in XML)
- the coordinates of every word on a page (ALTO)
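To give an idea of how the word coordinates can be used, here is a minimal Python sketch that reads the words and their bounding boxes from an ALTO file. It relies on the String element and its CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes from the ALTO standard. The sample document and the function names are hypothetical, and elements are matched by local name because the ALTO namespace differs between schema versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment; real files contain many more blocks and attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Amsterdam" HPOS="100" VPOS="200" WIDTH="340" HEIGHT="48"/>
    <SP WIDTH="12"/>
    <String CONTENT="den" HPOS="452" VPOS="200" WIDTH="96" HEIGHT="48"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def words_with_boxes(alto_xml):
    """Return (word, (hpos, vpos, width, height)) for every String element, in document order."""
    words = []
    for element in ET.fromstring(alto_xml).iter():
        if local_name(element.tag) == "String":
            box = tuple(
                float(element.get(key, 0))
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            words.append((element.get("CONTENT", ""), box))
    return words


for word, box in words_with_boxes(SAMPLE_ALTO):
    print(word, box)
```

The unit of the coordinates depends on the measurement unit declared in each ALTO file, so the sketch returns the raw numbers without converting them.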
Conditions for re-use
The data in Delpher newspapers is partially accessible to all. The KB wants to make as much information as possible freely available to all, but this is not possible for newspapers that are still protected by copyright.
Depending on the copyright, the use of this dataset can be divided into two regimes. Newspapers that were first published more than 140 years ago belong to the public domain. They are no longer subject to copyright. Some of the more recent newspapers are still protected by copyright, but are available on request for academics, researchers, lecturers or journalists if they want to use them for research purposes.
We can provide data in several ways:
- There are two APIs: a metadata harvest API based on OAI-PMH, and a search API based on SRU. Manuals for these APIs can be supplied once legal access has been granted via @email. Please note: users need some programming experience.
- The Delpher newspapers archive comprises the texts (OCR, ALTO, XML) of all newspapers dating from 1618 to 1879. It is 111 GB in size and split into 23 ZIP files.
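For readers unfamiliar with OAI-PMH and SRU, the minimal Python sketch below shows how request URLs for the two protocols are typically built. The parameter names come from the OAI-PMH and SRU standards. The base URLs, the function names and the SRU version are assumptions for illustration only; the real endpoints, set names and supported options are described in the API manuals.

```python
from urllib.parse import urlencode

# Hypothetical endpoints: the real base URLs are given in the API manuals.
OAI_BASE = "https://example.org/oai"
SRU_BASE = "https://example.org/sru"


def oai_list_records_url(metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument: when continuing a
    harvest, it is sent alone with the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return OAI_BASE + "?" + urlencode(params)


def sru_search_url(cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve request URL (version 1.2 is an assumption)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return SRU_BASE + "?" + urlencode(params)


print(oai_list_records_url())
print(oai_list_records_url(resumption_token="abc"))
print(sru_search_url("krant AND 1850"))
```

A harvest would normally call the first URL, read the resumptionToken from the response, and repeat the request with that token until the server no longer returns one.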
We can sometimes provide customised support. Ask your question via @email.
Contact and feedback
We are interested to know who uses the newspapers, and how they are used. So please send an e-mail with your contact details and a brief explanation of what you intend to do with the data to @email. Feedback is always welcome. If you provide us with your personal details, we will keep you informed about any relevant developments, such as changes to the dataset or the release of new datasets.