Delpher newspapers

What will you find in our Delpher newspapers dataset, and how can you use it? Delpher newspapers contains almost 2 million digitised newspapers, dating from 1618 to 1995. Some of the files can be freely accessed and downloaded.

What is in Delpher newspapers?

Are you interested in knowing which countries featured in the news in the 18th century, and which did not? Or which products were advertised in Dutch newspapers during World War II? Delpher newspapers is a great resource for research into historical newspapers, but the dataset is also useful for broader historical projects, because it offers a glimpse of what was going on in society at the time.

The collection comprises almost 2 million newspapers dating from 1618 to 1995. It includes editions from every year of publication of the main Dutch national newspapers, such as De Telegraaf, De Volkskrant and Het Parool, supplemented with a selection of regional and colonial newspapers. The dataset consists of scans of the printed pages, with OCR text and word coordinates. A searchable PDF is available for every newspaper, together with descriptive and structural metadata. New newspapers are added regularly.

You can also search all of the newspapers in this collection via Delpher.

How is the information presented?

The following files are available for every newspaper edition:

  • a searchable PDF
  • descriptive and structural metadata

The following files are available for every page that has been scanned:

  • the image (JPEG 2000)
  • the text (OCR in XML)
  • the coordinates of every word on a page (ALTO)
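The ALTO files pair each recognised word with its position on the page image. As a minimal sketch, assuming the standard ALTO layout in which each word is a `String` element carrying `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes, the Python below extracts every word with its bounding box. The XML namespace differs between ALTO versions, so the code matches on the local tag name only. The `Word` class, the function names and the sample fragment are made up for illustration and are not taken from the Delpher files.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word and its bounding box on the page (hypothetical helper type)."""
    text: str
    hpos: float    # horizontal position of the box's left edge
    vpos: float    # vertical position of the box's top edge
    width: float
    height: float


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def words_from_alto(xml_data: str | bytes) -> list[Word]:
    """Return every ALTO <String> element as a Word, in document order."""
    root = ET.fromstring(xml_data)
    words: list[Word] = []
    for el in root.iter():
        # Match on the local name so ALTO files of any version (any namespace) work.
        if local_name(el.tag) != "String":
            continue
        words.append(
            Word(
                text=el.get("CONTENT", ""),
                hpos=float(el.get("HPOS", "0")),
                vpos=float(el.get("VPOS", "0")),
                width=float(el.get("WIDTH", "0")),
                height=float(el.get("HEIGHT", "0")),
            )
        )
    return words


# Made-up fragment for illustration only; real ALTO pages are much larger.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Amsterdam" HPOS="120" VPOS="340" WIDTH="210" HEIGHT="38"/>
    <SP/>
    <String CONTENT="1879" HPOS="345" VPOS="341" WIDTH="96" HEIGHT="37"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for w in words_from_alto(SAMPLE_ALTO):
        print(w.text, w.hpos, w.vpos, w.width, w.height)
```

The coordinates are expressed in whatever unit the file's `MeasurementUnit` declares (for example pixels), so check that value before relating boxes to the JPEG 2000 image.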

Conditions for re-use

The KB strives to make as much information as possible freely available to all. This is not always possible, because newspapers may still be under copyright or may contain sensitive personal information.

Depending on their copyright status, the newspapers in this collection fall under one of two usage regimes. Newspapers first published more than 140 years ago are in the public domain and are no longer subject to copyright. More recent newspapers are often still protected by copyright and privacy law; in some cases, they may be used for non-commercial scientific research.

We can provide data in several ways:

  1. There are two APIs: a metadata harvesting API based on OAI-PMH, and a search API based on SRU. Manuals for these APIs can be supplied once legal access has been granted; access can be requested via @email. Please note: users need some programming experience.
  2. A bulk download contains the texts (OCR, ALTO, XML) of all Delpher newspapers dating from 1618 to 1879. The archive is 111 GB, split into 23 ZIP files. These newspapers are free of copyright and may be used without restrictions.
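An OAI-PMH API is usually harvested page by page: a `ListRecords` request returns a batch of records plus a `resumptionToken`, which is sent back to get the next batch until the token comes back empty. The Python sketch below follows that loop using only the standard library. It assumes the KB endpoint follows the OAI-PMH 2.0 specification and offers the `oai_dc` metadata format (which the protocol requires every repository to support); the base URLs are placeholders, since the real endpoints are described in the manuals supplied after access is granted. The `sru_search_url` helper is likewise hypothetical: it only builds a standard SRU 1.2 `searchRetrieve` URL, and which CQL indexes the search API accepts is an assumption to check against its manual.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_url(url: str) -> bytes:
    """Plain HTTP GET; swap in your own client (retries, auth) as needed."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def harvest(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: str | None = None,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Iterator[tuple[str, ET.Element]]:
    """Yield (identifier, record) for every non-deleted record, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        error = root.find("oai:error", OAI)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # OAI-PMH reports an empty result as an error
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for record in root.iterfind("oai:ListRecords/oai:record", OAI):
            header = record.find("oai:header", OAI)
            if header is None or header.get("status") == "deleted":
                continue  # deleted records carry a header but no metadata
            yield header.findtext("oai:identifier", default="", namespaces=OAI), record
        token = root.findtext(
            "oai:ListRecords/oai:resumptionToken", default="", namespaces=OAI
        ).strip()
        if not token:
            return  # a missing or empty token marks the last page
        # resumptionToken is an exclusive argument: follow-up requests send only verb + token.
        params = {"verb": "ListRecords", "resumptionToken": token}


def sru_search_url(base_url: str, cql_query: str, start: int = 1, max_records: int = 50) -> str:
    """Build an SRU 1.2 searchRetrieve URL (hypothetical helper; index names are service-specific)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urllib.parse.urlencode(params)


# Example use, with a HYPOTHETICAL endpoint (replace with the URL from the manual):
#   for identifier, record in harvest("https://example.org/oai"):
#       print(identifier)
```

Passing `fetch` as a parameter keeps network access separate from the paging logic, so retries, throttling or authentication can be added, or the loop tested against saved responses, without changing `harvest` itself.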

We can sometimes provide customised support. Ask your question via @email.

Contact and feedback

We would like to know who uses the newspapers and how. Please send an e-mail to @email with your contact details and a brief explanation of what you intend to do with the data. Feedback is always welcome. If you provide your personal details, we will keep you informed about relevant developments, such as changes to the dataset or the release of new datasets.