KB Web Archive: FAQ

General questions about the KB Web Archive

What is the KB Web Archive?

The KB Web Archive (i.e. web collection) is one of the KB's digital collections. It is not the web archive of the Netherlands, but contains a selection of Dutch websites that are periodically archived.

Why does the KB maintain a web archive?

Websites are digital publications and therefore fall within our collecting area. Almost all (large) national libraries archive websites from their national domain, either in the public interest or because they are legally obliged to. The KB archives websites in the public interest. Because websites change continuously, failing to archive them regularly risks losing valuable digital information that is not preserved anywhere else. The web has become an indispensable part of our society. In the first six months (September 2007 to March 2008), we selected and archived about 1,000 websites, 25% of which were no longer online in 2016.

Does the KB Web Archive comply with the Archives Act [Archiefwet]?

The KB Web Archive is a library collection that is not required to comply with the Archives Act.

When was the Web Archive started?

The first websites were officially added to the Web Archive in September 2007.

How big is the Web Archive?

As of 31 December 2023, the selection includes about 23,600 websites with a total size of about 102 terabytes.

Are there any other web archives in the Netherlands?

There are a number of web archives with local, geographically defined collections, such as the Groninger archives. Examples of thematic web archives include Archipol (websites of political parties) and Beeld & Geluid (websites of public broadcasters). The web archives of a number of institutions are included in the Nationaal Register Webarchieven [National Register of Web Archives].

Where can I find more information about the KB Web Archive?

All information about our web archive is compiled on the Web Archiving page and the pages under it.

Selection for the Web Archive

How is a website selected?

The selection criteria follow the KB's general collection policy: everything from and about the Netherlands. For the Web Archive, however, it is not yet technically, legally or financially possible to archive everything. The KB's collection specialists select relevant websites in their fields of expertise: Dutch language, culture and history. To some extent, we take other Dutch web archives into account, for example by not including websites of political parties or public broadcasters. The collection specialists aim to maintain a good balance in the selection.

Are only Dutch websites included in the collection?

The web archive contains mainly websites with the .nl domain extension. These are Dutch websites, regardless of their language.

Why doesn't the KB archive the entire Dutch web domain?

The Dutch .nl domain had over 6.3 million registered domain names in May 2023, although not every address has a website behind it. It is not yet technically, legally or financially possible for us to archive everything. Web archiving mainly involves legal considerations: for example, we inform each website owner that we are archiving their website. In addition, we are not legally obliged to maintain a Dutch web archive.

Does the selection only include .nl websites?

No. The Web Archive also includes Dutch websites with regional domain extensions (such as .eu and .frl) and generic ones (such as .com).

Which websites are included in the selection?

We publish an overview of the Web Archive several times a year.

Which websites will not be included?

If a website is difficult or impossible to archive because of the technology it uses (e.g. Flash or complex JavaScript), there is no point in including it. Websites containing illegal content are not included. To some extent, we also take other Dutch web archives into account, for example by not including websites of political parties or public broadcasters.

Does the collection have any particular focal points?

Websites with historical topics, museums and government websites are currently strongly represented in the collection. The collection specialists also compile special web collections, for example regarding the 2013 royal succession. In 2020, we started selecting Dutch websites in response to the Covid-19 pandemic as part of an international web collection. In consultation with Tresoar, the KB archives several hundred Frisian websites.

How often do you archive websites?

Archiving usually takes place on an annual basis. During selection, the collection specialist determines whether a greater frequency is desired.

Which website is archived most often?

We archive the home page and the first underlying pages of nu.nl on a daily basis.

Which website was archived first?

Tilburg University's 'Thomas Instituut te Utrecht' website was officially the first to be archived on 20 September 2007. This website was still online in the same format in 2022.

Which website is the largest?

At over 190 gigabytes, the 'Internetvoorkeuren van de Opleiding Nederlandse taal en cultuur, en van de Opleiding Nederlandkunde, Universiteit Leiden' website is currently the largest in the Web Archive.

Can I submit a website for inclusion?

Please send your request to @email and one of our collection specialists will assess the website. We will notify the website owner if we decide to include it.

How many websites in the Web Archive no longer exist online?

It is not yet possible to answer this for the entire Web Archive. In the first six months (September 2007 to March 2008), we selected and archived about 1,000 websites, 25% of which were no longer online in 2016.

Access to the Web Archive

Where can I find the Web Archive?

Pass holders can access the Web Archive via the public terminals in the KB reading rooms. Availability via the KB website is not possible for the time being due to legal restrictions.

How long does it take for a selected website to first appear in the Web Archive?

After selection, there is a four-week waiting period before the website is archived. The website will be accessible within a year via a dedicated interface on the public terminals in the KB reading rooms.

Why are parts of archived websites missing?

We aim to archive complete websites but, unfortunately, not all the technology used can be archived. Functionalities that require contact with the original server, such as forms and filters, are missing. Components behind a login procedure are also left out.

For special web collections, the collection specialists can select sections or even individual pages of a website.

Can you remove data from or about me from the Web Archive?

Our starting point is that data published on public websites is public. Deleting data goes against the principle of a web archive. If you wish to object, please send a reasoned deletion request to @email. See also the information on the General Data Protection Regulation (GDPR).

For website owners

May I mention that my website has been included in the KB Web Archive?

Yes, you may. For example, you can state: "This website is archived by the KB, the National Library of the Netherlands." Optionally, you can add the KB word mark.

Do you request permission for inclusion in the Web Archive?

Before collecting a website, we inform its owner of our intention. We do this with a so-called opt-out notice, which states that we intend to include a specific website in the Web Archive and explains the importance of permanently preserving websites. The addressee then has four weeks to refuse consent. The notice also invites you to contact us if you would like more information. See also Legal aspects in web archiving.

What is the most common reason for refusal?

The website owner does not see the point of archiving for themselves or their organisation. In such cases, we explain that we archive not only for the owner but also for researchers, who use the Web Archive as a source of information on websites that are no longer available online.

Why do you send a standard message?

Unfortunately, given the size of the selection, it is not possible to send a personal letter to all website owners.

Do I have to pay for inclusion in the Web Archive?

All costs are borne by the KB.

Can I make agreements regarding my website?

We are always willing to work with website owners to see how the website can best be placed in the Web Archive. However, the KB's Web Archive is not a service for the benefit of individual website owners.

Can I have archived versions of my own website?

No rights can be derived from inclusion in the KB Web Archive, which is not a service for the benefit of individual website owners. We can never guarantee that an archiving run will be 100% successful. Website owners who want their website to be archived according to their own wishes will have to arrange this themselves.

Will my visitors notice anything during archiving?

The technical process of archiving has minimal impact on visitors to your website. The archiving software visits the website (or rather its server), usually once a year, and works through all the pages. It is configured so that, after receiving a page or file, it waits five times as long as the web server took to send that item before making its next request. So if the server delivers a page in 0.5 seconds, the archiving software waits 2.5 seconds before requesting the next one. This limits the load on the web server. We have been archiving websites for more than a decade without receiving any complaints about overloading, even though the KB's visits can be traced in server log files.
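The delay rule described above can be sketched as a small calculation. This is a minimal illustration assuming a fixed factor of 5, as the text states, and a single connection to the server at a time; the function names `politeness_delay` and `busy_fraction` are hypothetical, and the crawler configuration the KB actually uses may add minimum or maximum bounds that are not described here.

```python
def politeness_delay(fetch_seconds: float, factor: float = 5.0) -> float:
    """Return how long to wait before the next request to the same server.

    Hypothetical helper: the wait is `factor` times the time the server
    took to deliver the previous page or file.
    """
    if fetch_seconds < 0:
        raise ValueError("fetch time cannot be negative")
    return factor * fetch_seconds


def busy_fraction(factor: float = 5.0) -> float:
    """Share of time the server spends serving the crawler.

    Assumes one connection: each fetch of duration t is followed by a wait
    of factor * t, so the server is busy t out of every (1 + factor) * t seconds.
    """
    return 1.0 / (1.0 + factor)


if __name__ == "__main__":
    # The example from the text: a page served in 0.5 s means a 2.5 s wait.
    print(politeness_delay(0.5))
    print(busy_fraction())
```

Under these assumptions, a factor of 5 means the crawler occupies the server for at most about one sixth of the time, however fast or slow the server responds.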

What archiving software does the KB use?

Archiving is done with the Heritrix software, which stores a website's files exactly as they were sent to the site visitor. The files are combined into WARC files for permanent storage. To display the archived sites, we use the Wayback Machine application, which shows archived websites as they looked at the time of archiving. Both applications were developed by the Internet Archive.
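To make the storage step more concrete, the sketch below builds a single WARC "response" record by hand: a short header block followed by the HTTP response exactly as the server sent it. It only illustrates the general WARC/1.0 record layout; Heritrix writes these records itself and adds more fields, and the URL, date and function name `warc_response_record` used here are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_bytes: bytes, when: datetime) -> bytes:
    """Wrap raw HTTP response bytes in a minimal WARC/1.0 'response' record.

    Hypothetical helper for illustration; real crawlers also record fields
    such as payload digests and the server's IP address.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_bytes)}",
    ]
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # The record block is the server's response, byte for byte,
    # followed by two CRLFs that terminate the record.
    return head + http_bytes + b"\r\n\r\n"


if __name__ == "__main__":
    raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hello</html>"
    record = warc_response_record(
        "https://www.example.org/",
        raw,
        datetime(2007, 9, 20, 12, 0, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

Because the HTTP response is stored unchanged inside the record, a replay tool can later serve it back as it looked at the time of archiving.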

Do you respect the robots.txt?

In a robots.txt file, website owners can restrict which crawlers, such as those of search engines like Google, may visit which parts of their site. Experience shows that many organisations have a robots.txt file on their site without having made a deliberate choice about its contents. The file is often installed automatically by the content management system and hinders proper archiving of the site by blocking the design files (CSS) and images. That is why we ignore robots.txt by default. However, settings can be adjusted for individual websites: at the website owner's request, we can configure the archiving of their site to respect the robots.txt file. We will then run a test to determine the impact of doing so.
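The problem with automatically installed robots.txt files can be illustrated with Python's standard `urllib.robotparser` module. The robots.txt content, URLs, user-agent name and the `allowed` helper below are hypothetical; the sketch only shows how a rule aimed at all crawlers also blocks the stylesheets and images an archive needs, and how a per-site switch to respect or ignore robots.txt changes the outcome.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a CMS might install by default.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/css/
Disallow: /assets/images/
"""


def allowed(url: str, user_agent: str = "example-archiver", respect_robots: bool = True) -> bool:
    """Decide whether a (hypothetical) archiving crawler may fetch `url`."""
    if not respect_robots:
        # Default behaviour described in the FAQ: robots.txt is ignored.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.org/index.html",
                "https://www.example.org/assets/css/site.css"):
        print(url, allowed(url), allowed(url, respect_robots=False))
```

With robots.txt respected, the page itself would be archived but its stylesheet would not, so the archived copy would lose its design.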

Do you archive past a login procedure?

If a website contains personal data, that data is likely to be accessible only to authorised visitors through a login procedure. Our harvester can only download public pages, i.e. pages it can both find and download without logging in. Privacy-sensitive information behind a login is therefore not archived.

Are shopping carts and fill-in forms still usable within the Web Archive?

An archive version is a self-contained unit and no longer has contact with the original server, which means there is a lack of interactivity and it is not possible to use shopping carts and fill-in forms.

Can I see the visit by the archiving software in my log files?

Yes. The crawler's user-agent string contains our domain name, 'www.kb.nl'.
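A website owner could look for these visits with a short script such as the one below. It assumes access logs in the common "combined" format used by web servers such as Apache and nginx, in which the user-agent is the last quoted field. The log lines, IP addresses and full user-agent strings shown are made up for illustration, and the helper name `kb_crawler_hits` is hypothetical; only the substring 'www.kb.nl' comes from this FAQ.

```python
import re

# Combined Log Format: host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def kb_crawler_hits(lines):
    """Yield (time, request) for log lines whose user-agent mentions www.kb.nl."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "www.kb.nl" in match.group("agent"):
            yield match.group("time"), match.group("request")


if __name__ == "__main__":
    sample = [
        # Hypothetical entries; the user-agent strings are invented.
        '192.0.2.10 - - [20/Sep/2007:12:00:00 +0200] "GET / HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; example-crawler; +http://www.kb.nl/)"',
        '198.51.100.7 - - [20/Sep/2007:12:00:03 +0200] "GET /about HTTP/1.1" 200 2048 "-" '
        '"Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for time, request in kb_crawler_hits(sample):
        print(time, request)
```

Lines that do not match the assumed log format are skipped rather than reported, so the pattern may need adjusting for servers with a custom log configuration.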