General questions about the KB web collection

What is the KB web collection?

The KB web collection (formerly: web archive) is one of the KB's digital collections. It is not the web archive of the Netherlands, but contains a selection of Dutch websites that are periodically archived.

Why does the KB maintain a web collection?

Websites are digital publications and therefore fall within our collecting remit. Almost all (large) national libraries archive websites from their national domain, either in the public interest or out of a legal obligation. The KB archives websites in the public interest. The web has become an indispensable, inseparable part of our society, and because websites change continuously, valuable digital information that is preserved nowhere else risks being lost without regular archiving. For example, of the roughly 1,000 websites we selected and archived in the first six months (September 2007 to March 2008), 25% were no longer online in 2016.

Does the KB web collection comply with the Archives Act [Archiefwet]?

The KB web collection is a library collection that is not required to comply with the Archives Act.

When was the web collection started?

The first websites were officially added to the web collection (then called the web archive) in September 2007.

How big is the web collection?

The selection includes about 25,130 websites with a total size of about 130 terabytes (as of 31 March 2026).

Are there any other web archives and collections in the Netherlands?

There are a number of web archives with local, geographically defined collections, such as the Groninger archives. Examples of thematic web archives include Archipol (websites of political parties) and Beeld & Geluid (websites of public broadcasters). The web archives and collections of a number of institutions are included in the Nationaal Register Webarchieven [National Register of Web Archives].

Where can I find more information about the KB web collection?

All information about our web collection is compiled on the web collection page and the pages under it.

Selection for the web collection

How is a website selected?

The selection criteria are articulated in the KB's general collection policy: everything from and about the Netherlands. As regards the web collection, it is not yet technically, legally or financially possible to archive everything. The KB's collections specialists select relevant websites in their field of expertise: Dutch language, culture and history. To some extent, we take into account other Dutch web archives by, for example, not including websites of public broadcasters. The collection specialists try to maintain a good balance in the selection.

Are only Dutch websites included in the collection?

The web collection mainly contains websites with the .nl domain extension. These are Dutch websites, regardless of their language. The web collection also contains Dutch websites with regional (.eu, .frl) and general (.com) domain extensions.

Why doesn't the KB archive the entire Dutch web domain?

The Dutch .nl web domain totalled over 6 million registered domains in March 2026, but not every address has a website behind it. It is not yet technically, legally or financially possible for us to archive everything. The main constraints are legal: we let each website owner know that we are archiving their website. In addition, we are not legally obliged to maintain a Dutch web archive.

Which websites are included in the selection?

We publish an overview of the web collection several times a year.

Which websites will not be included?

If a website is difficult or impossible to archive because of the technology used (e.g. Flash or complex JavaScript), inclusion does not make sense. Websites containing illegal content will not be included. To some extent, we take into account other Dutch web archives and collections by, for example, not including websites of public broadcasters.

Does the collection have any particular focal points?

Websites with historical topics, museums and government websites are currently strongly represented in the collection. The collection specialists also compile special web collections, for example regarding the 2014 to 2018 commemoration of World War I. In 2020, we started selecting Dutch websites in response to the Covid-19 pandemic as part of an international web collection. In consultation with Tresoar, the KB archives several hundred Frisian websites.

How often do you archive websites?

Archiving usually takes place on an annual basis. During selection, the collection specialist determines whether a greater frequency is desired.

Which website is archived most often?

We archive the home page and the first underlying pages of nu.nl on a daily basis.

Which website was archived first?

Tilburg University's 'Thomas Instituut te Utrecht' website was officially the first to be archived on 20 September 2007. This website was still online in the same format in 2026.

Which website is the largest?

Can I submit a website for inclusion?

Please send your request to @email and one of our collection specialists will assess the website. We will notify the website owner if we decide to include it.

How many websites in the web collection no longer exist online?

It is not yet possible to answer this precisely for the entire web collection. In the first six months (September 2007 to March 2008), we selected and archived about 1,000 websites, of which 25% were no longer online in 2016. As of March 2026, at least 5,700 of the roughly 25,000 websites are no longer online.

Access to the web collection

Where can I find the web collection?

Pass holders can access the web collection via the public terminals in the KB reading rooms. Availability via the KB website is not possible for the time being due to legal restrictions.

How long does it take for a selected website to first appear in the web collection?

After selection, there is a four-week waiting period before the website is archived. The website will be accessible to KB members within six months via a dedicated interface on the public terminals in the KB reading rooms.

Why are parts of archived websites missing?

We aim to archive complete websites but, unfortunately, not all the technology used can be archived. Functionalities that require contact with the original server, such as forms and filters, are missing. Components behind a login procedure are also left out.

For special web collections, the collection specialists can select sections or even individual pages of a website.

Can you remove data from or about me from the web collection?

We work on the principle that data on public websites is public. Deleting data goes against the principle of a web collection. If you would like to object, please send a reasoned deletion request to @email. See also the information regarding the General Data Protection Regulation (GDPR).

For website owners

May I mention that my website has been included in the KB web collection?

Yes, you may. For example, you can state: "This website is archived by the KB, the National Library of the Netherlands." Optionally, you can add the KB word mark.

Do you request permission for inclusion in the web collection?

Before collecting a website, we inform the website owner of our intention. We do this with a so-called opt-out notice, which states that we intend to include the website in the web collection and explains the importance of permanently preserving websites. The addressee then has four weeks to refuse consent. The opt-out notice also invites the addressee to contact us for more information. See also Legal aspects in web archiving.

What is the most common reason for refusal?

The website owner does not see the point of archiving for them or their organisation. We then let it be known that we archive not only for the owner, but also for the researchers who use the web collection as a source of information for websites that are no longer available online.

Why do you send a standard message?

Unfortunately, given the size of the selection, it is not possible to send a personal letter to all website owners.

Do I have to pay for inclusion in the web collection?

All costs are borne by the KB.

Can I make agreements regarding my website?

We are always willing to work with website owners to see how the website can best be included in the web collection. However, the KB's web collection is not a service for the benefit of individual website owners.

Can I have archived versions of my own website?

No rights can be derived from inclusion in the KB web collection. It is not a service for the benefit of individual website owners. We can never guarantee an archiving will be 100% successful. If a website owner wants to guarantee that their website will be archived according to their own wishes, they will have to take care of this themselves.

Will my visitors notice anything during archiving?

The impact of the technical process of including websites in the web collection is minimal for visitors to your website. The archiving software visits the website (that is, its server) once a year and runs through all the pages. The software is set up so that, before each new request (a request to send a page or file), it waits 5 times as long as the web server took to send the previous page to us. So if the website sends a page in 0.5 seconds, the archiving software will wait 2.5 seconds before requesting a new page. This limits the load on the web server. We have now been archiving websites for nearly twenty years without receiving any complaints about overloading, even though the KB's visit can be traced through the log files.
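The waiting rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the configuration of the archiving software itself; the function name `politeness_delay` and the constant `DELAY_FACTOR` are hypothetical names chosen for this example, and only the factor of 5 is taken from the text.

```python
# Illustrative sketch only: the names below are hypothetical.
# The factor of 5 is the value mentioned in the answer above.
DELAY_FACTOR = 5


def politeness_delay(response_seconds: float, factor: float = DELAY_FACTOR) -> float:
    """Return how long to wait before the next request.

    response_seconds is the time the web server needed to send the
    previous page; the wait is that time multiplied by the factor.
    """
    if response_seconds < 0:
        raise ValueError("response time cannot be negative")
    return factor * response_seconds


# A page delivered in 0.5 seconds leads to a 2.5 second pause.
print(politeness_delay(0.5))
```

Because the pause scales with the server's own response time, a slow or busy server is automatically visited more gently than a fast one.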

What archiving software does the KB use?

Archiving is done using the Heritrix software. This software stores a website's files exactly as they were sent to the site visitor. They are merged into a WARC file (.warc) for permanent storage. To display the archived sites, we use the "Wayback Machine" application, which can display the archived websites as they looked at the time of archiving. This software was developed by the Internet Archive.

Do you respect the robots.txt?

In a robots.txt file, the website owner can define restrictions, for example for visits by indexing services such as Google. Experience shows that many organisations have a robots.txt on their site without having made a deliberate choice about its contents. The robots.txt is often installed automatically by the content management system and hinders proper archiving of the site by blocking the archiving of design (CSS) and images. That is why we choose to ignore robots.txt by default. However, the archiving of individual websites can be adjusted. For example, at the request of the website owner, we can set the archiving of a website to respect the robots.txt file. We will then run a test to determine the impact of this.
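To illustrate how a default robots.txt can block design files and images, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the domain `www.example.nl` and the crawler name `example-crawler` are invented for this example and do not describe any real website or the KB's own harvester.

```python
from urllib.robotparser import RobotFileParser

# Invented example of a robots.txt as a CMS might install it by default.
ROBOTS_TXT = """\
User-agent: *
Disallow: /css/
Disallow: /images/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether a crawler that respects robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The page itself is allowed, but its stylesheet and images are not.
for path in ("/index.html", "/css/site.css", "/images/logo.png"):
    url = "https://www.example.nl" + path
    print(path, allowed(ROBOTS_TXT, "example-crawler", url))
```

A harvester that respected this file would store the text of the pages but not their layout or pictures, so the archived site could not be displayed as it originally looked.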

Do you archive past a login procedure?

If a website contains personal data, it will probably only be accessible to authorised visitors through a login procedure. Our harvester can only download public pages, i.e. pages that it can both find and download. Privacy-sensitive information behind a login will therefore not be archived.

Are shopping carts and fill-in forms still usable within the web collection?

An archived version is a self-contained unit that no longer has contact with the original server. As a result, interactive features are lost and shopping carts and fill-in forms cannot be used.

Can I see the visit by the archiving software in my log files?

The crawler's user-agent contains our domain name 'www.kb.nl'.
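As a rough sketch of how a website owner might look for these visits, the Python below filters log lines whose user-agent field contains 'www.kb.nl'. It assumes the common "combined" access log layout, in which the user-agent is the last quoted field on each line; your server's format may differ. The sample log lines, including the user-agent string `ExampleHarvester/1.0`, are invented for illustration and are not the crawler's actual user-agent.

```python
import re

# Assumption: the user-agent is the last quoted field on each log line,
# as in the widely used "combined" access log format.
USER_AGENT_AT_END = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the last quoted field of a log line, or None if absent."""
    match = USER_AGENT_AT_END.search(line)
    return match.group(1) if match else None


def kb_visits(lines, marker="www.kb.nl"):
    """Return the log lines whose user-agent contains the marker."""
    return [line for line in lines if marker in (user_agent(line) or "").lower()]


# Invented sample lines; the first user-agent string is hypothetical.
SAMPLE = [
    '192.0.2.10 - - [20/Sep/2007:10:00:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "ExampleHarvester/1.0 (+http://www.kb.nl/)"',
    '192.0.2.20 - - [20/Sep/2007:10:00:05 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for line in kb_visits(SAMPLE):
    print(line)
```

To check a real log, the list `SAMPLE` could be replaced by the lines of your own access log file.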