There are two basic strategies for web archiving. The first focuses on the automatic harvesting of websites in large quantities (usually a national domain). The second makes selections based on a specific selection policy. Automatic harvesting is relatively cheap compared with the selective approach, which is necessarily more labour-intensive. On the other hand, when only a limited number of sites is harvested, more attention can be paid to technical details and websites can be archived down to the deepest level.
The KB has decided on a selective approach. This means that a selection of Dutch websites will be archived. There are a number of reasons for this choice. The Netherlands has no legal deposit legislation that compels publishers to provide the KB with copies of their products. Because of this, websites can only be crawled after the owner has granted permission, making it impossible to crawl the complete Dutch domain. In addition, a selective approach is easier to manage under this permission-based legal arrangement. The Dutch domain is also extremely large and would be expensive to archive in its entirety. Finally, bulk archiving is not suitable for archiving websites completely. Bulk archiving amounts to taking a snapshot, with strict limits placed on the number of files crawled and the amount of data collected. Since the basic motive for web archiving is permanent storage, it does not seem wise to preserve only a limited portion of each website. After all, we don’t store only the title pages of books.
For the time being, the KB will base its selection of websites to be archived on its own collection policy. Within this framework a well-reasoned selection will be made, consisting of a cross section of the Dutch web domain. It should be noted that the Dutch web domain is a broad concept that is by no means limited to the .nl domain; it comprises all websites registered in the Netherlands. The primary selection will be taken from websites with academic and cultural content, although innovative websites that exemplify current trends in the Dutch portion of the web will also be considered. The next step will be to seek collaboration with other knowledge institutes in order to broaden the selection, making use of the substantive expertise of these organisations. Another idea is to give website owners themselves the opportunity to volunteer their sites for archiving.