The Spanish Web Archive is the collection of websites (including blogs, forums, documents, images, videos, etc.) that are collected in order to preserve the Spanish documentary heritage on the Internet and ensure access to it.

Due to the sheer size of the Internet and the technological means currently available, exhaustive web archiving is not achievable at present. Therefore, in order to save as much information as possible, the National Library of Spain has opted for a mixed model that combines massive and selective harvesting, as other national libraries around the world do.

These collections can be consulted both at the National Library of Spain and at the conservation centres of the different Autonomous Communities.

History of the collection

Inspired by UNESCO’s Guidelines for the Preservation of Digital Heritage (2003) and the Commission Recommendation of 24 August 2006 on the digitisation and online accessibility of cultural material and digital preservation, the National Library of Spain began to capture Spanish websites hosted in the .es domain, as well as in other generic domains and subdomains (.com, .edu, .gob, .org, .net, etc.).

From the start of the project in 2009 until the end of 2013, eight broad crawls of the .es domain and two selective harvests were carried out. The first selective harvest focused on monographic coverage of the General Elections of 20 November 2011, and the second compiled Spanish resources in the field of the Humanities. The results of these crawls, carried out by the Internet Archive for the National Library of Spain, were transferred to the Library’s servers at the end of 2014 through a cooperation agreement signed with Red.es. Red.es collaborates actively with the Library in technological and infrastructure development for the management of the legal deposit of online publications.

In 2014 the Library installed NetarchiveSuite, an open source suite of tools for web crawling and archiving, in a test environment. With this system the Library has since conducted several selective crawls on events relevant to Spanish history and culture, such as the death of Adolfo Suárez, the abdication of Juan Carlos I, the proclamation of Felipe VI, the 2014 European elections, the 2015 local and regional elections, and the general elections of 2015 and 2016.

Royal Decree 635/2015, of 10 July, which regulates the legal deposit of online publications, was published in 2015 and entered into force on 26 October of that year. This royal decree supports the work on preserving online publications that conservation centres have carried out in recent years, particularly web archiving projects.

In 2016 the Library carried out its first broad crawl of the .es domain with its own resources; the crawl lasted three months.

During that year, cooperation was also strengthened between the conservation centres of the Autonomous Communities and the National Library of Spain to manage and build a collaborative online legal deposit. An increasing number of centres are managing their own web collections, using the tools that the BNE has made available to all of them.

What are web archives?

A web archive is the set of resources collected from the Web over time.

These resources form collections of websites grouped by subject, by event or by risk of disappearance. The harvest is carried out automatically by crawlers that scan the websites, copying and saving all the information. This information is stored, preserved and disseminated through the Spanish Web Archive.
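As a rough illustration of how a crawler scans a website and saves a copy of each page, the following minimal sketch performs a breadth-first harvest from a single seed URL. It is not the Library's crawler: the `crawl` function, the `fetch` callable and the in-memory `SITE` dictionary are hypothetical names introduced here so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first harvest starting at `seed`, staying on the seed's host.

    `fetch` is a hypothetical callable (url -> HTML string or None) that
    stands in for a real HTTP client.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    archived = {}  # url -> saved copy of the page
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        if url in archived:
            continue
        html = fetch(url)
        if html is None:
            continue
        archived[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc == host and target not in archived:
                queue.append(target)
    return archived


# Toy in-memory "website" standing in for real HTTP responses (assumption).
SITE = {
    "https://example.es/": '<a href="/a">A</a> <a href="https://other.org/">out</a>',
    "https://example.es/a": '<a href="/">home</a> <a href="/b">B</a>',
    "https://example.es/b": "<p>leaf page</p>",
}

snapshot = crawl("https://example.es/", SITE.get)
print(sorted(snapshot))
```

The link to `other.org` is skipped because it leaves the seed's host. A production crawler would also handle politeness delays, robots rules, embedded resources and many other details omitted here.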

The collections seek to reproduce in detail the appearance of the website and the functionality available at the time of the harvest, so that the replica is as navigable as the “live” version. Once the crawl is complete, the archived websites are displayed in OpenWayback, an application that enables the user to select and consult a specific version of a particular website.

The websites are selected beforehand by library staff specialised in digital heritage preservation. The selection criteria are defined in the Collection Development Policy document.

All the information is stored in a standard file format called WARC (Web ARChive file format, ISO 28500), which packages all the information about the collected websites into a single container file that is usually compressed.
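To give an idea of what a WARC file contains, the sketch below builds two simplified `resource` records by hand and packs them into one gzip-compressed byte string. It is an illustration under stated assumptions, not the output of the Library's tools: the helper names `build_warc_record` and `pack` and the example URLs are hypothetical, only a handful of header fields are written, and real crawlers normally store full HTTP `response` records with more metadata.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, payload, content_type="text/html"):
    """Serialise one simplified WARC 'resource' record as bytes."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates header and block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs after the content block.
    return head + payload + b"\r\n\r\n"


def pack(records):
    """Concatenate records, gzip-compressing each one as its own member."""
    return b"".join(gzip.compress(record) for record in records)


# Hypothetical captured pages.
pages = {
    "https://example.es/": b"<html><body>Hola</body></html>",
    "https://example.es/a": b"<html><body>Archivo</body></html>",
}
warc_bytes = pack(build_warc_record(uri, body) for uri, body in pages.items())

# Python's gzip module reads concatenated members as one stream.
text = gzip.decompress(warc_bytes).decode("utf-8")
print(text.count("WARC/1.0"))
```

Compressing each record separately, rather than the file as a whole, is a common convention because it lets a reader jump straight to one record without decompressing everything before it.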

What are web archives for?

As with any other bibliographic material, the Library archives websites for a variety of reasons that justify their necessity and usefulness for future generations:

Content not stored in a web archive may be lost permanently and irretrievably.
Web archives bear witness to the history of the Internet and the creation of websites.
They support the study of society and the evolution of customs and ideas.
They preserve a country’s online cultural and documentary heritage.
They store ephemeral content that is highly likely to disappear in the short term.
They are a tool for the study and research of events with a strong presence on the Internet.
They allow the content of deleted or missing websites to be recovered.

Collection strategy

Due to the enormous size of the Internet and the technological means currently available, completeness in web archiving is not achievable at present. Therefore, to save as much information as possible, the National Library of Spain has opted for a mixed model that combines full-domain and selective harvesting. This model is consistent with the web collection policies of other national libraries around the world.

Web archiving tools

The tool used by the National Library of Spain for web archiving is NetarchiveSuite (NAS). This open source application was designed in 2004 by the Royal Library of Denmark and is now also used for this purpose by other national libraries. For crawling it uses the Heritrix robot, created by the Internet Archive, the first organization to crawl and archive the Web, which it has done since 1996. For viewing the archive, it uses OpenWayback, an application created by the International Internet Preservation Consortium (IIPC), which lets the user consult a website as captured on a given date.
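Consulting a website "captured on a given date" usually means finding the stored capture closest to the date the user asks for. The sketch below shows one simple way to do that. It is not OpenWayback's code: the function name `closest_capture` and the list of captures are hypothetical, and it assumes captures are identified by 14-digit timestamps (YYYYMMDDhhmmss), a convention commonly used by Wayback-style replay tools.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit form, e.g. 20160115120000


def closest_capture(captures, requested):
    """Return the capture timestamp nearest in time to `requested`."""
    if not captures:
        raise ValueError("no captures available for this URL")
    wanted = datetime.strptime(requested, TIMESTAMP_FORMAT)
    return min(
        captures,
        key=lambda ts: abs(datetime.strptime(ts, TIMESTAMP_FORMAT) - wanted),
    )


# Hypothetical capture dates for a single URL.
captures = ["20110301120000", "20140602101500", "20160915080000"]
print(closest_capture(captures, "20140619000000"))  # prints 20140602101500
```

A linear scan is enough for a sketch; an index over millions of captures would more likely keep timestamps sorted and use a binary search.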

General selection criteria

The general selection criteria are based on article 3 of Royal Decree 635/2015, of 10 July, regulating the legal deposit of online publications, according to which websites subject to legal deposit are those that:

contain bibliographic, sound, visual, audiovisual or digital heritage of the cultures of Spain;
are under the .es domain and associated subdomains, as well as other domains in the national territory;
are hosted in other domains (.com, .net, .org, .edu, etc.), but contain Spanish documentary heritage;
are in any of the official languages of the State;
are in any format, including the publications contained therein;
have either free or restricted access.
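Only the domain-based part of these criteria lends itself to automation; deciding whether a site contains Spanish documentary heritage is a curatorial judgement. As a rough sketch of how a crawl scope could encode that split, the example below accepts any host under .es automatically and accepts hosts in other domains only if curators have listed them. The names `in_scope` and `CURATED_HOSTS`, and the hosts in the list, are hypothetical and not taken from the Library's configuration.

```python
from urllib.parse import urlparse

# Hypothetical hosts outside .es that curators have approved as seeds.
CURATED_HOSTS = {"example.org", "heritage.example.com"}


def in_scope(url):
    """Return True if `url` falls under .es or under a curated host."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".es"):
        return True  # everything under the national domain
    # Accept a curated host itself or any of its subdomains.
    return any(host == h or host.endswith("." + h) for h in CURATED_HOSTS)


for candidate in (
    "https://www.example.es/page",
    "https://blog.example.org/post",
    "https://unrelated.example.net/",
):
    print(candidate, in_scope(candidate))
```

Matching on `.es` with the leading dot avoids false positives such as a host that merely ends in the letters "es".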

Collectable online publications

There are several categories of web resources that it is advisable to include in website selections in order to make the documentary sample as representative as possible:

News media: newspapers, news agencies, radio and television stations.
Administrative bodies: Ministries, Autonomous Communities, City Councils.
Political institutions: political parties.
Cultural institutions: museums, archives, libraries, schools, universities, research centres.
Scientific institutions.
Health institutions.
Sports institutions.
Websites focusing on natural and artistic heritage.
Cultural events, congresses, assemblies, conferences…
Websites of private companies.
Associations: professional associations, NGOs.
Blogs and websites of relevant people related to the subject of the collection.
Social networks: Twitter (currently X), Facebook.
Wikis: Wikipedia.
Video recordings: YouTube.

Non-collectable online publications

There are certain limitations related to legal and technical issues affecting the collection of online publications.

On the legal side, according to Royal Decree 635/2015, the following are excluded from collections (art. 4):

Mail and private correspondence.
Content that is hosted only on a private network.
Personal data to which only a restricted group of people has access.

In accordance with the provisions of articles 6 and 7 of Royal Decree 635/2015, of 10 July, the NATIONAL LIBRARY OF SPAIN, O.A., exercises its function of capturing and depositing online publications that have been the object of public communication and websites accessible through communications networks. This capture and deposit is carried out without altering the contents in order to guarantee their integrity and historical traceability. Consequently, the BNE is not responsible for any captured and deposited content that is contrary to the law, morality or public order; responsibility for such content lies with its owners.

From a technical point of view, some content, despite being freely accessible on the Internet, cannot be collected under current technological conditions:

Databases, repositories, catalogues.
Interactive reading viewers.
Streaming content.
Files in the cloud.
Content behind filters, drop-down lists or check boxes.

National collaboration

The Consejo de Cooperación Bibliotecaria (CCB), through its Legal Deposit and Digital Heritage Working Group, promotes collaboration between the different conservation centres and the National Library of Spain. More than 40 web curators participate in the Spanish Web Archive and play a fundamental role in the selection of seeds and in quality control of the preserved material. Their work is essential for the creation and maintenance of regional collections, as well as for the related events in which they participate.
Red de Bibliotecas Universitarias Españolas (REBIUN). In 2023, a general Action Protocol was signed between Crue-REBIUN and the BNE to carry out joint activities related to the Online Publications Repository and the Spanish Web Archive. Currently, around 10 web curators from the CSIC and various Spanish universities collaborate as part of the REBIUN Bibliographic Heritage Group. Their work focuses on the selection of seeds related to science and technology.
Fundación Sancho el Sabio. A cultural institution focused on collecting, organizing, preserving and disseminating documentation related to Basque culture. Since 2019, it has supported several web curators in the selection and quality control of websites related to the Basque Country.

International collaboration

The National Library of Spain participates in collaborative collections organized by the IIPC (International Internet Preservation Consortium), on the occasion of events of international interest. These are some examples: