Web Archiving at the Library After 25 Years
This post is co-written by Abbie Grotke (Section Head), Amanda Lehman (Digital Collections Specialist), and Melissa Wertheimer (Senior Digital Collections Specialist) of the Web Archiving Section to close our 25th-anniversary celebration year with some highlights of the program.
We Have a History
In 2025 we wrapped up our 25th year of web archiving at the Library of Congress! We also ended the year with over 100 web archive collections available online. The program has matured even since we celebrated our 20th anniversary in 2020. We can illustrate our history through the very objects we’ve helped to preserve and make available to you: our patrons, researchers, and colleagues.
The Web Archiving Program started as a small pilot program to archive the 2000 Presidential election. The following year, the September 11th attacks happened. Suddenly, there was a lot of material online about the terrorist attacks. The Library and some external partners quickly ramped up web archiving efforts to try to preserve as much as they could; the fruits of those efforts are available in the September 11, 2001 Web Archive.
Web archiving was still so new back then, and there was a lot to learn, but the experience laid the groundwork to move beyond the pilot phase into the more established, expanded program we are today. After our successful pilot phase, we became a Web Capture Team staffed by information technology staff, then a Web Archiving Team staffed by librarians, and in 2024 a formal section within the Digital Collections Management & Services Division. While these organizational movements within the institution didn’t necessarily change the scope of our work, we’re now anchored amongst other digital collections and digital preservation staff.
The staff of our program and our collaborators are what make the program a success. Our Section Head supervises four Senior Digital Collections Specialists and specialists who all manage and support the Web Archiving Program. We each bring specialized skills to the program we run, including project management, coding, archival theory, mentorship, communication, metadata, and general digital collections expertise. And yet, we can’t do it all alone! Staff around the Library contribute to the program in numerous ways, from technical tasks to data scholarship, information technology infrastructure support, and policy development. Over one hundred subject and language specialists also lead collections and select content for them. Junior Fellows, Librarians in Residence, and Innovators in Residence have also been key participants in our work to preserve the ever-changing web and make it available.

We Provide Access for Public and Research Use
Why is our collaborative task so large? Well, the LC Web Archive is huge! The Library’s web archives have grown considerably since our pilot phase and continue to grow every year: in December 2005, the web archive was 38,976GB (or 38.976TB) of data; now, we’ve reached over 5.7PB of data (yes, petabytes). The data includes not just the captures (or, the copies) of the websites, but also indexes and files about them. This data represents online content published in 137 languages and 218 countries. These languages and countries span 203 total web archive collections, 83 of which are considered “active” collections because they’re still growing.
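For readers who like to check the arithmetic, the growth described above can be sketched in a few lines. This is only an illustration, assuming decimal (SI) units where 1TB is 1,000GB and 1PB is 1,000TB; the variable names are made up for this example and are not part of any Library tool. Because the current figure is "over 5.7PB", the result is a lower bound.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; the post
# does not say whether decimal or binary units are used).
GB = 10**9
TB = 10**12
PB = 10**15

size_2005_bytes = 38_976 * GB   # December 2005 figure from the post
size_now_bytes = 5.7 * PB       # "over 5.7PB", treated here as a lower bound

# Convert the 2005 figure to terabytes and compare the two sizes.
size_2005_tb = size_2005_bytes / TB
growth_factor = size_now_bytes / size_2005_bytes

print(f"2005 size: {size_2005_tb:.3f} TB")
print(f"Growth since 2005: at least {growth_factor:.0f}x")
```

Under these assumptions, the archive today is at least roughly 146 times the size it was in December 2005.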
The web archive collections fall into one of two main branches of collecting:
Like many of our peer web archiving institutions around the world, we provide access to web archives with specialized display tools. Our program’s current focus is migrating our web archive presentation tools to give users the best experience possible. Researchers can also explore data collections and connect with experts who can summarize the data or provide derivative datasets.

We Value our Web Archiving Community
The Library of Congress was a founding member of the International Internet Preservation Consortium (IIPC) in 2003. The IIPC member organizations are from over 35 countries, including national, university, and regional libraries and archives. IIPC is a web archiving community network that shares practices and expertise amongst cultural heritage institutions and the technologists who support them. Library of Congress staff have served as leaders of the consortium since its inception through elected participation on the Steering Committee, service on conference program committees, and co-chairing working groups. The IIPC conferences have brought other LC staff into the fold to present on how they support the program, as well. You can explore posters, papers, slide decks, and videos of IIPC presentations through the University of North Texas (UNT) Digital Library.
We especially love to share our expertise and what we learn from colleagues about web archiving with the general public and specialized groups through presentations and talks, in classrooms of graduate students in library and information science programs, and during special events like the National Book Festival and Preservation Week.
Learn More
If you’re interested in web archives stewarded by IIPC member institutions, a good place to start is the IIPC Members map for links to organizations doing domain crawling and a variety of event and thematic web archives. Members also contribute to a variety of collaborative collections as a part of the Content Development Working Group.
What could the next 25 years hold? Only time will tell! What we do know is that archives are meant to be used, and that’s the inherent beauty of web archiving – access is built into the process. Most importantly, it’s people who make it all happen, including those who build the tools to do it.