End of Term Web Archive

{{Short description|Project to archive US government websites}}

{{Infobox project

| name = End of Term Web Archive

| abbreviation = EOT Archive

| logo =

| image = North America Geological Tapestry.gif

| caption = A version of this USGS map was archived by project partner UNT in the 2008 End of Term collection.

| alt = A geological map of the North American Continent.

| mission_statement = "The End of Term Web Archive captures and saves U.S. Government websites at the end of presidential administrations."

| commercial = No

| type = Collaborative government web archive

| products =

| location =

| country =

| owner =

| founder =

| primeminister =

| key_people =

| established = 2008

| disestablished =

| funding =

| budget =

| current_status =

| website = {{url|https://eotarchive.org/}}

| screenshot = }}

{{United States presidential transitions series}}

The End of Term Web Archive is an archival project that preserves U.S. federal government websites during transitions between presidential administrations.{{Cite news|last=Dwyer|first=Jim|date=2016-12-02|title=Harvesting Government History, One Web Page at a Time|language=en-US|work=The New York Times|url=https://www.nytimes.com/2016/12/01/nyregion/harvesting-government-history-one-web-page-at-a-time.html|url-status=live|access-date=2020-12-07|archive-url=https://web.archive.org/web/20200118001901/https://www.nytimes.com/2016/12/01/nyregion/harvesting-government-history-one-web-page-at-a-time.html|archive-date=18 Jan 2020|issn=0362-4331}}

Background

The End of Term Web Archive was set up following a 2008 announcement from the National Archives and Records Administration (NARA) that it would not archive government websites during that year's transition, although it had carried out such crawls in 2000 and 2004.{{Cite journal|last=Webster|first=Peter|editor1-first=Niels|editor1-last=Brügger|date=2017|title=Users, technologies, organisations: Towards a cultural history of world web archiving|url=https://hcommons.org/deposits/item/hc:26187/|journal=Web 25. Histories from 25 Years of the World Wide Web|language=en-US|volume=|pages=179–190|doi=10.3726/b11492|isbn=9781433140655|archive-url=https://web.archive.org/web/20201021215748/https://hcommons.org/deposits/item/hc:26187/|archive-date=2020-10-21|via=|hdl=2318/1770557|hdl-access=free}} The 2004 federal web harvest is available on the [https://webharvest.gov/ National Archives] website, alongside congressional web harvests beginning with the 109th United States Congress.{{Cite web|last=|first=|date=|title=National Archives|url=https://webharvest.gov/|url-status=live|archive-url=http://webarchive.loc.gov/all/20170918235110/https://www.webharvest.gov/|archive-date=2017-09-18|access-date=2021-01-18|website=Congressional & Federal Government Web Harvests|language=en}}

The first project partners were the Library of Congress, George Washington University Libraries, Stanford University Libraries, University of North Texas Libraries, the US Government Publishing Office, the California Digital Library and the Internet Archive, all members of the International Internet Preservation Consortium (IIPC). The project was initially sketched out after a General Assembly of the IIPC in 2008.{{Cite journal|last1=Seneca|first1=Tracy|last2=Grotke|first2=Abbie|last3=Hartman|first3=Cathy Nelson|last4=Carpenter|first4=Kris|date=2012|title=It Takes a Village to Save the Web: The End of Term Web Archive|url=http://wikis.ala.org/godort/images/7/7d/DttP_40n1.pdf|journal=DTTP: Documents to the People|volume=40|pages=16|issn=0091-2085|archive-url=https://web.archive.org/web/20150908001132/http://wikis.ala.org/godort/images/7/7d/DttP_40n1.pdf|archive-date=2015-09-08|via=}} NARA and the Environmental Data & Governance Initiative ([https://envirodatagov.org/ EDGI]) joined the 2020/21 project.{{Cite web|last=|first=|date=|title=GitHub - end-of-term/eot2020|website=GitHub|url=https://github.com/end-of-term/eot2020|url-status=live|archive-url=https://web.archive.org/web/20201205003120/https://github.com/end-of-term/eot2020|archive-date=2020-12-05|access-date=2020-12-14}}

The project

[[File:White House.gov 404 error 1-20-09.JPG|thumb|A 404 error page used to direct whitehouse.gov visitors as the website changed in 2009.]]

The project archives websites and documents for public access and research use.{{Cite web|last=|first=|date=2020-12-06|title=End of Term Web Archive: U.S. Government Websites|url=http://eotarchive.cdlib.org/|url-status=live|archive-url=https://web.archive.org/web/20201206142930/http://eotarchive.cdlib.org/|archive-date=2020-12-06|access-date=2020-12-15}} Data from the 2008, 2012, 2016, and 2020 End of Term crawls can be downloaded in bulk.{{Cite web |title=Datasets |url=https://eotarchive.org/data/ |access-date=2025-02-02 |website=End of Term Web Archive |language=en}} As of February 2025, the 2024 datasets were still being inventoried, and there were plans to move a copy of all datasets into Amazon Web Services.

A UNT study of the risk to document files found that 83% of the PDFs on the .gov domain in 2008 were missing four years later.{{Cite web|last=Gilmore|first=Courtney|date=4 Dec 2020|title=UNT Part of Team Archiving Obama Administration Web Content|url=https://www.nbcdfw.com/news/local/unt-part-of-team-archiving-president-obama-administration-web-content/2034483/|url-status=live|archive-url=https://web.archive.org/web/20201207132642/https://www.nbcdfw.com/news/local/unt-part-of-team-archiving-president-obama-administration-web-content/2034483/|archive-date=7 Dec 2020|access-date=2020-12-04|website=NBC 5 Dallas-Fort Worth|language=en-US}} Such turnover is consistent with the routine management of websites, but because these sites are official government sources, changes to them may be of interest to the public and to watchdog groups.{{Cite web|title=Website Monitoring|url=https://envirodatagov.org/website-monitoring/|url-status=live|archive-url=https://web.archive.org/web/20201206101408/https://envirodatagov.org/website-monitoring/|archive-date=2020-12-06|access-date=2021-02-24|website=Environmental Data and Governance Initiative|language=en-US}} Demand for continued access to historical web material is evident in a 2017 EPA announcement, made in response to concerns about changes to the agency's website, stating that pages from the previous administration would be carefully archived.{{Cite news|last1=Mooney|first1=Chris|last2=Eilperin|first2=Juliet|title=EPA website removes climate science site from public view after two decades|language=en-US|newspaper=Washington Post|url=https://www.washingtonpost.com/news/energy-environment/wp/2017/04/28/epa-website-removes-climate-science-site-from-public-view-after-two-decades/|url-status=live|access-date=2021-02-18|archive-url=https://web.archive.org/web/20170429160537/https://www.washingtonpost.com/news/energy-environment/wp/2017/04/28/epa-website-removes-climate-science-site-from-public-view-after-two-decades/|archive-date=2017-04-29|issn=0190-8286}} These snapshot pages were clearly marked to distinguish them from current content.{{Cite web|date=2017-04-29|title=Climate Change {{!}} US EPA|url=https://19january2017snapshot.epa.gov/climatechange_.html|access-date=2021-04-08|archive-url=https://web.archive.org/web/20170429202520/https://19january2017snapshot.epa.gov/climatechange_.html|archive-date=2017-04-29}}

The archive prioritizes sites covering subject areas regarded as likely to be updated or removed during the transition.{{Cite web|last=|first=|date=2016-12-05|title=Guerrilla Archiving|url=https://politicsofevidence.wordpress.com/guerrilla-archiving/|url-status=live|archive-url=https://web.archive.org/web/20200804085020/https://politicsofevidence.wordpress.com/guerrilla-archiving/|archive-date=4 Aug 2020|access-date=2020-12-07|website=The Politics of Evidence|language=en}} Members of the public are encouraged to nominate important sites, and these nominations are combined with broad crawls of government domains to create the collection.{{Cite web|last=Jacobs|first=James R.|date=2020-08-10|title=Nominations sought for the U.S. Federal Government Domain End of Term 2020 Web Archive|url=https://freegovinfo.info/node/13820/|url-status=live|archive-url=https://web.archive.org/web/20201004150749/https://freegovinfo.info/node/13820|archive-date=4 Oct 2020|access-date=2020-12-07|website=Free Government Information (FGI)|language=en-US}}{{Cite web|last=|first=|date=2020-10-07|title=End of Term Archive on Twitter: "And so it begins. We have officially started crawling the websites nominated for the End of Term 2020 web archive! But don't worry, you still have time to nominate more! What are your favorite government sites? #WebArchiveWednesday #WebArchives #GovDocs"|url=https://twitter.com/eotarchive/status/1313945356623851521|url-status=live|archive-url=https://web.archive.org/web/20201007205728/https://twitter.com/eotarchive/status/1313945356623851521|archive-date=7 Oct 2020|access-date=2020-11-06}} Although the collection is extensive (the 2016 crawl preserved 11,382 sites), it stops short of being comprehensive.{{Cite news|last=O'Keefe|first=Ed|date=2011-12-20|title=How many .gov sites exist? Thousands.|newspaper=The Washington Post|url=https://www.washingtonpost.com/blogs/federal-eye/post/how-many-gov-sites-exist-thousands/2011/12/20/gIQAkGAG7O_blog.html|url-status=live|archive-url=https://web.archive.org/web/20151008071818/https://www.washingtonpost.com/blogs/federal-eye/post/how-many-gov-sites-exist-thousands/2011/12/20/gIQAkGAG7O_blog.html|archive-date=8 Oct 2015|access-date=2020-12-04}}{{Cite web|last=Young|first=Lauren J.|date=|title=The Librarians Saving The Internet|url=https://apps.sciencefriday.com/data/librarians.html|url-status=live|archive-url=https://web.archive.org/web/20201109025646/https://apps.sciencefriday.com/data/librarians.html|archive-date=9 Nov 2020|access-date=2020-12-04|website=Science Friday|language=en}} Researchers have used these collections to examine the history of climate change policy and the reuse of suspended U.S. government Twitter accounts.{{Cite web|last=EDGI|first=Toly Rinberg, Maya Anjur-Dietrich, Marcy Beck, Andrew Bergman, Justin Derry, Lindsey Dillon, Gretchen Gehrke, Rebecca Lave, Chris Sellers, Nick Shapiro, Anastasia Aizman, Dan Allan, Madelaine Britt, Raymond Cha, Janak Chadha, Morgan Currie, Sara Johns, Abby Klionsky, Stephanie Knutson, Katherine Kulik, Aaron Lemelin, Kevin Nguyen, Eric Nost, Kendra Ouellette, Lindsay Poirier, Sara Rubinow, Justin Schell, Lizz Ultee, Julia Upfal, Tyler Wedrosky, Jacob Wylie|date=|title=Changing the Digital Climate|url=https://100days.envirodatagov.org/changing-digital-climate/|url-status=live|archive-url=http://webarchive.loc.gov/all/20180404212805/https://100days.envirodatagov.org/changing-digital-climate/|archive-date=2018-04-04|access-date=2021-01-14|website=100days.envirodatagov.org|language=en}}{{Cite web|last=Littman|first=Justin|date=2017-11-04|title=Suspended U.S. government Twitter accounts|url=https://gwu-libraries.github.io/sfm-ui/posts/2017-11-04-digital-registry|url-status=live|archive-url=https://web.archive.org/web/20171107030127/https://gwu-libraries.github.io/sfm-ui/posts/2017-11-04-digital-registry|archive-date=2017-11-07|access-date=2020-12-07|website=Social Feed Manager}}

The 2024 crawl began in January 2024, with sites nominated through a [https://digital2.library.unt.edu/nomination/eth2024/about/ URL Nomination Tool] developed by the University of North Texas.{{Cite web |title=Nomination Tool: About Project |url=https://digital2.library.unt.edu/nomination/eth2024/about/ |access-date=2025-02-02 |website=digital2.library.unt.edu}}{{Cite web |last=Cron |first=Bethany |date=2024-06-24 |title=Announcing the 2024 End of Term Web Archive Initiative |url=https://records-express.blogs.archives.gov/2024/06/24/announcing-the-2024-end-of-term-web-archive-initiative/ |access-date=2025-02-02 |website=Records Express |language=en-US}}

See also

References

{{reflist}}