Archive.today

{{Selfref|For a guide to using archive.today within Wikipedia, see Help:Using archive.today.}}

{{Infobox website

| name = archive.today

| logo = Archive.today logo with subtitle.svg

| logo_caption =

| screenshot = File:Archive.today Screenshot - 12.19.2024.png

| screenshot_size = 300px

| caption = Screenshot of the archive.today home page

| url = {{Plainlist|

{{URL|https://archive.today}}
{{URL|https://archive.fo}}
{{URL|https://archive.is}}
{{URL|https://archive.li}}
{{URL|https://archive.md}}
{{URL|https://archive.ph}}
{{URL|https://archive.vn}}
{{Onion URL|archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd}}{{Cite tweet|user=archiveis|number=1189322374598053890|title=a current list of all tor domains and clear net domains|date=30 October 2019}}

}}

| type = Web archiving

| language = Multilingual

| registration = No

| launch_date = {{Start date and age|2012|5|16}}{{Cite web|url=https://blog.archive.today/post/77015559437/when-did-the-archive-is-site-originally-launch|title=When did the Archive-is site originally launch?|author1=Archive.is blog|website=Tumblr|date=18 February 2014|access-date=10 April 2021|archive-date=20 March 2021|archive-url=https://archive.today/20210320075425/https://blog.archive.today/post/77015559437/when-did-the-archive-is-site-originally-launch|url-status=live}}

}}

archive.today (formerly archive.is) is a web archiving website founded in 2012 that saves snapshots on demand, and has support for JavaScript-heavy sites such as Google Maps and Twitter.{{cite web|last1=Brinkmann|first1=Martin|date=22 April 2015|title=Create publicly available web page archives with Archive.is|url=https://www.ghacks.net/2015/04/22/create-publicly-available-web-page-archives-with-archive-is/|url-status=live|archive-url=https://web.archive.org/web/20190412072055/https://www.ghacks.net/2015/04/22/create-publicly-available-web-page-archives-with-archive-is/|archive-date=12 April 2019|access-date=13 June 2015|website=Ghacks}} archive.today records two snapshots: one replicates the original webpage including any functional live links; the other is a screenshot of the page.{{cite journal|last1=Brunelle|first1=Justin F.|last2=Kelly|first2=Mat|last3=Weigle|first3=Michele C.|last4=Nelson|first4=Michael L.|date=25 January 2015|title=The impact of JavaScript on archivability|url=https://www.cs.odu.edu/~mweigle/papers/brunelle-ijdl16.pdf|url-status=live|journal=International Journal on Digital Libraries|volume=17|issue=2|pages=95–117|doi=10.1007/s00799-015-0140-8|s2cid=8433375|archive-url=https://web.archive.org/web/20190527064810/https://www.cs.odu.edu/~mweigle/papers/brunelle-ijdl16.pdf|archive-date=27 May 2019}}

History

Archive.today was founded in 2012. The site originally branded itself as archive.today, but changed the primary mirror to archive.is in May 2015.{{cite web|url=https://blog.archive.is/post/118010496181/why-did-you-change-the-url-back-from-archive-today|title=Why did you change the URL back from archive-today to archive-is?|work=Archive.is Blog|date=3 May 2015|archive-url=https://web.archive.org/web/20150601001607/http://blog.archive.is/post/118010496181/why-did-you-change-the-url-back-from-archive-today|archive-date=1 June 2015|url-status=live|access-date=6 January 2019}} It began to deprecate the archive.is domain in favor of other mirrors in January 2019.{{cite tweet |user=archiveis |number=1081276424781287427 |title=Please do not use archive.IS mirror for linking, use others mirrors [.TODAY .FO .LI .VN .MD .PH]. .IS might stop working soon.|date=4 January 2019|archive-url=https://web.archive.org/web/20190106000101/https://twitter.com/archiveis/status/1081276424781287427|archive-date=6 January 2019|url-status=live}}

In 2021, archive.today had saved about 500 million pages.{{Cite web |last=Patokallio |first=Jani |date=5 August 2023 |title=archive.today: On the trail of the mysterious guerrilla archivist of the Internet |url=https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/ |url-status=live |archive-url=https://web.archive.org/web/20230813201525/https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/ |archive-date=13 August 2023 |access-date=1 January 2024 |website=Gyrovague |language=en}}

Features

Archive.today can capture individual pages in response to explicit user requests.{{cite web|last1=Dascalescu|first1=Dan|author-link=Dan Dascalescu|url=https://wiki.dandascalescu.com/reviews/online_services/web_page_archiving|title=Web page archiving – Dan Dascalescu's Wiki (review)|publisher=Wiki.dandascalescu.com|date=18 February 2013|access-date=3 October 2013|archive-url=https://web.archive.org/web/20130922192354/http://wiki.dandascalescu.com/reviews/online_services/web_page_archiving|archive-date=22 September 2013|url-status=dead}}{{cite web|last1=Koebler|first1=Jason|url=https://www.vice.com/en/article/dear-gamergate-please-stop-stealing-our-shit/|title=Dear GamerGate: Please Stop Stealing Our Shit|date=29 October 2014|work=Motherboard|archive-url=https://archive.today/20190527064603/https://www.vice.com/en_us/article/ypw5mj/dear-gamergate-please-stop-stealing-our-shit|archive-date=27 May 2019|url-status=live|quote=There is no way for a website to protect itself from having an Archive.today user mirror the site.|access-date=22 March 2017}}{{cite web|title=Archive.today FAQ|url=https://archive.today/faq|website=archive.today|access-date=15 February 2019|language=en}} Since its beginning, it has supported crawling pages with URLs containing the now-deprecated hash-bang fragment ({{mono|#!}}).{{cite web|url=https://archive.is/|title=Home page of Archive.is in 2013|archive-url=https://web.archive.org/web/20130112221411/https://archive.is/|archive-date=12 January 2013|url-status=dead}}

Archive.today records only text and images, excluding XML, RTF, spreadsheet (xls or ods) and other non-static content. However, videos for certain sites, like X (formerly Twitter), are saved.{{cite web|title=Archive.today blog|url=https://blog.archive.today/post/657607767402659840/have-you-considered-allowing-small-mp4s-and-webms|url-status=live|archive-url=https://web.archive.org/web/20210907153549/https://blog.archive.today/post/657607767402659840/have-you-considered-allowing-small-mp4s-and-webms|archive-date=7 September 2021}} It keeps track of the history of snapshots saved, requesting confirmation before adding a new snapshot of an already saved page.{{Citation|title=Archiving Websites with the Archive.is| date=15 April 2016 |url=https://www.youtube.com/watch?v=LK_bp9_ZyQs|language=en|access-date=27 January 2022|archive-date=27 January 2022|archive-url=https://web.archive.org/web/20220127171624/https://www.youtube.com/watch?v=LK_bp9_ZyQs|url-status=live}}{{cite web|url=https://archive.today/https://support.google.com/webmasters/answer/6062608?hl=en|title=Example snapshot history on archive.is}}{{cbignore}}

Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, removing responsive web design and selectors such as :hover and :active. Content generated using JavaScript during the crawling process appears in a frozen state. For example, the JavaScript-generated loading animation of a Dailymotion video [https://archive.today/20200121182128/https://www.dailymotion.com/video/x3sexy8 appears in a frozen state].

HTML class names are preserved inside the old-class attribute.
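The effect of moving class names into an old-class attribute can be illustrated with a minimal Python sketch. It is not archive.today's actual code, which is not public; the names <code>OldClassRewriter</code> and <code>rename_classes</code> are hypothetical, and the sketch drops comments and doctype declarations for brevity.

```python
from html import escape
from html.parser import HTMLParser


class OldClassRewriter(HTMLParser):
    """Re-emit HTML with every `class` attribute renamed to `old-class`.

    Hypothetical illustration only; comments and doctypes are dropped.
    """

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _emit_tag(self, tag, attrs, closing=""):
        rendered = []
        for name, value in attrs:
            if name == "class":
                name = "old-class"
            if value is None:
                rendered.append(f" {name}")  # valueless attribute
            else:
                # The parser unescapes attribute values, so escape again.
                rendered.append(f' {name}="{escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(rendered)}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def rename_classes(html_text):
    """Return html_text with `class` attributes renamed to `old-class`."""
    parser = OldClassRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


print(rename_classes('<p class="lead intro">Hi</p>'))
# prints: <p old-class="lead intro">Hi</p>
```

Because the original class names survive in a separate attribute, a reader of the snapshot can still see how the source page was structured even though the stylesheet rules themselves have been inlined.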

When text is selected on an archived page, a script generates a URL fragment, shown in the browser's address bar, that automatically highlights the same portion of the text when the URL is visited again.

Web pages can be duplicated from archive.today to web.archive.org as a second-level backup, but archive.today does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.today, is also possible,{{cite web|url=http://es.wikipedia.org/wiki/Wikipedia|archive-url=https://archive.today/20190324174341/https://web.archive.org/web/20130520191911/https://es.wikipedia.org/wiki/Wikipedia|url-status=dead|archive-date=24 March 2019|title=Example: Page saved from Web Archive to Archive.is|access-date=23 October 2019|language=es}} but the copy usually takes more time than a direct capture. Historically, website owners could opt out of the Wayback Machine by using the robots exclusion standard (robots.txt), and these exclusions were also applied retroactively.{{cite web|url=https://web.archive.org/collections/web/faqs.html#exclusions |title=FAQs - Some sites are not available because of Robots.txt or other exclusions. What does that mean? |website=Internet Archive Wayback Machine |archive-url=https://web.archive.org/web/20110415130934/https://web.archive.org/collections/web/faqs.html#exclusions |archive-date=15 April 2011}} Archive.today does not obey robots.txt because it acts "as a direct agent of the human user." As of 2019, the Wayback Machine no longer obeys robots.txt either.
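For context, the check that a robots.txt-obeying crawler performs before fetching a page can be sketched with Python's standard <code>urllib.robotparser</code> module. The robots.txt content, the user-agent string <code>ExampleCrawler</code> and the example.com URLs below are made up for illustration; a service acting as a direct agent of the user, as archive.today describes itself, simply skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes every crawler from /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleCrawler",
                 "https://example.com/private/page.html"))  # prints: False
print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleCrawler",
                 "https://example.com/public/page.html"))   # prints: True
```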

The search toolbar supports advanced keyword operators, using {{code|*}} as the wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords present in the title or body of the webpage, whereas the insite operator restricts it to a specific Internet domain. For example, the string insite: https://en.wikipedia.org "World Cup" returns the [https://archive.today/search/?q=insite%3A+http%3Aen.wikipedia.org+"World+Cup"/ related snapshots].
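A search URL combining these operators might be assembled as in the following Python sketch. The <code>/search/?q=</code> endpoint is taken from the example link above; the helper name <code>build_search_url</code> is hypothetical, and the exact spelling the site accepts after <code>insite:</code> (with or without a scheme or a space) is an assumption.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example search link.
SEARCH_ENDPOINT = "https://archive.today/search/"


def build_search_url(phrase=None, domain=None, keywords=()):
    """Build a search URL from an exact phrase, a domain and extra keywords.

    Assumption: the operator is written as `insite:<domain>`.
    """
    terms = []
    if domain:
        terms.append(f"insite:{domain}")
    if phrase:
        terms.append(f'"{phrase}"')  # quotation marks request an exact sequence
    terms.extend(keywords)           # keywords may contain the * wildcard
    return SEARCH_ENDPOINT + "?" + urlencode({"q": " ".join(terms)})


print(build_search_url(phrase="World Cup", domain="en.wikipedia.org"))
# prints: https://archive.today/search/?q=insite%3Aen.wikipedia.org+%22World+Cup%22
```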

Once a web page is archived, it cannot be deleted directly by any Internet user.{{cite web|url=https://blog.archive.is/post/41395737942/how-can-i-delete-an-archived-page|title=Some Frequently Asked Question|date=24 January 2013|website=archive.is|format= blog|archive-url=https://web.archive.org/web/20130926093655/https://blog.archive.is/post/41395737942/how-can-i-delete-an-archived-page|archive-date=26 September 2013|url-status=live|access-date=12 November 2018}}

Users can ask the site's owner, through the archive.today blog, to remove advertisements and popups from archived pages or to expand links in them.{{Cite web|title=Example user request on the Archive.is blog|url=https://blog.archive.today/post/677427547064156160/could-you-expand-17wus-and-links-under-same|access-date=7 April 2022|website=Archive.is blog|archive-date=29 April 2022|archive-url=https://web.archive.org/web/20220429215629/https://blog.archive.today/post/677427547064156160/could-you-expand-17wus-and-links-under-same|url-status=live}}

When a dynamic list is saved, the archive.today search box shows only a result that links to the previous and following sections of the list (e.g. 20 links per page). Example of a dynamic list: {{cite web|url=https://www.worldcat.org/search?q=au%3A%22thomas+aquinas%22&fq=&dblist=638&start=21&qt=page_number_link|title=au:"thomas aquinas"|website=WorldCat|access-date=15 December 2018|archive-date=23 March 2019|archive-url=https://web.archive.org/web/20190323131756/https://www.worldcat.org/search?q=au%3A%22thomas+aquinas%22&fq=&dblist=638&start=21&qt=page_number_link|url-status=live}} The other saved web pages are filtered out, and can sometimes be found through one of their occurrences.{{clarify|date=July 2022}}

The search feature is backed by Google CustomSearch. If it delivers no results, archive.today attempts to utilize Yandex Search.{{Cite web|title=Just realized that I can search for keywords in the search bar for archive today, was this a recently added feature?|url=https://blog.archive.today/post/673695282217762816/just-realized-that-i-can-search-for-keywords-in|date=18 January 2022|access-date=27 January 2022|website=Archive.is blog|archive-date=27 January 2022|archive-url=https://web.archive.org/web/20220127183557/https://blog.archive.today/post/673695282217762816/just-realized-that-i-can-search-for-keywords-in|url-status=live}}

While saving a page, a list of URLs for individual page elements and their content sizes, HTTP statuses and MIME types is shown. This list can only be viewed during the crawling process.{{fact|date=January 2025}}

Users can download archived pages as a ZIP file, except for pages archived {{as of|2019|11|29|since=y|post=,|lc=y}}{{cite web|url=https://blog.archive.today/post/623883809154383872/the-download-zip-button-has-been-giving-a-not|title=The "download zip" button has been giving a "Not found" error for quite some time.|website=Archive.is blog|date=17 July 2020|url-status=live|archive-url=https://web.archive.org/web/20201003125618/https://blog.archive.today/post/623883809154383872/the-download-zip-button-has-been-giving-a-not|archive-date=3 October 2020}} when archive.today changed its browser engine from PhantomJS to Chromium (non-headless).{{cite web|url=https://blog.archive.today/post/618635148292964352/what-scraper-or-headless-browser-are-you-using-it|title=What scraper or headless browser are you using? it works so well.|website=Archive.is blog|accessdate=14 February 2025|date=20 May 2020|url-status=live|archive-url=https://web.archive.org/web/20200521161738/https://blog.archive.today/post/618635148292964352/what-scraper-or-headless-browser-are-you-using-it|archive-date=21 May 2020}}

In July 2013, Archive.today began supporting the API of the Memento Project.{{cite web|last1=Nelson|first1=Michael L.|url=https://ws-dl.blogspot.nl/2013/07/2013-07-09-archiveis-supports-memento.html|title=Archive.is Supports Memento|publisher=Web Science and Digital Libraries Research Group at Old Dominion University|work=Research and Teaching Updates|date=9 July 2013|archive-url=https://web.archive.org/web/20130727194715/https://ws-dl.blogspot.de/2013/07/2013-07-09-archiveis-supports-memento.html|archive-date=27 July 2013|url-status=live|access-date=17 September 2013|language=en}}{{cite web|url=https://mementoweb.org/depot/native/archiveis/|title=archive.is|website=Memento Protocol Information|publisher=Memento Development Group|access-date=17 September 2013|archive-url=https://web.archive.org/web/20130915191950/https://mementoweb.org/depot/native/archiveis/|archive-date=15 September 2013}}
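Under the Memento protocol (RFC 7089), a client asks a TimeGate for the snapshot closest to a desired date by sending an <code>Accept-Datetime</code> request header. The Python sketch below only prepares such a request and does not contact the network. The TimeGate prefix <code>https://archive.today/timegate/</code> is an assumption that should be checked against the Memento depot entry for the service, and the helper name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: TimeGate URL pattern for the service; verify before relying on it.
TIMEGATE_PREFIX = "https://archive.today/timegate/"


def build_timegate_request(original_url, when):
    """Return (url, headers) for a Memento TimeGate lookup.

    `when` must be a timezone-aware datetime; it is converted to GMT and
    formatted as an HTTP date for the Accept-Datetime header.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_PREFIX + original_url, {"Accept-Datetime": http_date}


url, headers = build_timegate_request(
    "https://example.com/",
    datetime(2013, 7, 9, 12, 0, 0, tzinfo=timezone.utc),
)
print(url)      # prints: https://archive.today/timegate/https://example.com/
print(headers)  # prints: {'Accept-Datetime': 'Tue, 09 Jul 2013 12:00:00 GMT'}
```

The returned pair can be passed to any HTTP client, for example <code>urllib.request.Request(url, headers=headers)</code>. A Memento-compliant TimeGate is expected to answer with a redirect to the snapshot nearest the requested time.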

Worldwide availability

= Australia and New Zealand =

{{see also|Internet censorship in Australia|Internet censorship in New Zealand}}

In March 2019, the site was blocked for six months by several internet providers in Australia and New Zealand in the aftermath of the Christchurch mosque shootings in an attempt to limit distribution of the footage of the attack.{{cite web|title=ISPs in AU and NZ start censoring the internet without legal precedent|url=https://www.privateinternetaccess.com/blog/2019/03/isps-in-au-and-nz-start-censoring-the-internet-without-legal-precedent/|website=Private Internet Access|access-date=20 March 2019|date=19 March 2019|archive-date=28 April 2023|archive-url=https://web.archive.org/web/20230428152352/https://www.privateinternetaccess.com/blog/isps-in-au-and-nz-start-censoring-the-internet-without-legal-precedent/|url-status=live}}{{cite web|url=https://www.gizmodo.com.au/2019/03/new-zealand-isps-say-theyre-blocking-sites-that-fail-to-remove-christchurch-shooting-video/|title=New Zealand ISPs Say They're Blocking Sites That Fail To Remove Christchurch Shooting Video|date=19 March 2019|work=Gizmodo Australia|archive-url=https://web.archive.org/web/20190518223849/https://www.gizmodo.com.au/2019/03/new-zealand-isps-say-theyre-blocking-sites-that-fail-to-remove-christchurch-shooting-video/|archive-date=18 May 2019|url-status=live|access-date=20 March 2019}}

= China =

According to GreatFire.org, archive.today has been blocked in mainland China {{as of|2016|3|post=,|since=y|lc=y}}{{cite web|url=https://en.greatfire.org/archive.is|title=archive.is is 100% blocked in China|date=12 August 2018|website=GreatFire Analyzer|archive-url=https://archive.today/20180812150852/https://en.greatfire.org/https/archive.is|archive-date=12 August 2018|url-status=live}} archive.li {{as of|2017|9|post=,|since=y|lc=y}}{{cite web|url=https://en.greatfire.org/https/archive.li|title=archive.li is 100% blocked in China|date=12 August 2018|website=Great Fire Analyzer|archive-url=https://archive.today/20180812154815/https://en.greatfire.org/https/archive.li|archive-date=12 August 2018|url-status=live}} archive.fo {{as of|2018|7|since=y|lc=y|post=,}}{{cite web|url=https://en.greatfire.org/https/archive.fo|title=archive.fo is 100% blocked in China|date=12 August 2018|website=Great Fire Analyzer|archive-url=https://archive.today/20180812152220/https://en.greatfire.org/https/archive.fo|archive-date=12 August 2018|url-status=live}} as well as archive.ph {{as of|2019|12|post=.|since=y|lc=y}}{{Cite web|title=archive.ph is 100% blocked in China|url=https://en.greatfire.org/https/archive.ph|access-date=7 April 2022|website=en.greatfire.org|archive-date=29 April 2022|archive-url=https://web.archive.org/web/20220429215631/https://en.greatfire.org/https/archive.ph|url-status=live}}

= Finland =

On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they did this in order to avoid escalating a dispute they allegedly had with the Finnish government.{{cite web|url=https://www.iltalehti.fi/digi/a/2015072220070969|title=Suomalaisilta estettiin haktivistien suosimalla verkkosivulla käynti|last1=Lapintie|first1=Lassi|date=22 July 2015|work=Iltalehti|trans-title=Finns' access to website used by hacktivists blocked|archive-url=https://web.archive.org/web/20190527064017/https://www.iltalehti.fi/digi/a/2015072220070969|archive-date=27 May 2019|url-status=live|access-date=4 March 2016|language=fi}}

= Russia =

In 2016, the Russian communications agency Roskomnadzor began blocking access to archive.is from Russia.{{cite web|url=https://tjournal.ru/21966-roskomnadzor-zablokiroval-servis-archive-is-hranyashchiy-kopii-veb-saytov|script-title=ru:Роскомнадзор заблокировал сервис archive.is, хранящий копии веб-сайтов|last1=Elistratov|first1=Vladimir|date=29 January 2016|access-date=30 January 2016|website=TJournal|title=Roskomnadzor zablokiroval servis archive.is, khranyashchiy kopii veb-saytov|archive-url=https://web.archive.org/web/20170830055553/https://tjournal.ru/21966-roskomnadzor-zablokiroval-servis-archive-is-hranyashchiy-kopii-veb-saytov|archive-date=30 August 2017|url-status=live|language=ru}}{{cite web|url=https://www.techdirt.com/articles/20160203/08365233504/russia-blocks-another-archive-site-because-it-might-contain-old-pages-about-drugs.shtml|title=Russia Blocks Another Archive Site Because It Might Contain Old Pages About Drugs|last1=Cushing|first1=Tim|date=4 February 2016|work=Techdirt|archive-url=https://web.archive.org/web/20190323131754/https://www.techdirt.com/articles/20160203/08365233504/russia-blocks-another-archive-site-because-it-might-contain-old-pages-about-drugs.shtml|archive-date=23 March 2019|url-status=live|access-date=26 February 2016}}

Cloudflare DNS availability

Since May 2018,{{Cite web|date=15 May 2018|title=Archive.is – Error 1001|url=https://community.cloudflare.com/t/archive-is-error-1001/18227|access-date=2 December 2021|website=Cloudflare Community|language=en|archive-date=2 December 2021|archive-url=https://web.archive.org/web/20211202024457/https://community.cloudflare.com/t/archive-is-error-1001/18227|url-status=live}}{{Cite web|date=3 March 2024|title=Archive.today & related sites failing again|url=https://community.cloudflare.com/t/archive-today-related-sites-failing-again/623376/|access-date=20 March 2024 |website=Cloudflare Community|language=en |url-status=live |archive-url= https://archive.today/20240403235704/https://community.cloudflare.com/t/archive-today-related-sites-failing-again/623376/1 |archive-date= 3 April 2024 }} Cloudflare's 1.1.1.1 DNS service has not resolved archive.today's web addresses, making the site inaccessible to users of the Cloudflare DNS service. Each organization has claimed that the other is responsible for the issue. Cloudflare staff stated that the problem lay in archive.today's DNS infrastructure, as its authoritative nameservers returned invalid records when Cloudflare's network systems made requests to archive.today. archive.today countered that the issue was due to Cloudflare's requests not being compliant with DNS standards, as Cloudflare does not send EDNS Client Subnet information in its DNS requests.{{Cite tweet|user=archiveis|number=1018691421182791680|title='Having to do' is not so direct here. Absence of EDNS and massive mismatch (not only on AS/Country, but even on the continent level) of where DNS and related HTTP requests come from causes so many troubles so I consider EDNS-less requests from Cloudflare as invalid.|url-status=live |archive-url=https://web.archive.org/web/20230802171855/https://twitter.com/archiveis/status/1018691421182791680 |archive-date= 2 August 2023 }}{{Cite web|url=https://news.ycombinator.com/item?id=19828702|archive-url=https://archive.today/20220513070623/https://news.ycombinator.com/item?id=19828702|archive-date=13 May 2022|title=Comment by Matthew Prince on Hacker News|date=4 May 2019|website=Hacker News|access-date=4 October 2021}}
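The EDNS Client Subnet information at the centre of this dispute is an option, defined in RFC 7871, that a resolver can attach to a DNS query to reveal a truncated form of the client's address. The Python sketch below encodes such an option so that its contents are visible. It illustrates the wire format only and is not code from either organization; the helper name and the documentation-range address are made up.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet (RFC 7871)


def build_ecs_option(client_ip, source_prefix_len):
    """Encode an EDNS Client Subnet option for client_ip/source_prefix_len."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # address family: 1 = IPv4, 2 = IPv6
    # Zero the host bits so only the announced prefix is disclosed.
    network = ipaddress.ip_network(f"{client_ip}/{source_prefix_len}", strict=False)
    n_bytes = (source_prefix_len + 7) // 8  # address is truncated to whole octets
    address_bytes = network.network_address.packed[:n_bytes]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    payload = struct.pack("!HBB", family, source_prefix_len, 0) + address_bytes
    # OPTION-CODE and OPTION-LENGTH precede the payload.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(build_ecs_option("203.0.113.77", 24).hex())
# prints: 0008000700011800cb0071
```

An authoritative nameserver that receives this option learns that the client is somewhere in 203.0.113.0/24 and can tailor its answer to that location. Without it, the server sees only the address of the resolver that forwarded the query.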

References

External links

{{Official website|https://archive.today/}}
[https://archive.today/faq FAQ] at Archive.today
[https://wiki.archiveteam.org/index.php/Archive.today archive.today] at Archive Team wiki
[https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/ "archive.today: On the trail of the mysterious guerrilla archivist of the Internet"], Gyrovague, 5 August 2023

Category:History of the Internet

Category:Internet properties established in 2012

Category:Tor onion services

Category:Web archiving initiatives