20th Century Web Collection of web items from the 20th century. |
Accelovation Crawl Web crawl snapshots generously donated from Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software... |
Alexa Crawls Crawl data donated by Alexa Internet. This data is currently not publicly accessible. Decryption Keys are kept in an item. Alexa is the leading provider of free, global web metrics. Search Alexa to... |
Archive Team Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the... |
Archive-It Digital Collection The Archive-It Digital Collection |
Away From Keyboard Away From Keyboard is a memorial collection dedicated to preserving pieces of lives lived online from being scattered and lost. While no collection of data can ever replace a person, these archives... |
collections-aaron-swartz from Wikipedia: Aaron Hillel Swartz (November 8, 1986 – January 11, 2013) was an American computer programmer, writer, political organizer and Internet activist. Swartz was involved in the... |
Common Crawl Web crawl data from Common Crawl. |
Cuil Crawl Data Web crawl snapshot generously donated from cuil.com. This collection of pages, mostly from 2007 with some from 2008, is about 310 terabytes of compressed data and almost 60 billion URLs (mostly text).... |
Custom Crawl Services National library harvesting. |
Fix Broken Links Web Crawls These crawls are part of an effort to archive pages as they are created and archive the pages that they refer to. That way, as the pages that are referenced are changed or taken from the web, a link... |
Focused Crawls Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain. |
httparchive Successful societies and institutions recognize the need to record their history - this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In... |
Institut national de l’audiovisuel Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. from Wikipedia: The Institut national de l'audiovisuel (or INA, French for National... |
Internet Archive Web Crawls Crawl data collected by the Internet Archive. This data is currently not publicly accessible in this format. To view archived web pages, please visit the Wayback Machine. |
Internet Memory Foundation Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. from Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a... |
Mercator Crawl Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible. |
Rescue Crawls Rescue crawls conducted by the public for sites that have announced that they are closing. |
Thumper Transfer Web crawl data transferred from thumpers in the Santa Clara data center. |
urlteam Web Crawls Crawl data collected by the urlteam. The URLTeam is the ArchiveTeam subcommittee on URL shorteners. We believe that they pose a serious threat to the internet's integrity. If one of them dies, gets... |
Web Collections Web Collections organized by year. Some of this data is currently not publicly accessible. |
web-group-internal miscellaneous data |
Wiki Collections Collections of Wiki data |
Wikileaks.org Archive A collection of web pages from the wikileaks websites as well as news coverage and commentary surrounding the Wikileaks releases. It includes coverage of the Afghan war diaries, the Iraq war logs,... |
| Archive Team: The Twitter Stream Grab |