Web Crawls

Most Downloaded Items Last Week

Archive Team Webshots Index
376 downloads
wbsrv-0084-1
233 downloads
wbsrv-0046-0
217 downloads
wbsrv-0225-1
217 downloads
wbsrv-0280-1
216 downloads

Most Downloaded Items

Liveweb Capture 2011-03-27T22:10:09PDT to 2011-03-28T05:27:05PDT
6,167,345 downloads
Webwide Crawldata 2012-01-17T08:02:53PST to 2012-01-17T01:16:20PST
2,890,097 downloads
Crawldata from Internet Archive from 2002-11-01T06:23:33PDT to 2002-11-19T23:24:07PDT
2,504,552 downloads
Liveweb Capture 2011-05-08T07:07:52PDT to 2011-05-08T08:00:29PDT
2,483,180 downloads
Crawldata from Alexa Internet from 2001-06-02T21:47:31PDT to 2001-06-03T08:08:36PDT
2,465,395 downloads

Spotlight Item

Liveweb Capture 2011-03-27T22:10:09PDT to 2011-03-28T05:27:05PDT
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun Mar 27 22:10:09 PDT 2011 to Mon Mar 28 05:27:05 PDT 2011.

2,568,102 items

Welcome to Web Crawls

The Web Archive of the Internet Archive, started in late 1996, is made available through the Wayback Machine, and some collections are available in bulk to researchers.

Other than the pages collected by the Internet Archive, major contributors include Alexa Internet, Cuil, and those listed below.

All items (most recently added first)

Sub-Collections

	20th Century Web Collection of web items from the 20th century.	331 items
	Accelovation Crawl Web crawl snapshots generously donated from Accelovation. This data is currently not publicly accessible. From the site: Accelovation is pioneering the delivery of Insight Discovery™ software...	1,321 items
	Alexa Crawls Crawl data donated by Alexa Internet. This data is currently not publicly accessible. Decryption Keys are kept in an item. Alexa is the leading provider of free, global web metrics. Search Alexa to...	106,754 items
	Archive Team Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the...	26,256 items
	Archive-It Digital Collection The Archive-It Digital Collection	92,771 items
	Away From Keyboard Away From Keyboard is a memorial collection dedicated to preserving pieces of lives lived online from being scattered and lost. While no collection of data can ever replace a person, these archives...	295 items
	collections-aaron-swartz from Wikipedia: Aaron Hillel Swartz (November 8, 1986 – January 11, 2013) was an American computer programmer, writer, political organizer and Internet activist. Swartz was involved in the...	3 items
	Common Crawl Web crawl data from Common Crawl.	439 items
	Cuil Crawl Data Web crawl snapshot generously donated from cuil.com. This collection of pages mostly from 2007 and some from 2008, is about 310 terabytes of compressed data, and almost 60 billion URLs (mostly text)....	26,386 items
	Custom Crawl Services National library harvesting.	31,332 items
	Fix Broken Links Web Crawls These crawls are part of an effort to archive pages as they are created and archive the pages that they refer to. That way, as the pages that are referenced are changed or taken from the web, a link...	9,542 items
	Focused Crawls Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.	58,997 items
	httparchive Successful societies and institutions recognize the need to record their history - this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In...	874 items
	Institut national de l’audiovisuel Crawl data from Institut national de l’audiovisuel in France. This data is currently not publicly accessible. from Wikipedia: The Institut national de l'audiovisuel (or INA, French for National...	50 items
	Internet Archive Web Crawls Crawl data collected by the Internet Archive. This data is currently not publicly accessible in this format. To view archived web pages, please visit the Wayback Machine.	520,449 items
	Internet Memory Foundation Data crawled on behalf of Internet Memory Foundation. This data is currently not publicly accessible. from Wikipedia: The Internet Memory Foundation (formerly the European Archive Foundation) is a...	59 items
	Mercator Crawl Crawl done with the DEC/HP-labs 'Mercator' crawler and converted to ARC format. This data is currently not publicly accessible.	1 items
	perma_cc	1 items
	Rescue Crawls Rescue crawls conducted by the public for sites that have announced that they are closing.	2 items
	Thumper Transfer Web crawl data transferred from thumpers in Santa Clara data center.
	urlteam Web Crawls Crawl data collected by the urlteam. The URLTeam is the ArchiveTeam subcommittee on URL shorteners. We believe that they pose a serious threat to the internet's integrity. If one of them dies, gets...	4 items
	Web Collections Web Collections organized by year. Some of this data is currently not publicly accessible.	20 items
	web-group-internal miscellaneous data	28,207 items
	Wiki Collections Collections of Wiki data	172,986 items
	Wikileaks.org Archive A collection of web pages from the wikileaks websites as well as news coverage and commentary surrounding the Wikileaks releases. It includes coverage of the Afghan war diaries, the Iraq war logs,...	8 items

Related Collections

Archive Team: The Twitter Stream Grab

31 items

Recently Reviewed Items

ArchiveTeam JSON Download of Twitter Stream 2014-01
AIT-1842 Crawldata 2010-08-13T12:26:01PDT to 2010-12-22T16:02:08PST
AIT-1842 Crawldata 2010-08-15T22:38:46PDT to 2010-04-07T20:22:13PDT
AIT-1850 Crawldata 2010-08-31T15:29:50PDT to 2010-03-20T10:30:57PDT
Archive Team DailyBooth Index

This Just In

Webwide Crawldata 2014-08-22T14:50:39PDT to 2014-08-22T09:08:49PDT
2 hours ago

Webwide Crawldata 2014-08-22T13:05:29PDT to 2014-08-22T08:45:07PDT
2 hours ago

YouTube Video Crawldata 2014-08-22T21:51:43PDT to 2014-08-22T15:06:34PDT
2 hours ago

Webwide Crawldata 2014-08-22T17:59:50PDT to 2014-08-22T12:37:38PDT
2 hours ago

Webwide Crawldata 2014-08-23T01:10:16PDT to 2014-08-22T20:39:01PDT
2 hours ago

Wayback Machine Forum

Subject	Poster	Replies	Date
Carols in the Domain	danielcelano	0	Aug 21, 2014 3:10pm
The Wayback Machine is down... again!	angeldeb82	0	Aug 20, 2014 10:45am
Company asks $3000,- for disabling robots.txt	Cold Case Team	0	Aug 16, 2014 10:05am
Company asks $3000,- for disabling robots.txt	Cold Case Team	0	Aug 16, 2014 5:31am
You add my website	kaann	0	Aug 11, 2014 2:04pm
why are archives changing from being accessible to inaccessible in a 24 hr. period?	lancer46	0	Jul 11, 2014 9:48pm
Can't access the archive just a while ago	webgaulforum	0	Jun 23, 2014 9:43am
Pages being wiped from archive when thet should	Strawfellow	0	Jun 12, 2014 12:49pm
delete request	HenkSG	0	May 24, 2014 10:04am
I can't go to the Wayback Machine on some links	angeldeb82	0	May 23, 2014 9:21pm
Zip Corrupted by Wayback Machine Banner	mellamokb	1	Apr 24, 2014 11:41am
Re: Zip Corrupted by Wayback Machine Banner	DFJustin	0	May 27, 2014 6:26pm
How to archive videos?	41553	0	Apr 22, 2014 3:42pm
How can I delete an archived webpage?	456123	0	Apr 15, 2014 2:36am
Adding More than one page at a time	112288	1	Apr 5, 2014 7:18am
Re: Adding More than one page at a time	chfoo	1	Apr 6, 2014 2:27pm
Re: Adding More than one page at a time	112288	1	Apr 6, 2014 3:36pm
Re: Adding More than one page at a time	chfoo	1	Apr 6, 2014 3:48pm
Re: Adding More than one page at a time	112288	0	Apr 6, 2014 4:38pm
HRRC.org Restoring a DMCA Resource	sriplaw	0	Mar 19, 2014 7:12am
Google's robots.txt rules interpreted too strictly by Wayback machine	Nemo_bis	0	Mar 11, 2014 4:15am
cara menghilangkan jerawat archieves	hadingrh	2	Feb 20, 2014 6:24pm
Re: cara menghilangkan jerawat archieves	41553	0	Apr 22, 2014 3:44pm
Re: cara menghilangkan jerawat archieves	priyadi88	0	May 23, 2014 9:10am
Javascript messing up archived page	onlynone	0	Jan 23, 2014 9:41am
Page served up as raw chunked transfer encoding	chfoo	0	Jan 15, 2014 9:34pm
Probably and old question - downloading an entire site	Britwar	1	Jan 7, 2014 2:10pm
Re: Probably and old question - downloading an entire site	Nemo_bis	1	Jan 23, 2014 9:43am
Re: Probably and old question - downloading an entire site	Britwar	0	Jan 23, 2014 4:07pm


archive.today webpage capture	Saved from		23 Aug 2014 08:11:44 UTC

Download & Streaming : Web Crawls : Internet Archive