Download & Streaming : Web Crawls : Internet Archive

archived 11 Sep 2016 17:24:01 UTC
Search the history of over 505 billion pages on the Internet.

Web Crawls

The Web Archive of the Internet Archive started in late 1996, is made available through the Wayback Machine, and some collections are available in bulk to researchers. Many pages are archived by the Internet Archive on behalf of other contributors, including Archive-It partners and Save Page Now users. Other captures are donated to the Internet Archive by partners such as Alexa Internet.

2,606,220 RESULTS


TOPIC
crawldata 978,784
wiki 656,154
dumps 629,720
incremental 598,348
Wikipedia 239,731
Wiktionary 125,421
Wikibooks 66,008
archiveteam 62,900
Wikiquote 57,099
Wikisource 53,676
Wikimedia 34,782
data dumps 31,351
no404 26,701
wikiteam 26,534
MediaWiki 26,466
Wikinews 24,873
English 23,996
videobot 22,870
live 18,922
livestream 18,918
stream 18,918
Wikivoyage 14,227
wikipedia 13,693
Wikiversity 12,796
wordpress 12,584
tv 8,017
Italian 6,836
French 6,834
Greek 6,834
German 6,832
Spanish 6,832
Swedish 6,798
Portuguese 6,795
Russian 6,795
Arabic 5,990
Czech 5,990
Japanese 5,989
Finnish 5,987
Korean 5,984
Hebrew 5,981
Ukrainian 5,956
Polish 5,948
Romanian 5,944
Chinese 5,922
Persian 5,867
television 5,408
TV 5,258
Dutch 5,142
Bosnian 5,136
Catalan 5,136
Bulgarian 5,135
Esperanto 5,133
Norwegian 5,110
Serbian 5,106
Tamil 5,105
Turkish 5,105
Vietnamese 5,104
Slovenian 5,102
gizmodo.com 4,511
gawker.com 4,350
deadspin.com 4,346
jalopnik.com 4,334
Hungarian 4,311
Thai 4,293
Welsh 4,282
Azerbaijani 4,281
Croatian 4,281
Lithuanian 4,281
Belarusian 4,280
Estonian 4,280
Limburgish 4,280
Armenian 4,279
Galician 4,279
Marathi 4,279
Malayalam 4,278
Danish 4,277
Indonesian 4,277
Latin 4,277
Icelandic 4,273
Telugu 4,258
Albanian 4,255
Slovak 4,253
Sanskrit 4,252
WARC 3,943
archive 3,907
snapshot 3,894
Arcmaj3 3,862
media 3,496
tape 3,494
Gujarati 3,458
Kannada 3,458
Breton 3,426
Georgian 3,426
Basque 3,425
Bengali 3,425
Hindi 3,425
Afrikaans 3,424
Kyrgyz 3,424
Macedonian 3,423
Kurdish 3,422
Urdu 3,404
kotaku.com 3,267
website 3,030
research 2,901
metro.co.uk 2,897
european 2,875
forum 2,875
parliament 2,875
plenary 2,875
session 2,875
web archive 2,870
university 2,856
george 2,855
gmu-tv 2,855
gmutv 2,855
mason 2,855
North Korea 2,733
KCTV 2,732
24 2,679
austria 2,679
austria24 2,679
education 2,679
science 2,678
health 2,677
medicine 2,676
humanities 2,674
UCSD 2,673
UCSD-TV 2,673
UCTV 2,673
arts 2,673
san diego 2,673
satellite 2,673
drenthe 2,663
dutch 2,663
nederlands 2,663
rtv 2,663
london 2,654
Kazakh 2,603
Uzbek 2,597
Sundanese 2,595
Tatar 2,591
bridge 2,590
tower 2,590
FIX 2,583
hungarian 2,583
hungary 2,583
Faroese 2,571
Malagasy 2,571
Interlingua 2,570
Khmer 2,569
Malay 2,569
Nepali 2,565
Occitan 2,560
Wolof 2,557
Yiddish 2,557
Tajik 2,556
Tagalog 2,555
Punjabi 2,554
Sinhala 2,552
Venetian 2,550
Oriya 2,424
2015 2,135
archivebot 2,082
Old English 2,033
Interlingue 1,965
Asturian 1,785
Corsican 1,785
Irish 1,784
Nauru 1,783
Kashmiri 1,781
Low German 1,780
Assamese 1,778
Quechua 1,774
Uyghur 1,773
Turkmen 1,772
Amharic 1,756
Aymara 1,750
Guarani 1,750
Latvian 1,750
Lingala 1,750
LANGUAGE
English 102,251
Portuguese 6,792
German 4,329
Dutch 2,965
Korean 2,796
Hungarian 2,730
Spanish 960
Russian 889
French 719
Chinese 248
Internet Archive Web Crawls
collection | 790,449 ITEMS | 5.9B VIEWS | Jun 11, 2010
The Internet Archive discovers and captures web pages through many different web crawls. At any given time several distinct crawls are running, some for months, and some every day or longer. View the web archive through the Wayback Machine.
Topic: webwidecrawl
Alexa Crawls
collection | 137,604 ITEMS | 2.6B VIEWS | Nov 16, 2010
Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.
Topics: web crawl, Alexa
Worldwide Web Crawls
collection | 421,501 ITEMS | 2.4B VIEWS | Oct 5, 2010
Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites. Since September 10th, 2010, the Internet Archive has been running Worldwide Web Crawls of the global web, capturing web elements, pages, sites and parts of sites. Each Worldwide Web Crawl was initiated from one or more lists of URLs that are known as "Seed Lists". Descriptions of the Seed Lists associated with each crawl may be provided as part of the metadata for...
Survey Crawls
collection | 63,876 ITEMS | 1.3B VIEWS | Nov 17, 2012
Survey crawls are run about twice a year, on average, and attempt to capture the content of the front page of every web host ever seen by the Internet Archive since 1996.
Topic: survey crawls
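A survey crawl's seed list can be pictured as one front-page URL per known host. The sketch below derives such a list from a set of previously seen URLs. Several details are assumptions for illustration: the helper name `front_page_seeds`, the fixed `http://` scheme, and the idea that seeds are built this way at all. It does not describe the Internet Archive's actual process.

```python
from urllib.parse import urlsplit

def front_page_seeds(seen_urls):
    """Return one front-page URL per distinct host (hypothetical helper).

    Assumes a survey seed is simply "http://<host>/" for every host that
    appears among previously seen URLs.
    """
    hosts = set()
    for url in seen_urls:
        host = urlsplit(url).hostname  # lowercased, port stripped
        if host:
            hosts.add(host)
    return sorted(f"http://{host}/" for host in hosts)

seen = [
    "http://example.com/a/b.html",
    "https://EXAMPLE.com/c",
    "http://blog.example.org:8080/post?id=1",
]
print(front_page_seeds(seen))
# ['http://blog.example.org/', 'http://example.com/']
```

Collapsing URLs to hosts first keeps the seed list proportional to the number of hosts rather than the number of pages ever seen.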
Live Web Proxy Crawls
collection | 13,609 ITEMS | 1.3B VIEWS | Apr 26, 2011
Content crawled via the Wayback Machine Live Proxy, mostly by the Save Page Now feature on web.archive.org. The liveweb proxy is a component of the Internet Archive's Wayback Machine project. It captures the content of a web page in real time, archives it into an ARC or WARC file, and returns the ARC/WARC record to the Wayback Machine for processing. The recorded ARC/WARC file becomes part of the Wayback Machine in due course.
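The sketch below makes the ARC/WARC step concrete. It serializes one captured HTTP response as a WARC/1.0 `response` record, following the record layout of the WARC specification (ISO 28500). It has clear limits:

- It covers WARC only, not the older ARC format.
- It writes just a handful of header fields.
- It is not the liveweb proxy's own code; the function name `warc_response_record` is hypothetical.

```python
import uuid
from datetime import datetime, timezone

def warc_response_record(target_uri, http_response, when=None):
    """Serialize one WARC/1.0 'response' record as bytes (minimal sketch).

    `http_response` is the raw HTTP response (status line, headers, body).
    Production tools also record payload digests, the server IP, and more.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fields = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http;msgtype=response"),
        ("Content-Length", str(len(http_response))),
    ]
    header = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields) + "\r\n"
    # Each record's content block is followed by two CRLFs.
    return header.encode("utf-8") + http_response + b"\r\n\r\n"

payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
record = warc_response_record(
    "http://example.com/", payload,
    when=datetime(2016, 9, 11, 17, 24, 1, tzinfo=timezone.utc),
)
print(record.split(b"\r\n", 1)[0].decode())  # WARC/1.0
```

Several such records appended to one file form a WARC file.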
Archive-It Digital Collection
collection | 207,093 ITEMS | 553M VIEWS | Dec 14, 2010
Archive-It is a subscription web archiving service of the Internet Archive that helps organizations harvest, build, and preserve collections of digital content. Partners create domain-specific collections of web captures that can be searched on Archive-It. Content is hosted and stored at the Internet Archive data centers. Archive-It works with more than 400 partner organizations in 48 U.S. states and 16 countries worldwide, including: College and University Libraries, State Archives, Libraries,...
Topic: Colleges, Universities, Libraries, Archives, NGOs, Museums
Survey Crawl April 2013
collection | 16,282 ITEMS | 477.5M VIEWS | Nov 17, 2012
Survey crawl of domains started April 2013. This data is currently not publicly accessible.
Focused Crawls
collection by Internet Archive | 125,541 ITEMS | 467.3M VIEWS | Nov 4, 2011
Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
Topic: webcrawl
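A focused crawl limited to a single domain or subdomain needs a scope test for every discovered URL. One plausible rule is sketched below with a hypothetical `in_domain_scope` helper: a host is accepted if it equals the target domain or ends with `.` plus that domain. Real crawlers such as Heritrix offer richer, configurable scoping than this.

```python
from urllib.parse import urlsplit

def in_domain_scope(url, domain):
    """True if url's host is `domain` or one of its subdomains (sketch)."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)

print(in_domain_scope("https://news.example.com/story", "example.com"))  # True
print(in_domain_scope("https://badexample.com/", "example.com"))         # False
```

Matching on a leading `.` avoids treating `badexample.com` as part of `example.com`.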
Custom Crawl Services
collection by Internet Archive | 46,206 ITEMS | 398.9M VIEWS | Apr 8, 2011
National library harvesting.
Topic: ccs
web-group-internal
collection | 29,828 ITEMS | 398.3M VIEWS | Jul 21, 2011
Miscellaneous data.
Topic: brad tofel
Wide Crawl started April 2013
collection | 25,005 ITEMS | 359.5M VIEWS | Apr 18, 2013
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Wayback Indexes
collection | 554 ITEMS | 344.4M VIEWS | Apr 4, 2012
Wayback indexes. This data is currently not publicly accessible.
Top Domains
collection | 68,309 ITEMS | 312.7M VIEWS | Nov 29, 2011
A daily collection of thousands of the most popular web sites according to Alexa.com's top sites rankings.
Topics: daily, popular sites, Alexa
Archive-It Partners
collection | 127,831 ITEMS | 311.9M VIEWS | Oct 20, 2015
Archive-It is the leading web archiving service for collecting and accessing cultural heritage on the web and is a service of Internet Archive used by libraries, archives, governments, non-profits, and other organizations to build collections of web materials.
Topic: TK
Fix Broken Links Web Crawls
collection | 45,006 ITEMS | 304.9M VIEWS | Sep 12, 2013
These crawls are part of an effort to archive pages as they are created, along with the pages they refer to. That way, as the referenced pages are changed or taken off the web, a link to the version that was live when the page was written will be preserved. The Internet Archive then hopes that references to these archived pages will be put in place of links that would otherwise be broken, or added as companion links that let people see what was originally intended by a page's...
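The link replacement described above relies on the Wayback Machine's URL pattern, `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`. That pattern typically redirects to the capture closest to the given timestamp. The sketch below assumes the timestamp of interest is when the citing page was written. The helper names `wayback_link` and `with_companion` are hypothetical.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web/"

def wayback_link(url, written_at):
    """Build a Wayback Machine URL for `url` as of the aware datetime `written_at`."""
    stamp = written_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{stamp}/{url}"

def with_companion(url, written_at):
    """Return (original link, archived companion link) for one citation."""
    return url, wayback_link(url, written_at)

print(wayback_link("http://example.com/page",
                   datetime(2013, 9, 12, 8, 30, tzinfo=timezone.utc)))
# https://web.archive.org/web/20130912083000/http://example.com/page
```

`with_companion` covers the second option in the description: the original link is kept, and an archived link is offered alongside it.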
alexa_2007
collection | 7,636 ITEMS | 302.9M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
Survey Crawl December 2014
collection | 11,190 ITEMS | 277.1M VIEWS | Dec 17, 2014
Survey crawl of domains started December 2014. This data is currently not publicly accessible.
Wide Crawl started June 2014
collection | 45,313 ITEMS | 233.1M VIEWS | Jun 6, 2014
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Wide Crawl started August 2013
collection | 21,909 ITEMS | 223.1M VIEWS | Jul 30, 2013
Web wide crawl with initial seedlist and crawler configuration from August 2013.
alexa_2006
collection | 6,507 ITEMS | 217.2M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
Wide Crawl started January 2012
collection | 30,362 ITEMS | 214.1M VIEWS | Dec 30, 2011
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wiki Collections
collection | 727,346 ITEMS | 206.8M VIEWS | Apr 15, 2013
Collections of Wiki data
Topics: crawls, data, wiki
Wikipedia Outlinks
collection | 12,403 ITEMS | 203.7M VIEWS | May 13, 2011
Crawl of outlinks from wikipedia.org. These files are currently not publicly accessible. From Wikipedia: Wikipedia is a multilingual, web-based, free-content encyclopedia project operated by the Wikimedia Foundation and based on an openly editable model. The name "Wikipedia" is a portmanteau of the words wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning "quick") and encyclopedia. Wikipedia's articles provide links to guide the user...
Archive Team
collection | 127,680 ITEMS | 201M VIEWS | May 4, 2011
Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is composed entirely of volunteers and interested parties, and has expanded into a large number of related projects for saving online and digital history. History is littered with hundreds of conflicts over the future of a community, group, location or...
Wide Crawl started April 2012
collection | 39,252 ITEMS | 194.9M VIEWS | Mar 31, 2012
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Wide Crawl Number 12 - started March 14, 2015
collection | 49,621 ITEMS | 193.4M VIEWS | Jan 9, 2015
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Wikipedia Outbound Links
collection | 12,730 ITEMS | 162.2M VIEWS | Sep 23, 2013
This is a collection of web page captures from links added to, or changed on, Wikipedia pages. The idea is to bring a reliability to Wikipedia outlinks so that if the pages referenced by Wikipedia articles are changed, or go away, a reader can permanently find what was originally referred to. This is part of the Internet Archive's attempt to rid the web of broken links.
Topics: Wikipedia, Wikimedia
Survey Crawl started July 2015
collection | 10,137 ITEMS | 146.9M VIEWS | Jan 9, 2015
Survey crawl of domains. This data is currently not publicly accessible.
Survey Crawl May 2014
collection | 6,909 ITEMS | 143.1M VIEWS | Apr 25, 2014
Survey crawl of domains started May 2014. This data is currently not publicly accessible.
Wide Crawl started October 2010
collection | 15,839 ITEMS | 142.3M VIEWS | Oct 5, 2010
Web wide crawl with initial seedlist and crawler configuration from October 2010.
Wide Crawl Started January 2013
collection | 15,138 ITEMS | 139.2M VIEWS | Jan 1, 2013
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Wide Crawl started September 2012
collection | 22,402 ITEMS | 138.9M VIEWS | Aug 24, 2012
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Around The World Crawl
collection | 2,150 ITEMS | 132.7M VIEWS | Jul 16, 2012
Data crawled by the Sloan Foundation on behalf of the Internet Archive.
Wide Crawl started October 2011
collection | 10,122 ITEMS | 125M VIEWS | Sep 30, 2011
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Survey Crawl
collection | 12,622 ITEMS | 124.1M VIEWS | Jan 9, 2015
Survey crawl of domains. This data is currently not publicly accessible.
Top News
collection | 48,925 ITEMS | 119.8M VIEWS | Nov 29, 2011
A daily collection of hundreds of the world's top news sites.
Topics: daily, news
ArchiveBot: The Archive Team Crowdsourced Crawler
collection | 1,706 ITEMS | 117.9M VIEWS | Apr 8, 2014
ArchiveBot is an IRC bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, records it in a WARC, and then uploads that WARC to ArchiveTeam servers for eventual injection into the Internet Archive (or other archive sites). To use ArchiveBot, drop by #archivebot on EFNet. To interact with ArchiveBot, you issue commands by typing them into the channel. Note you will need channel...
Topics: archiveteam, archivebot, webcrawl, robot, love
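"All content under that URL" is not defined precisely here. One plausible reading is sketched below as a hypothetical `under_seed` check. A URL qualifies if it has the same host as the start URL and a path that begins with the start URL's directory. ArchiveBot's actual scoping rules may differ.

```python
from urllib.parse import urlsplit

def under_seed(seed, url):
    """Hypothetical scope rule: same host as `seed`, and the URL's path starts
    with the seed's directory (seed path up to and including its last '/')."""
    s, u = urlsplit(seed), urlsplit(url)
    if (s.hostname or "") != (u.hostname or ""):
        return False
    base = s.path[: s.path.rfind("/") + 1] or "/"
    return (u.path or "/").startswith(base)

print(under_seed("http://example.com/blog/", "http://example.com/blog/2014/post"))  # True
print(under_seed("http://example.com/blog/", "http://example.com/about"))          # False
```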
.com survey started January 2011
collection | 2,535 ITEMS | 117.6M VIEWS | Jan 20, 2011
Survey crawl of .com domains started January 2011.
Topic: webcrawl
Wide Crawl started March 2011
collection | 8,528 ITEMS | 112.2M VIEWS | Oct 5, 2010
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What's in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa's top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
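The figures listed for this crawl support a few quick derived numbers. The sketch below assumes "captures" counts every fetch, including repeat fetches of the same URL. Under that assumption, the difference from the unique-URL count is the number of repeat captures. The per-host figure is only an average over what is likely a very uneven distribution.

```python
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069

repeat_captures = captures - unique_urls   # captures beyond the first per URL
captures_per_url = captures / unique_urls
urls_per_host = unique_urls / hosts

print(f"{repeat_captures:,}")      # 439,836,182
print(f"{captures_per_url:.2f}")   # 1.19
print(f"{urls_per_host:.1f}")      # 78.3
```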
Wide Crawl started February 2014
collection | 9,789 ITEMS | 106.8M VIEWS | Nov 15, 2013
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Wide Crawl Number 13
collection | 46,049 ITEMS | 105.5M VIEWS | Jan 9, 2015
Web Wide Crawl Number 13.
38_crawl
collection | 1,387 ITEMS | 101.2M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
alexa_web_2009
collection | 3,080 ITEMS | 97M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
alexa_web_2010
collection | 2,994 ITEMS | 94.1M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
Wordpress Blogs and the Pages They Link To
collection | 12,583 ITEMS | 89.8M VIEWS | Sep 11, 2013
This is a collection of pages and embedded objects from WordPress blogs and the external pages they link to. Captures of these pages are made on a continuous basis. They are seeded from a feed of new or changed pages, either hosted by WordPress.com or on sites running a properly configured Jetpack WordPress plugin.
Topics: Wordpress.com, blogs, jetpack
Wikipedia Outlinks February 2012
collection | 2,951 ITEMS | 88.2M VIEWS | Feb 3, 2012
Crawl of outlinks from wikipedia.org started February 2012. These files are currently not publicly accessible.
National Library of Australia Crawls
collection | 11,022 ITEMS | 88.2M VIEWS | Apr 3, 2012
Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
Alexa Crawl EG
collection | 1,678 ITEMS | 86.2M VIEWS | Apr 3, 2012
Crawl EG from Alexa Internet. This data is currently not publicly accessible.
Wide Crawl Number 14 started March 2016
collection | 34,127 ITEMS | 81.2M VIEWS | Mar 4, 2016
Web wide crawl.
web_iq
collection | 2,650 ITEMS | 81.1M VIEWS | Apr 11, 2012
Crawl performed by Internet Archive. This data is currently not publicly accessible.
web_wk
collection | 9,978 ITEMS | 80.1M VIEWS | Apr 17, 2012
Crawl performed by Internet Archive. This data is currently not publicly accessible.
National Library of Spain
collection | 6,722 ITEMS | 75.9M VIEWS | Apr 5, 2012
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
26_crawl
collection | 1,466 ITEMS | 65.5M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
51_crawl
collection | 1,138 ITEMS | 63.7M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
52_crawl
collection | 2,589 ITEMS | 59.6M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
Bibliotheque Nationale de France Domain Crawls
collection | 1,652 ITEMS | 55.8M VIEWS | Apr 3, 2012
Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
35_crawl
collection | 1,179 ITEMS | 50.7M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
Shallow Crawls
collection | 1,042 ITEMS | 48.3M VIEWS | Nov 17, 2012
Shallow crawls that collect content 1 level deep including embeds. This data is currently not publicly accessible.
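One way to read "1 level deep including embeds" has two parts. Hyperlinks are followed only one hop from each seed. The embedded resources (images, stylesheets, scripts) of any captured page are always fetched. The breadth-first sketch below works on an in-memory link graph rather than the network. The function name and the graph representation are assumptions for illustration.

```python
from collections import deque

def shallow_crawl(seeds, links, embeds, max_depth=1):
    """Breadth-first crawl over an in-memory graph (illustrative sketch).

    links[url]  -> pages that url hyperlinks to; following one costs a level.
    embeds[url] -> resources the page needs (images, CSS, scripts); they are
                   captured with the page but never expanded further.
    """
    captured = []
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        captured.append(url)
        # Always capture the embeds of a captured page, regardless of depth.
        for resource in embeds.get(url, []):
            if resource not in seen:
                seen.add(resource)
                captured.append(resource)
        # Only follow hyperlinks while below the depth limit.
        if depth < max_depth:
            for target in links.get(url, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return captured

links = {"/": ["/a", "/b"], "/a": ["/c"]}
embeds = {"/": ["/logo.png"], "/a": ["/a.css"]}
print(shallow_crawl(["/"], links, embeds))
# ['/', '/logo.png', '/a', '/a.css', '/b']
```

Here `/c` is two hyperlink hops from the seed and is skipped. `/a.css` is still captured because it is an embed of the depth-1 page `/a`.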
Alexa Crawls DF
collection | 248 ITEMS | 47.4M VIEWS | Jul 11, 2012
Crawl data donated by Alexa Internet. This data is currently not publicly accessible.
Alexa Crawl EI
collection | 1,408 ITEMS | 46.1M VIEWS | Apr 3, 2012
Crawl EI from Alexa Internet. This data is currently not publicly accessible.
Wikipedia Outlinks March 2016
collection | 10,320 ITEMS | 45.9M VIEWS | Mar 3, 2016
Crawl of outlinks from wikipedia.org started March 2016. These files are currently not publicly accessible. Properties of this collection: it has been several years since the last time we did this. For this collection, several things were done: 1. Turned off duplicate detection. This collection will be complete, as there is a good chance we will share the data, and sharing data with pointers to random other collections is a complex problem. 2. For the first time, did all the different wikis....
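Duplicate detection in crawlers is commonly done by comparing payload digests. If a fetched payload's digest has been seen before, a small `revisit` record pointing at the earlier copy is written instead of the full content. The sketch below shows what turning that off means within a single run. The helper names are hypothetical, and cross-collection lookups (the "pointers to random other collections" above) are out of scope.

```python
import base64
import hashlib

def payload_digest(payload: bytes) -> str:
    # Base32-encoded SHA-1, the form commonly seen in WARC-Payload-Digest headers.
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")

def record_types(payloads, dedup_enabled):
    """Decide 'response' vs 'revisit' for each fetched payload (sketch).

    With dedup on, a repeated payload becomes a 'revisit' that depends on an
    earlier record; with dedup off, every fetch is stored in full, so the
    collection is self-contained.
    """
    seen = set()
    types = []
    for payload in payloads:
        digest = payload_digest(payload)
        if dedup_enabled and digest in seen:
            types.append("revisit")
        else:
            types.append("response")
            seen.add(digest)
    return types

print(record_types([b"a", b"b", b"a"], dedup_enabled=True))   # ['response', 'response', 'revisit']
print(record_types([b"a", b"b", b"a"], dedup_enabled=False))  # ['response', 'response', 'response']
```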
alexa_1999
collection | 243 ITEMS | 44.5M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
International News Crawls
collection | 3,581 ITEMS | 44.2M VIEWS | Oct 4, 2010
Crawls of International News Sites.
Alexa Crawl DX
collection | 1,442 ITEMS | 43.8M VIEWS | Apr 3, 2012
Crawl DX from Alexa Internet. This data is currently not publicly accessible.
29_crawl
collection | 1,568 ITEMS | 43M VIEWS | Jul 12, 2012
This data is currently not publicly accessible.
web_el_2008
collection | 1,705 ITEMS | 42.9M VIEWS | Jul 26, 2012
This data is currently not publicly accessible.
Alexa Crawls DO
collection | 493 ITEMS | 41.7M VIEWS | Jul 11, 2012
Crawl data donated by Alexa Internet. This data is currently not publicly accessible.
web_mon
collection | 3,810 ITEMS | 41.5M VIEWS | Apr 11, 2012
Crawl performed by Internet Archive. This data is currently not publicly accessible.
Wikipedia Outlinks May 2011
collection | 1,638 ITEMS | 41M VIEWS | Jul 11, 2011
Crawl of outlinks from wikipedia.org started May 2011. These files are currently not publicly accessible.
Alexa Crawls EA
collection | 1,315 ITEMS | 40.3M VIEWS | Jul 12, 2012
Crawl data donated by Alexa Internet. This data is currently not publicly accessible.
Topic: crawldata
Alexa Crawls DY
collection | 1,326 ITEMS | 40.2M VIEWS | Jul 12, 2012
Crawl data donated by Alexa Internet. This data is currently not publicly accessible.
Internet Archive Global Events
collection | 7,118 ITEMS | 39.7M VIEWS | Jun 21, 2011
Internet Archive Global Events
Archive-It Partner Since: Feb, 2006
Organization Type: Other Institutions
Organization URL: http://www.archive-it.org
20th Century Web
collection | 331 ITEMS | 39.5M VIEWS | Jan 17, 2014
Collection of web items from the 20th century.
Topics: web, 20th century
Elections Web
collection | 1,609 ITEMS | 38.5M VIEWS | Oct 20, 2012
This collection contains collaborative Election crawls performed by IA.
Topics: elections, web
Election Crawl 2012
collection | 1,608 ITEMS | 38.5M VIEWS | Oct 20, 2012
This crawl was performed in Summer & Fall of 2012 to archive the US Federal Elections.
Topics: US, federal, elections, web, 2012
Created on October 8, 2010
tracey pooh, Archivist
ADDITIONAL CONTRIBUTORS
ARossi, Archivist
kngenie, Archivist
Aaron Ximm, Archivist
Ralf Muehlen, Archivist
jcushman, Archivist
hawc, Archivist
VIEWS 10,921,976,709
ITEMS 5,151,018


Wayback Machine Forum (closed)

Subject | Poster | Replies | Date
Site Removal Please | MGMidget1234 | 0 | Jun 9, 2016 8:57am
Site Removal Request | 4687431212 | 1 | May 1, 2016 8:23pm
   Re: Site Removal Request | 4687431212 | 0 | May 1, 2016 10:41pm
Takedown request | victorlsxiv | 0 | Apr 24, 2016 7:00am
only two hours left of April20 (420): everybody Wayback cannabis homepages | EarthFurst | 2 | Apr 20, 2016 4:04pm
   Re: only two hours left of April20 (420): everybody Wayback cannabis homepages | EarthFurst | 0 | Apr 20, 2016 3:31pm
   Re: only two hours left of April20 (420): everybody Wayback cannabis homepages | EarthFurst | 0 | Apr 20, 2016 5:04pm
"archived" pages disappearing from Wayback: reference at archive.is | EarthFurst | 1 | Apr 20, 2016 12:21pm
   Re: 'archived' pages disappearing from Wayback: reference at archive.is | Jeff Kaplan | 1 | Apr 20, 2016 4:08pm
     Re: 'archived' pages disappearing from Wayback: reference at archive.is | EarthFurst | 1 | Apr 22, 2016 2:37am
       Re: 'archived' pages disappearing from Wayback: reference at archive.is | Jeff Kaplan | 0 | Apr 22, 2016 10:03am
The Wayback Machine Forum is "(closed)", but nothing will stop me from adding this post– BELIEVE IT! | pegzmasta | 1 | Apr 6, 2016 6:27pm
   Re: Original Archive is '(closed)' | PDpolice | 1 | Apr 6, 2016 6:14pm
     Re: Original Archive is '(closed)' | pegzmasta | 0 | Apr 7, 2016 2:37pm
Multiple Set-Cookie Headers: Wayback | River_Delta_CA_USA | 0 | Apr 4, 2016 10:17am
Hi, Wayback– Problem Solved! | pegzmasta | 1 | Apr 3, 2016 11:13am
   This Is Only a Test | Dupenhagen Moonbat | 1 | Apr 6, 2016 4:46pm
     Re: This Is Only a Test | pegzmasta | 0 | Apr 6, 2016 5:27pm
how to query for all the websites that end in ".com.br"? | LucasMation | 1 | Mar 31, 2016 6:20am
   Re: how to query for all the websites that end in '.com.br'? | pegzmasta | 1 | Apr 1, 2016 10:13am
     Re: how to query for all the websites that end in '.com.br'? | LucasMation | 1 | Apr 1, 2016 12:03pm
       Re: how to query for all the websites that end in '.com.br'? | pegzmasta | 0 | Apr 1, 2016 12:19pm
Challenge: Read, Reply, and Correct! [The Internet Archive is tasked with preserving content on the Internet, but will it preserve and fix its own forums?] | pegzmasta | 0 | Mar 16, 2016 2:35pm
How long does it take to get a response from info@archive.org? | juwhyonee | 1 | Feb 26, 2016 10:26am
   Re: How long does it take to get a response from info@archive.org? | aanon | 0 | May 3, 2016 5:45am
problem with waybacks of comicbookresources.com homepage after 2013 | EarthFurst | 0 | Feb 18, 2016 1:47am
my website is not archiving | jon617 | 0 | Jan 7, 2016 4:11pm
So does excluding via robots actually delete or not? | talkingnewspapers | 0 | Jan 7, 2016 9:46am
Crawl and archive a whole website recursively | maltris | 0 | Jan 7, 2016 2:26am
My Website Is Not Crawled Despite Removing Restrictions From Robots.txt | leodwight | 0 | Jan 4, 2016 7:56pm
What is the algorithm for deciding when to not crawl a page anymore? | zwol | 0 | Dec 4, 2015 9:37am
End of an era: Imageshack deletes free accounts | Javik | 0 | Nov 28, 2015 12:55pm
Wayback machine rebuild suggestions | Archive Lover1 | 1 | Oct 23, 2015 8:44am
   Re: Wayback machine rebuild suggestions | h891322 | 0 | Dec 12, 2015 5:55am
Entire website archival | tycio | 0 | Oct 22, 2015 10:57pm
Late 2007 Archive... Gone? | PeabodySam | 0 | Oct 9, 2015 5:29pm
How do I retrieve the original form of a page from the Wayback Machine? | zwol | 1 | Sep 1, 2015 2:17pm
   Re: How do I retrieve the original form of a page from the Wayback Machine? | DKL3 | 2 | Sep 1, 2015 2:45pm
     Re: How do I retrieve the original form of a page from the Wayback Machine? | zwol | 0 | Sep 3, 2015 11:47am
     Re: How do I retrieve the original form of a page from the Wayback Machine? | slowride13 | 0 | Sep 29, 2015 9:25am
Cannot see content on website but could see before ? | Izzy15 | 1 | Aug 31, 2015 6:51am
   Re: Cannot see content on website but could see before ? | slowride13 | 1 | Sep 29, 2015 9:36am
     Re: Cannot see content on website but could see before ? | Izzy15 | 0 | Sep 29, 2015 2:09pm
Cannot see content on website but could see before ? | Izzy15 | 0 | Aug 31, 2015 6:51am
searching url substring | iaw4 | 0 | Aug 27, 2015 8:54am
