The Wayback Machine - https://web.archive.org/web/20220305034957/https://github.com/topics/webarchiving
Here are 42 public repositories matching this topic...
An Awesome List for getting started with web archiving
Wayback Machine API interface & a command-line tool
Updated Mar 4, 2022 · Python
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Updated May 19, 2020 · JavaScript
Parse And Create Web ARChive (WARC) files with node.js
Updated Sep 1, 2021 · JavaScript
A list of things related to software, literature, and other content for 🕣 Memento
A dockerized, queued high fidelity web archiver based on Squidwarc
Updated Jul 19, 2020 · Python
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Decentralized web archiving
Updated Aug 7, 2018 · Python
A social media open post web archiving tool
Updated Feb 10, 2022 · JavaScript
Seeder - Czech webarchive curating tool and public site
Updated Feb 10, 2022 · Python
Various Jupyter notebooks about Common Crawl data
Updated Dec 7, 2021 · Jupyter Notebook
Quick Cache and Archive search buttons
Updated Dec 26, 2021 · JavaScript
Digital Preservation of HTTP in documentary heritage.
pywb recorder over tor, anonymously records the web. (docker image)
Tika based link extractor for httpreserve
metawarc: a command-line tool for metadata extraction from files from WARC (Web ARChive)
Updated Mar 1, 2022 · Python
record current active tab on webrecorder.io
Updated May 9, 2017 · JavaScript
An archival thumbnail visualization server
Updated Sep 2, 2020 · JavaScript
News Archiver, Data Aggregation for CNN and Fox News
Updated Jan 25, 2022 · JavaScript
Awesome list dedicated to digital and data preservation tools, sources, services and so on.
A helper package to tokenize textual content and retrieve hyperlinks
Client app for httpreserve pkg that generates CSV, JSON, HTTP, and BoltDB
Updated Dec 22, 2021 · JavaScript
Given four bytes, download a random file from web archives implementing the UKWA Shine interface
Class page for ODU CS 791 / 891 Web Archiving Seminar
A wrapper for phantom.js commands for headless screenshots.
From WARC records to MongoDB documents
An archiving utility with an interface for web servers.
Updated Aug 3, 2021 · Python
Link crawler for a phpBB forum
Updated Jul 17, 2017 · Java
Updated Sep 20, 2017 · JavaScript
A set of web archival replay test cases
Updated Oct 25, 2021 · HTML