The Wayback Machine - https://web.archive.org/web/20220321063401/https://github.com/topics/archivespark
Here are 3 public repositories matching this topic:
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Updated Oct 8, 2021 · Scala
Convert web archives to RDF triples with ArchiveSpark
Updated Mar 28, 2017 · Jupyter Notebook
ArchiveSpark DataSpec to analyze the Internet Archive's Web archive through temporal search results returned by Tempas (v2)
Updated Dec 12, 2017 · Scala