The Wayback Machine - https://web.archive.org/web/20211022005556/https://github.com/topics/massive-datasets
Here are 14 public repositories matching this topic...
PolarDB-X is a cloud-native distributed SQL database designed for high-concurrency, massive-storage, and complex-query scenarios.
Updated Oct 21, 2021 · Java
Command-line tool to quickly generate a lot of files in a lot of directories
Calculate statistical measures of one column in big-data datasets with this simple Hadoop application
Updated Feb 24, 2017 · Java
Building node2vec algorithm
Updated Oct 7, 2021 · Jupyter Notebook
gipa -- compression/decompression tool to package, compress, and encode massive archive files with floating-point data
Updated Sep 14, 2017 · Python
Building a Bloom Filter on English dictionary words
Updated Oct 7, 2021 · Jupyter Notebook
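For readers unfamiliar with the idea behind the Bloom filter entry above, here is a minimal sketch of one in Python. It is not taken from the listed repository: the `BloomFilter` class, its sizing parameters, the SHA-256 double-hashing scheme, and the three-word sample list are all assumptions made for illustration, and a real run would load a full dictionary word list instead.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical fixed-size Bloom filter using double hashing over SHA-256."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, word):
        # Derive k bit positions from two 64-bit halves of a single digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))


# Tiny stand-in for a dictionary word list.
words = ["apple", "banana", "cherry"]
bloom = BloomFilter(expected_items=len(words))
for w in words:
    bloom.add(w)
```

A membership check such as `"apple" in bloom` never gives a false negative for an added word, while a word that was never added may occasionally be reported as present at roughly the configured false-positive rate.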
Updated Dec 11, 2020 · Jupyter Notebook
Series of SQL exercises on working with databases, using Google BigQuery to scale to massive datasets, taught by educators at Kaggle.com
Updated Jul 9, 2019 · Jupyter Notebook
Updated Oct 6, 2021 · Jupyter Notebook
Building the PageRank algorithm on the web graph around Stanford.edu using the NetworkX Python library
Updated Oct 7, 2021 · Jupyter Notebook
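As a rough illustration of what a PageRank computation involves, the sketch below runs power iteration on a four-page toy graph. It is not the listed repository's code and deliberately uses plain Python rather than NetworkX so that it has no dependencies; the `pagerank` function, the damping factor of 0.85, and the `toy_web` graph are assumptions chosen for the example.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict mapping node -> list of out-links.

    Every link target must also appear as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            out_links = graph[node]
            for target in out_links:
                new_rank[target] += damping * rank[node] / len(out_links)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy web graph: page -> pages it links to.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph page `c` collects links from every other page and ends up with the highest score, while `d`, which nothing links to, keeps only the baseline share. With NetworkX installed, `networkx.pagerank` on a `DiGraph` built from the same edges should give comparable scores.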
MapReduce program to suggest new friends based on the count of mutual friends
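One common way to structure such a job can be sketched in plain Python by simulating the map, shuffle, and reduce phases in memory. This is an assumption about the general technique, not the listed repository's implementation; the function names, the `ALREADY_FRIENDS` marker, and the sample `network` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALREADY_FRIENDS = -1  # marker value for pairs that are already connected


def map_friends(user, friends):
    """Map phase: flag direct friendships and emit one count per mutual friend."""
    for friend in friends:
        yield tuple(sorted((user, friend))), ALREADY_FRIENDS
    # Any two friends of `user` have `user` as a mutual friend.
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1


def reduce_pair(pair, values):
    """Reduce phase: drop existing friendships, otherwise sum mutual friends."""
    if ALREADY_FRIENDS in values:
        return None
    return pair, sum(values)


def suggest_friends(adjacency):
    grouped = defaultdict(list)  # stands in for the shuffle step
    for user, friends in adjacency.items():
        for key, value in map_friends(user, friends):
            grouped[key].append(value)
    suggestions = defaultdict(list)
    for pair, values in grouped.items():
        result = reduce_pair(pair, values)
        if result is None:
            continue
        (a, b), count = result
        suggestions[a].append((b, count))
        suggestions[b].append((a, count))
    for user in suggestions:
        # Most mutual friends first, ties broken by name.
        suggestions[user].sort(key=lambda item: (-item[1], item[0]))
    return dict(suggestions)


# Hypothetical friendship lists (symmetric).
network = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["bob", "cat"],
}
suggestions = suggest_friends(network)
```

Here `ann` and `dan` are the only pair not already connected, and they share two mutual friends (`bob` and `cat`), so each is suggested to the other with a count of 2.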
Lab assignments for the Analysis of Massive Data Sets course @ FER, University of Zagreb
University lab exercises on processing big data.
Updated Nov 19, 2018 · Python