The Wayback Machine - https://web.archive.org/web/20200825001400/https://github.com/topics/data-sketches
Here are 10 public repositories matching this topic...
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Updated Apr 20, 2020 · Python
A Clojure library for querying large data-sets on similarity
Updated Feb 17, 2019 · Clojure
Paper about the estimation of cardinalities from HyperLogLog sketches
Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Updated Jul 24, 2020 · Scala
DynaHist: A Dynamic Histogram Library for Java
Updated Aug 24, 2020 · Java
A Prototype For Fitting Monotonic Cubic Splines to a Tdigest Sketch
Updated Jan 21, 2019 · Jupyter Notebook
Yet Another Lame Algorithm Library
Updated Jan 18, 2020 · Python
Implementation for "Mitigating DNS random subdomain DDoS attacks by distinct heavy hitters sketches"
Updated Oct 9, 2019 · Jupyter Notebook
A barebones implementation of the simhash data sketching algorithm.