The Wayback Machine - https://web.archive.org/web/20220828063351/https://github.com/topics/dedupe
Here are 69 public repositories matching this topic...
Fast, secure, efficient backup program
Deduplicating archiver with compression and authenticated encryption.
Updated Aug 28, 2022 · Python
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
Updated Aug 17, 2022 · Python
Deduplication tool for yarn.lock files
Updated Aug 27, 2022 · TypeScript
A powerful duplicate file finder and an enhanced fork of 'fdupes'.
A powerful and modular toolkit for record linkage and duplicate detection in Python
Updated Apr 19, 2022 · Python
Make CSS easier and more maintainable by using JavaScript
Updated Nov 9, 2021 · TypeScript
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Updated Aug 25, 2022 · Java
🆔 Command line tool for deduplicating CSV files
Updated Mar 31, 2020 · Python
Best-Effort Extent-Same, a btrfs dedupe agent
🆔 Examples for using the dedupe library
Updated Jan 19, 2022 · Python
Finding and deleting near-duplicate images based on perceptual hash.
Updated Jun 21, 2022 · Python
Updated Jan 25, 2022 · Rust
📧 CLI to deduplicate mails from mail boxes.
Updated Aug 19, 2022 · Python
Fast block-level out-of-band BTRFS deduplication tool.
Updated Jun 5, 2021 · Python
A simple command line interface to the datamade/dedupe library.
Updated Jun 21, 2022 · Jupyter Notebook
Self-contained C# library for data deduplication using Sqlite
Updated Jul 13, 2022 · Rust
Utilities for de-duping Django model instances
Updated Jul 30, 2021 · Python