The Wayback Machine - https://web.archive.org/web/20200815060802/https://github.com/topics/data-matching
Here are 13 public repositories matching this topic...
A toolkit for record linkage and duplicate detection in Python
Updated Jun 4, 2020 (Python)
A list of free data matching and record linkage software.
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
Updated Mar 23, 2020 (Python)
Link Wikidata items to large catalogs
Updated Apr 1, 2020 (Python)
Resources for tackling record linkage / deduplication / data matching problems
Implementation in Apache Spark of the EM algorithm to estimate parameters of Fellegi-Sunter's canonical model of record linkage.
Updated Aug 12, 2020 (Python)
A browser user interface for manual labeling of record pairs.
Updated Dec 26, 2019 (JavaScript)
A maximum-strength name parser for record linkage.
Updated Nov 6, 2019 (Python)
Undergraduate final project (README needs updating); scientific paper to be included soon
Weka Comparator to match rules to test data with filtering abilities
Updated Feb 17, 2019 (Java)
Repository for CS 838 (Spring 2017) Data Science project
Updated Apr 1, 2017 (Jupyter Notebook)
Novel Ensemble Learning Approach to Unsupervised Record Linkage
Updated Feb 2, 2019 (Jupyter Notebook)