The Wayback Machine - https://web.archive.org/web/20230321131707/https://github.com/topics/data-cleaning

data-cleaning

Here are 1,783 public repositories matching this topic...

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Updated Mar 18, 2023
Go

cleanlab / cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

data-science machine-learning data-validation exploratory-data-analysis annotations weak-supervision classification outlier-detection crowdsourcing data-cleaning active-learning data-quality image-tagging entity-recognition robust-machine-learning noisy-labels out-of-distribution-detection data-labeling label-errors data-centric-ai

Updated Mar 21, 2023
Python

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

testing schema validation data-validation pandas-dataframe assertions pandas testing-tools data-processing dataframes data-cleaning hypothesis-testing data-verification pandas-validation data-check data-assertions dataframe-schema pandas-validator

Updated Mar 20, 2023
Python

justmarkham / pandas-videos

Jupyter notebook and datasets from the pandas Q&A video series

python data-science tutorial jupyter-notebook pandas data-analysis data-cleaning

Updated May 16, 2022
Jupyter Notebook

justmarkham / DAT8

General Assembly's 2015 Data Science course in Washington, DC

Updated Oct 6, 2022
Jupyter Notebook

hi-primus / optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

data-science machine-learning spark bigdata data-transformation pyspark data-extraction data-analysis data-wrangling dask data-exploration data-preparation data-cleaning data-profiling data-cleansing big-data-cleaning data-cleaner cudf dask-cudf

Updated Mar 20, 2023
Python

sfirke / janitor

simple tools for data cleaning in R

data-science r excel spss tidyverse pivot-tables data-analysis data-cleaning dirty-data tabulations

Updated Mar 14, 2023
R

data-forge / data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

visualization nodejs javascript linq json data csv pandas data-visualization data-analysis data-wrangling data-management data-manipulation data-cleaning data-munging data-cleansing data-forge

Updated Mar 9, 2023
TypeScript

dirty-cat / dirty_cat

Machine learning on dirty tabular data

data-science data machine-learning data-analysis data-preprocessing data-preparation data-cleaning dirty-data

Updated Mar 21, 2023
Python

schema-inspector / schema-inspector

Schema-Inspector is a simple JavaScript object sanitization and validation module.

javascript sanitization validation data-cleaning

Updated Dec 22, 2022
JavaScript

objectiv / objectiv-analytics

Powerful product analytics for data teams, with full control over data & models.

Updated Jan 13, 2023
Python

data-cleaning / validate

Professional data validation for the R environment

r validation data-cleaning

Updated Mar 13, 2023
R

msamogh / nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

machine-learning torch pytorch data-preprocessing preprocessing data-processing data-cleaning data-pipeline

Updated Sep 22, 2022
Python

akanz1 / klib

Easy to use Python library of customized functions for cleaning and analyzing data.

python data-science data-visualization feature-selection data-analysis klib data-preprocessing data-cleaning

Updated Jan 14, 2023
Python

jim-schwoebel / voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

visualization security data machine-learning server voice python3 voice-recognition generation transcription voice-control data-cleaning voice-assistant encryption-decryption voice-recording voice-activity-detection wake-word-detection featurization voice-computing

Updated Dec 8, 2022
Python

rasgointelligence / feature-engineering-tutorials

Data Science Feature Engineering and Selection Tutorials

python data-science machine-learning tutorial jupyter notebook scikit-learn exploratory-data-analysis tutorials pandas feature-selection xgboost feature-engineering features data-cleaning pandas-profiling sweetviz pyrasgo

Updated Dec 29, 2022
Jupyter Notebook

probcomp / PClean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

probabilistic-programming bayesian-inference data-cleaning probabilistic-graphical-models data-cleansing

Updated May 25, 2022
Julia

ajaymache / data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle

data-science exploratory-data-analysis eda data-visualization kaggle-competition data-analytics data-analysis data-wrangling data-cleaning kaggle-dataset data-cleansing data-science-python data-analysis-python kaggle-used-cars-dataset

Updated Jan 2, 2019
Jupyter Notebook

ekstroem / dataMaid

An R package for data screening

reproducible-research data-cleaning data-screening

Updated Jan 25, 2022
HTML

encord-team / encord-active

The open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.

python data-science data machine-learning computer-vision deep-learning data-validation annotations ml object-detection data-cleaning active-learning data-quality data-centric mlops noisy-labels model-quality label-errors label-quality

Updated Mar 21, 2023
Python
