Always know what to expect from your data.
Updated Jun 14, 2023 (Python)
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
An open standard for metadata: a single place to discover, collaborate on, and get your data right.
Compare tables within or across databases
re_data: fix data issues before your users and CEO discover them
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
ML-powered analytics engine for outlier detection and root cause analysis.
The premier open source Data Quality solution
Library for Semi-Automated Data Science
Frontend for the osmcha-django REST API
Find data quality issues and clean your data in a single line of code with a scikit-learn-compatible transformer.
A data quality acceleration library for verifying datasets through a friendly interface
Amora Data Build Tool enables analysts and engineers to transform data in the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's PEP 484 type hints and SELECT statements built with SQLAlchemy. Amora can turn Python code into SQL data transformation jobs that run inside the warehouse.
Tutorial and examples of data quality in big data systems
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports a corporate single-source-of-truth data strategy based on data governance best practices. Using the library, you can implement tables with Primary Key and Foreign Key enforcement on insert and update, null validation, the…
Open source clients for working with Data Culpa Validator services from data pipelines
Quality Aware Feature Store