There are 28 repositories under the data-quality topic.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Create HTML profiling reports from pandas DataFrame objects
Always know what to expect from your data.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels
Data quality assessment and metadata reporting for data frames and database tables
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
The first open-source data discovery and observability platform. ODD Platform is based on the ODD Specification.
Profile and monitor your ML data pipeline end-to-end
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Data governance and data quality checking/monitoring platform (Django+jQuery+MySQL)
Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads.
NBi is a testing framework (an add-on to NUnit) for Business Intelligence and Data Access. Its main goal is to let users create tests declaratively using an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit but can easily be ported to other testing frameworks.
Great Expectations Airflow operator
📙 Awesome Data Catalogs and Observability Platforms.
Jumbune, an open-source BigData APM and Data Quality Management Platform for Data Clouds. An enterprise feature offering is available at http://jumbune.com. More details of the open-source offering are at,
A tool to help improve data quality standards in observational data science.
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Automated data quality suggestions and analysis with Deequ on AWS Glue
Library for data quality assessment and for interacting with the datos.gov.co data portal
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
re_data - fix data issues before your users & CEO discover them 😊
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Data validation library for PySpark 3.0.0
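The entry above implementing "Estimating Training Data Influence by Tracing Gradient Descent" (NeurIPS 2020) refers to TracIn, which scores how much a training example influenced the loss on a test example. Its checkpoint approximation (TracInCP) sums, over saved training checkpoints, the learning rate times the dot product of the two examples' loss gradients. Below is a minimal sketch of that formula, assuming a linear model with squared loss and a hand-picked list of checkpoints and learning rates. It is not that repository's code; the names `grad_squared_loss`, `dot`, and `tracin_cp` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple

# A training or test example: (feature vector x, scalar target y).
Example = Tuple[List[float], float]


def grad_squared_loss(w: Sequence[float], example: Example) -> List[float]:
    """Gradient of 0.5 * (w . x - y)^2 with respect to w, i.e. (w . x - y) * x."""
    x, y = example
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def tracin_cp(
    checkpoints: Sequence[Sequence[float]],
    learning_rates: Sequence[float],
    train_example: Example,
    test_example: Example,
) -> float:
    """TracInCP: sum over checkpoints w_i of
    eta_i * grad loss(w_i, train_example) . grad loss(w_i, test_example)."""
    if len(checkpoints) != len(learning_rates):
        raise ValueError("need exactly one learning rate per checkpoint")
    score = 0.0
    for w, eta in zip(checkpoints, learning_rates):
        g_train = grad_squared_loss(w, train_example)
        g_test = grad_squared_loss(w, test_example)
        score += eta * dot(g_train, g_test)
    return score


# Hypothetical checkpoints and learning rates, chosen only for illustration.
checkpoints = [[0.0, 0.0], [0.5, 0.5]]
learning_rates = [0.1, 0.1]
train_z = ([1.0, 2.0], 3.0)
test_z = ([1.0, 1.0], 2.0)
print(tracin_cp(checkpoints, learning_rates, train_z, test_z))  # ≈ 2.25
```

A positive score marks the training example as a proponent (its gradient steps tended to lower the test loss) and a negative score as an opponent. Passing the same example twice gives its self-influence, which the paper uses to surface likely mislabeled training data.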