There are 53 public repositories under the data-quality topic.
Learn how to design, develop, deploy and iterate on production-grade ML applications.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
One-line-of-code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
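The idea behind one-line profiling tools is automating per-column summaries. A minimal dependency-free sketch of what such a profile computes (the `profile` helper and its output shape are illustrative, not any library's actual API):

```python
from collections import Counter

def profile(rows):
    """Toy column profile: null counts and distinct-value counts per column."""
    cols = {}
    for row in rows:
        for key, val in row.items():
            col = cols.setdefault(key, {"nulls": 0, "values": Counter()})
            if val is None or val == "":
                col["nulls"] += 1
            else:
                col["values"][val] += 1
    return {k: {"nulls": v["nulls"], "distinct": len(v["values"])}
            for k, v in cols.items()}

rows = [
    {"id": 1, "city": "Lisbon"},
    {"id": 2, "city": ""},        # missing value
    {"id": 3, "city": "Lisbon"},
]
print(profile(rows))
```

Real profilers additionally infer types, compute distributions, detect correlations, and render an HTML report from a single call.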
Always know what to expect from your data.
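"Knowing what to expect from your data" means declaring checks (expectations) that each batch must satisfy. A stdlib-only sketch of the pattern (the `expect_values_between` name is hypothetical, not the library's API):

```python
def expect_values_between(values, low, high):
    """Hypothetical expectation: every value falls within [low, high]."""
    failures = [v for v in values if not (low <= v <= high)]
    return {"success": not failures, "unexpected": failures}

ages = [23, 41, 37, 150]
result = expect_values_between(ages, 0, 120)
print(result)  # the out-of-range record 150 fails the expectation
```

In practice such expectations are stored as declarative suites and run automatically against each new batch, with failures surfaced in validation reports.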
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
A curated, but incomplete, list of data-centric AI resources.
Automatically find issues in image datasets and practice data-centric computer vision.
Data quality assessment and metadata reporting for data frames and database tables
Compilation of high-profile real-world examples of failed machine learning projects
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It solves data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
📙 Awesome Data Catalogs and Observability Platforms.
Scalable data pre-processing and curation toolkit for LLMs
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Home of the Open Data Contract Standard (ODCS).
Metrics Observability & Troubleshooting
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)