There are 50 repositories under the data-quality topic.
Learn how to design, develop, deploy and iterate on production-grade ML applications.
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
Always know what to expect from your data.
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
A curated, but incomplete, list of data-centric AI resources.
Automatically find issues in image datasets and practice data-centric computer vision.
Data quality assessment and metadata reporting for data frames and database tables
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis
Compilation of high-profile real-world examples of failed machine learning projects
📙 Awesome Data Catalogs and Observability Platforms.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
Metrics Observability & Troubleshooting
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Home of the Open Data Contract Standard (ODCS).
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
Data governance and data quality checking/monitoring platform (Django + jQuery + MySQL)
Profile and monitor your ML data pipeline end-to-end