telmo-correa / ml-dataval-tutorial

Modified version of the W&B ML data validation repository, including code for partial sample corruption

Home Page: https://api.wandb.ai/links/telmo-correa/8g9j58of

This repository is a modified version of https://github.com/wandb/edu/tree/main/ml-dataval-course, with extra code to inspect partial sample corruption.

Original README below.

Data Validation Techniques for Machine Learning

This repository contains the code used in the course Data Validation for Machine Learning. The course focuses on gaining expertise in data validation to build robust ML pipelines, detect data drift, and manage data quality using tools like TensorFlow Data Validation and GATE.

Installation

You need Python 3.8 or later and wget. On macOS, you can install wget with Homebrew (brew install wget).
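As a quick sanity check before installing, the two prerequisites above can be verified from Python itself. This is a minimal sketch, not part of the repository: the function name check_prerequisites and the MIN_PYTHON constant are hypothetical, and it only uses the standard library.

```python
import shutil
import sys

# Minimum Python version stated in this README (hypothetical constant name).
MIN_PYTHON = (3, 8)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all is well.

    `version_info` and `which` are injectable so the check can be exercised
    without changing the real interpreter or PATH.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.8 or later is required, found {found}")
    if which("wget") is None:
        problems.append("wget was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

If the script prints nothing, both prerequisites are in place and the installation steps should work as written.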

You can install the required packages for this module with:

pip install -r requirements.txt

Then, download the data with:

bash download.sh

The dataset is 7.0 GB, so the download may take a while.

Course Description

In this comprehensive data validation course, you will:

  • Grasp the importance of data validation
    • Discover how data validation strengthens machine learning pipelines by managing data drift, validating schemas, and handling data corruption.
  • Dive into hands-on examples
    • Analyze real-world datasets with techniques such as schema validation, drift detection, and continual retraining.
  • Use powerful tools
    • Learn to use TensorFlow Data Validation (TFDV) and the GATE method to detect data drift and maintain data quality.
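To make the ideas of schema validation and drift detection concrete, here is a minimal sketch in plain Python. It is not the TFDV API, not the GATE method, and not the code used in the course: the function names (infer_schema, validate_rows, linf_drift) and the example columns (fare, payment) are hypothetical. The drift score is the L-infinity distance between two categorical frequency distributions, chosen here only because it is simple to compute.

```python
from collections import Counter


def infer_schema(rows):
    """Infer a minimal schema: column name -> set of Python type names seen."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema


def validate_rows(rows, schema):
    """Return (row_index, column, problem) tuples for rows that break the schema."""
    anomalies = []
    for index, row in enumerate(rows):
        for column, allowed_types in schema.items():
            if column not in row:
                anomalies.append((index, column, "missing column"))
            elif type(row[column]).__name__ not in allowed_types:
                anomalies.append((index, column, "unexpected type"))
        for column in row:
            if column not in schema:
                anomalies.append((index, column, "unknown column"))
    return anomalies


def linf_drift(reference_values, current_values):
    """L-infinity distance between two categorical frequency distributions."""
    if not reference_values or not current_values:
        raise ValueError("both samples must be non-empty")
    ref_counts = Counter(reference_values)
    cur_counts = Counter(current_values)
    ref_total = sum(ref_counts.values())
    cur_total = sum(cur_counts.values())
    categories = set(ref_counts) | set(cur_counts)
    # Counter returns 0 for categories missing from one side.
    return max(
        abs(ref_counts[c] / ref_total - cur_counts[c] / cur_total)
        for c in categories
    )


# Hypothetical training and serving batches.
train = [{"fare": 12.5, "payment": "card"}, {"fare": 8.0, "payment": "cash"}]
serving = [{"fare": "n/a", "payment": "card"}, {"payment": "card"}]

schema = infer_schema(train)
anomalies = validate_rows(serving, schema)
drift = linf_drift(
    [row["payment"] for row in train],
    [row["payment"] for row in serving],
)
```

In this toy example the serving batch produces two schema anomalies (a string where a float was expected, and a missing column), and the payment column has a drift score of 0.5. A real pipeline would compare such a score against a threshold chosen for the dataset.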

Prerequisites

To get the most out of this course, you should have:

  • Basic knowledge of machine learning.
  • Familiarity with Python programming.

Start Learning

Enroll now to master data validation, get ahead in your Machine Learning career, and earn your certificate.

License: MIT License

Languages: Jupyter Notebook 79.6%, Python 20.2%, Shell 0.2%