xaviergonzalez / beta_shapley

Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning (AISTATS 2022 Oral)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

This repository provides an implementation of the paper Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning accepted at AISTATS 2022 as oral presentation. We propose a noise-reduced data valuation method, Beta Shapley, which is powerful at capturing the importance of data points.

Quick start

We provide a notebook using the Covertype dataset. It shows how to compute the Beta Shapley value and its application on several downstream ML tasks.

--> Beta Shapley can identify noisy samples by focusing marginal contributions on small cardinalities.

--> Beta Shapley on the CIFAR100 test dataset. Mislabeled data points have negative Beta Shapley values, meaning they actually harm the model performance. Beta Shapley can detect mislabeled points.

Files

betashap/ShapEngine.py: main class for computing Beta-Shapley.

betashap/data.py: handles loading and preprocessing datasets.

About

Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning (AISTATS 2022 Oral)


Languages

Language:Jupyter Notebook 57.2%Language:Python 42.8%