HazyResearch / domino

Domino

Discover slices of data on which your models underperform.

Getting Started | What is domino? | Docs | Contributing | Paper | About

⚡️ Quickstart

pip install "domino[clip,text] @ git+https://github.com/HazyResearch/domino@main"

For more detailed installation instructions, see the docs.

import domino

To learn more, follow along in our tutorial on Google Colab or dive into the docs.

🍕 What is Domino?

Machine learning models that achieve high overall accuracy often make systematic errors on coherent slices of validation data. Domino provides tools to help discover these slices.

What is a slice? A slice is a set of data samples that share a common characteristic. As an example, in large image datasets, photos of vintage cars comprise a slice (i.e. all images in the slice share a common subject). The term slice has a number of synonyms that you might be more familiar with (e.g. subgroup, subpopulation, stratum).
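To make the idea concrete, here is a minimal sketch in plain Python that groups a toy validation set by a shared characteristic and compares per-slice accuracy with overall accuracy. The sample records, the `subject` field, and the helper names `accuracy` and `accuracy_by_slice` are all made up for illustration; none of them are part of the domino API.

```python
from collections import defaultdict

# Hypothetical toy validation set. Each sample carries a subject tag
# (the shared characteristic), a ground-truth label, and a model prediction.
samples = (
    [{"subject": "vintage car", "label": 1, "pred": p} for p in (0, 0, 1)]
    + [{"subject": "modern car", "label": 1, "pred": 1} for _ in range(5)]
    + [{"subject": "bicycle", "label": 0, "pred": 0} for _ in range(4)]
)


def accuracy(rows):
    """Fraction of rows whose prediction matches the label."""
    return sum(r["label"] == r["pred"] for r in rows) / len(rows)


def accuracy_by_slice(rows, key):
    """Group rows by the value of `key` and compute accuracy per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {name: accuracy(members) for name, members in groups.items()}


overall = accuracy(samples)
per_slice = accuracy_by_slice(samples, "subject")
worst_slice = min(per_slice, key=per_slice.get)

print(f"overall accuracy: {overall:.2f}")
for name, acc in per_slice.items():
    print(f"  {name}: {acc:.2f}")
print(f"worst slice: {worst_slice}")
```

In this toy data the overall accuracy looks reasonable, while the vintage car slice is mostly misclassified. The catch is that real datasets rarely come with a `subject` tag, which is why slices have to be discovered.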

Slice discovery is the task of mining unstructured input data (e.g. images, videos, audio) for semantically meaningful subgroups on which a model performs poorly. We refer to automated techniques that mine input data for semantically meaningful slices as slice discovery methods (SDM). Given a labeled validation dataset and a trained classifier, an SDM computes a set of slicing functions that partition the dataset into slices. This process is illustrated below.
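The input/output contract described above (validation data and model predictions in, slicing functions out) can be sketched in a few lines of plain Python. This is only a toy stand-in, not one of the methods implemented in this repository: it assumes each sample already has a 2-D embedding, clusters those embeddings with ordinary k-means, and ranks the clusters by error rate. The names `nearest`, `kmeans`, and `discover_slices` are hypothetical and do not come from the domino API.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20):
    """Plain k-means. Deterministic start: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def discover_slices(embeddings, labels, preds, k):
    """Toy slice discovery: cluster embeddings, then rank clusters by error rate.

    Returns a list of dicts, worst slice first. Each dict holds a slicing
    function that maps an embedding to True if it belongs to that slice.
    """
    centroids = kmeans(embeddings, k)
    assignments = [nearest(e, centroids) for e in embeddings]
    slices = []
    for i in range(k):
        idx = [j for j, a in enumerate(assignments) if a == i]
        if not idx:
            continue
        error_rate = sum(labels[j] != preds[j] for j in idx) / len(idx)
        slicing_fn = lambda e, i=i: nearest(e, centroids) == i
        slices.append({"slicing_fn": slicing_fn, "error_rate": error_rate, "size": len(idx)})
    return sorted(slices, key=lambda s: s["error_rate"], reverse=True)


# Made-up data: two well-separated groups of embeddings. The model is right
# on the group near (0, 0) and mostly wrong on the group near (5, 5).
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9),
              (0.1, 0.3), (4.9, 5.0), (0.3, 0.2), (5.1, 5.2)]
labels = [1, 1, 1, 1, 1, 1, 1, 1]
preds = [1, 0, 1, 0, 1, 0, 1, 1]

slices = discover_slices(embeddings, labels, preds, k=2)
for s in slices:
    print(f"size={s['size']} error_rate={s['error_rate']:.2f}")
```

The returned slicing functions can be applied to any new embedding, for example `slices[0]["slicing_fn"]((5.0, 5.0))`, to check whether it falls in the worst-performing slice. Clustering on embeddings alone ignores the model's errors during grouping, so treat this as an illustration of the interface rather than a recommended method.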

This repository is named domino in reference to the pizza chain of the same name, known for its reliable slice deliveries. It is a slice discovery hub that provides implementations of popular slice discovery methods under a common API. It also provides tools for running quantitative evaluations of slice discovery methods.

For a full list of implemented methods, see the docs.

🔗 Useful Links

Useful References:

Blogposts:

✉️ About

Reach out to Sabri Eyuboglu (eyuboglu [at] stanford [dot] edu) if you would like to get involved or contribute!

License: Apache License 2.0

Languages: Python 97.4%, Jupyter Notebook 2.3%, Makefile 0.3%