Anastasios N. Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I. Jordan, Tijana Zrnic
This repository contains code for prediction-powered inference, a framework for constructing confidence intervals when using predictions from a machine learning model. The main algorithms are in ppi.py.
Each subfolder applies prediction-powered inference to a different inference task in proteomics, genomics, electronic voting, remote sensing, census analysis, or ecology.
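To make the idea concrete, here is a minimal sketch of a prediction-powered confidence interval for the simplest case, a population mean. It is not the interface of ppi.py: the function name ppi_mean_ci, its arguments, and the toy numbers are assumptions made for illustration. The sketch corrects the average prediction on unlabeled data by the average prediction error measured on a small labeled set, then uses a normal approximation for the interval.

```python
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled, alpha=0.1):
    """Return a (lower, upper) prediction-powered interval for a mean.

    Hypothetical helper for illustration; not the API of ppi.py.
    """
    if len(y_labeled) != len(yhat_labeled):
        raise ValueError("labels and labeled predictions must have equal length")
    n, N = len(y_labeled), len(yhat_unlabeled)
    # Prediction errors on the labeled data (the "rectifier" terms).
    errors = [f - y for f, y in zip(yhat_labeled, y_labeled)]
    # Point estimate: mean prediction on unlabeled data minus the mean error.
    point = fmean(yhat_unlabeled) - fmean(errors)
    # Standard error combines the sampling noise of both averages.
    se = (variance(yhat_unlabeled) / N + variance(errors) / n) ** 0.5
    # Two-sided normal quantile for coverage 1 - alpha.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point - z * se, point + z * se


# Toy, made-up numbers purely for illustration.
y_lab = [1.0, 0.0, 1.0, 1.0, 0.0]
yhat_lab = [0.9, 0.2, 0.7, 0.8, 0.1]
yhat_unlab = [0.8, 0.3, 0.6, 0.9, 0.2, 0.7, 0.4, 0.5]
lower, upper = ppi_mean_ci(y_lab, yhat_lab, yhat_unlab, alpha=0.1)
print(f"90% interval for the mean: ({lower:.3f}, {upper:.3f})")
```

The more accurate the predictions, the smaller the error variance and the narrower the interval. The notebooks cover the other estimation tasks (quantiles, regression coefficients, odds ratios) using the code in ppi.py.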
Start by running the notebooks!
You can test and develop prediction-powered inference strategies entirely in this sandbox, locally on your laptop. Open a notebook to see the expected output. You can use these notebooks to experiment with existing methods or as templates to develop your own.
alphafold/odds-ratio.ipynb: Measuring the association between phosphorylation and intrinsically disordered regions using AlphaFold.
ballots/ballots.ipynb: Calling the 2022 San Francisco Special Election between Matt Haney and David Campos using an optical voting system.
census/ols.ipynb: Quantifying the relationship between age, sex, and income using census data.
census/logistic.ipynb: Quantifying the relationship between income and private health insurance using census data.
forest/deforestation.ipynb: Gauging deforestation levels in the Amazon Rainforest from satellite imagery using computer vision.
gene-expression/gene-expression-quantiles.ipynb: Analyzing the effect of promoters on gene expression using a transformer.
plankton/plankton.ipyn: Counting the number of plankton seen by a submersible camera using a ResNet.
To run these notebooks locally, you just need to install the correct dependencies and press "Run All Cells". Cloning the GitHub repository and running the notebooks will automatically download all required data and model outputs. Code for generating the precomputed data from the raw datasets is available in the subfolder for each dataset. To create a conda environment with the correct dependencies, run conda env create -f environment.yml. If you still get a dependency error, make sure the ppi environment is activated within the Jupyter notebook.
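If a dependency error persists, one way to check which environment the notebook kernel is using is to run a cell like the sketch below. The helper names are hypothetical, and the check assumes the environment lives in a directory named ppi (conda's default layout, envs/ppi); an environment created at a custom path would need a different check.

```python
import os
import sys


def env_name_from_prefix(prefix):
    """Return the last path component of an interpreter prefix, e.g. 'ppi'."""
    return os.path.basename(os.path.normpath(prefix))


def running_in_env(expected="ppi", prefix=None):
    """True when the interpreter prefix's final directory matches `expected`."""
    return env_name_from_prefix(prefix or sys.prefix) == expected


# Show which Python the kernel is running and whether it looks like the ppi env.
print("Kernel interpreter:", sys.executable)
print("Looks like the ppi environment:", running_in_env("ppi"))
```

If the second line prints False, the kernel is probably running a different Python installation than the one that has the dependencies.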
This repository is meant to accompany our paper, Prediction-Powered Inference. The paper contains detailed explanations and attributions for each example. If you find this repository useful, in addition to the relevant methods and datasets, please cite:
@article{angelopoulos2023prediction,
title={Prediction-Powered Inference},
author={Angelopoulos, Anastasios N and Bates, Stephen and Fannjiang, Clara and Jordan, Michael I. and Zrnic, Tijana},
journal={arXiv preprint arXiv:2301.09633},
year={2023}
}