Piyushi-0 / rethinking-minimax-fairness


Rethinking minimax-fairness

Code for our ICML 2023 paper 'When do ERM and Minimax-fair Learning Coincide?'


Installation

  1. Install Anaconda environment

  2. Install Python packages

pip install folktables xgboost netcal torch seaborn
  3. Install packages for multilayer perceptron models
pip install rtdl libzero
  4. Install autogluon. Installation instructions are on the autogluon website

  5. Install the folktables package from the source code in the folder named folktables

cd folktables
pip install -r requirements.txt

This code differs from the one in the folktables GitHub repo in the df_to_pandas function in folktables/folktables.py: we drop the first level of each categorical variable after creating the dummies.
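The drop-first behavior can be sketched with pandas. This is a minimal illustration of the idea, not the repo's actual df_to_pandas code; the toy column name is made up:

```python
import pandas as pd

# Toy categorical column; the real code operates on folktables feature frames.
df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# drop_first=True removes the first level of each categorical variable,
# avoiding the redundant (perfectly collinear) dummy column.
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True)
print(list(dummies.columns))  # the first level ("blue") is dropped
```

Dropping one level per categorical variable keeps the design matrix full rank, which matters for the linear models in the benchmark.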

  6. (Optional) Install a package to pretty-print tables in Jupyter
pip install dataframe-image

Datasets

  1. Download part of the datasets by following the instructions in this GitHub repo for the Minimax Group Fairness paper

  2. Save all of these datasets in a folder named Datasets

  3. The rest of the datasets, except eICU, are downloaded automatically by the scripts

  4. Scripts to download and extract the eICU dataset are at this GitHub repo. Accessing the data requires completing an online training course and requesting access through the PhysioNet website. Details on getting access are at this website


Run instructions

Run run.sh with the id of the dataset, i.e., its position in the list Dataset().list_datasets() in main.py

For example, run the following script for the first dataset id:

bash run.sh 1

The script runs all models for the given dataset, once including and once excluding the group feature
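The two arms of the experiment can be sketched as follows. This is an illustrative sketch, not the repo's code, and the column names ("label", "group", "age") are hypothetical:

```python
import pandas as pd

# Hypothetical feature frame with a label, a group attribute, and one feature.
df = pd.DataFrame({
    "label": [0, 1, 1],
    "group": ["A", "B", "A"],
    "age":   [25, 40, 33],
})

# Arm 1: keep the group feature among the predictors.
X_with_group = df.drop(columns=["label"])

# Arm 2: exclude the group feature from the predictors.
X_without_group = df.drop(columns=["label", "group"])

print(list(X_with_group.columns), list(X_without_group.columns))
```

Comparing the two arms shows how much the models rely on explicit group membership versus the remaining features.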


Credits (also see THIRD-PARTY-LICENSES)

This repo contains code modified from the following GitHub repos

  1. folktables, MIT license

Code included in the folder folktables and the file folktables_helper.py

  2. minimax-fair, Apache 2.0 license

Code included in the folder src and the files dataset_mapping.py and prepare_datasets.py

  3. active-sampling-for-minmax-fairness, Apache 2.0 license

Code included in the folder algorithms and the file prepare_datasets.py


Licenses

Software                              License
--------                              -------
folktables                            MIT License
minimax-fair                          Apache License 2.0
active-sampling-for-minmax-fairness   Apache License 2.0
xgboost                               Apache License 2.0
autogluon                             Apache License 2.0
netcal                                Apache License 2.0
torch                                 Modified BSD 3-Clause
seaborn                               BSD 3-Clause
pandas                                BSD 3-Clause
numpy                                 BSD 3-Clause
matplotlib                            PSF License
scikit-learn                          BSD 3-Clause
rtdl                                  MIT License
libzero                               MIT License
dataframe-image                       MIT License

Troubleshooting known errors

libomp.dylib-related error when running on a MacBook M1

OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/

Try re-running the code after running the command suggested above in your terminal:

export KMP_DUPLICATE_LIB_OK=TRUE
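Alternatively, the variable can be set from within Python before importing any library that bundles its own OpenMP runtime (e.g. torch or xgboost). This mirrors the shell workaround above and carries the same "unsafe, unsupported" caveat from the OpenMP hint:

```python
import os

# Must run before importing torch/xgboost so libomp sees the flag at load time.
# Per the OpenMP hint, this workaround may crash or silently produce incorrect
# results; prefer ensuring only one OpenMP runtime is linked if possible.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```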

About

License: Apache License 2.0


Languages

Python 86.5%, Jupyter Notebook 13.2%, Shell 0.3%