altaris / acc23

Allergy Chip Challenge 2023

Minimal example

from acc23 import models, ACCDataModule
from acc23.postprocessing import eval_on_test_dataset
from acc23.utils import train_model

# 1. Choose a model
model = models.Orchid()

# 2. Construct a datamodule from the competition data
datamodule = ACCDataModule("data/train.csv", "data/test.csv", "data/images")

# 3. Train the model. This can of course be done directly with PyTorch
# Lightning's API, or even a classic PyTorch training loop
model = train_model(model, datamodule, root_dir="out")

# 4. Evaluate the model on the test dataset. The output file can readily
# be submitted to trustii.io!
df = eval_on_test_dataset(model, datamodule, root_dir="out/eval/test")
df.to_csv("out/predictions.csv", index=False)

Package organization

The important user-facing modules are:

  • acc23.models: Subpackage that contains all model definitions
  • acc23.dataset: Submodule that defines acc23.dataset.ACCDataModule, a PyTorch Lightning datamodule that takes care of importing and preprocessing the challenge data.
  • acc23.postprocessing: Contains everything pertaining to postprocessing, i.e. going from raw model outputs to clean prediction CSV files. The important functions are acc23.postprocessing.eval_on_test_dataset and acc23.postprocessing.eval_on_train_dataset.
  • acc23.explain: Everything related to the explainability of the model's predictions. The important members are acc23.explain.VitExplainer, which produces attention maps for vision transformers, and acc23.explain.shap, which approximates SHAP values in a model-agnostic way.
  • acc23.utils: Contains acc23.utils.train_model and other miscellaneous utilities.
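To give an idea of what "approximating SHAP values in a model-agnostic way" can look like, here is a minimal sketch of the classic permutation-sampling estimator of Shapley values. It is not the code of acc23.explain.shap, whose actual signature and method may differ; sample_shapley and its parameters are hypothetical names chosen for illustration, and the model is just any Python callable.

```python
import random
from typing import Callable, List, Sequence


def sample_shapley(
    predict: Callable[[List[float]], float],
    x: Sequence[float],
    background: Sequence[Sequence[float]],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of the Shapley value of each feature of x.

    Hypothetical helper: only needs a black-box `predict` function and a
    set of background samples, hence "model-agnostic".
    """
    rng = random.Random(seed)
    n_features = len(x)
    phi = [0.0] * n_features
    for j in range(n_features):
        total = 0.0
        for _ in range(n_samples):
            z = rng.choice(background)  # random reference sample
            order = list(range(n_features))
            rng.shuffle(order)  # random feature ordering
            # Features placed before j in the ordering are taken from x,
            # all the others from the reference sample z
            from_x = set(order[: order.index(j)])
            x_minus = [x[i] if i in from_x else z[i] for i in range(n_features)]
            x_plus = list(x_minus)
            x_plus[j] = x[j]  # same input, but with feature j revealed
            total += predict(x_plus) - predict(x_minus)
        phi[j] = total / n_samples
    return phi


# Toy usage with a linear model and a single all-zero background sample
weights = [2.0, -1.0, 0.5]
linear = lambda v: sum(w * a for w, a in zip(weights, v))
print(sample_shapley(linear, [1.0, 2.0, 3.0], [[0.0, 0.0, 0.0]], n_samples=20))
```

For a linear model and a single background sample, every draw yields exactly weight × (x_j − z_j), so the estimate is exact; for a real network the estimate is noisy and improves with n_samples.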

The following modules also exist, but the user shouldn't need to use them directly:

  • acc23.preprocessing: Contains everything pertaining to preprocessing. Used by acc23.dataset.ACCDataModule.
  • acc23.constants: Constants about the dataset, e.g. the number of features or the name of the target columns.
  • acc23.mlsmote: Implementation of the MLSMOTE dataset augmentation algorithm. Part of the preprocessing pipeline.
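For readers unfamiliar with MLSMOTE, the idea can be sketched as follows: find the labels that are rarer than average, then, for each sample carrying such a label, interpolate a synthetic sample towards one of its nearest neighbours and give it the labels shared by most of the neighbourhood. This is a simplified, standard-library-only illustration, not the code of acc23.mlsmote; the function names, the single interpolation factor per sample, and the default k=5 are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def minority_labels(Y: Sequence[Sequence[int]]) -> List[int]:
    """Labels whose imbalance ratio (IRLbl) exceeds the mean ratio (MeanIR)."""
    n_labels = len(Y[0])
    counts = [sum(row[l] for row in Y) for l in range(n_labels)]
    most = max(counts)
    # Labels that never occur are skipped: their ratio is undefined
    ir = {l: most / c for l, c in enumerate(counts) if c > 0}
    mean_ir = sum(ir.values()) / len(ir)
    return [l for l, r in ir.items() if r > mean_ir]


def mlsmote(
    X: Sequence[Sequence[float]],
    Y: Sequence[Sequence[int]],
    k: int = 5,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return synthetic (features, labels) pairs for the minority labels."""
    rng = random.Random(seed)
    new_X: List[List[float]] = []
    new_Y: List[List[int]] = []
    for label in minority_labels(Y):
        # "Minority bag": indices of all samples carrying this label
        bag = [i for i, row in enumerate(Y) if row[label] == 1]
        for i in bag:
            # Up to k nearest neighbours of sample i inside the bag
            others = sorted(
                (j for j in bag if j != i),
                key=lambda j: math.dist(X[i], X[j]),
            )
            neighbours = others[:k]
            if not neighbours:
                continue
            ref = rng.choice(neighbours)
            gap = rng.random()
            # Synthetic features lie on the segment between sample and neighbour
            new_X.append([a + gap * (b - a) for a, b in zip(X[i], X[ref])])
            # Keep the labels present in more than half of {sample} + neighbours
            group = [i] + neighbours
            new_Y.append(
                [
                    1 if 2 * sum(Y[g][l] for g in group) > len(group) else 0
                    for l in range(len(Y[0]))
                ]
            )
    return new_X, new_Y


# Toy usage: the second label is rare, so only its two samples are augmented
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
Y = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1]]
synth_X, synth_Y = mlsmote(X, Y, k=2)
print(len(synth_X), synth_Y)
```

The synthetic rows would then be appended to the training set before fitting.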

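Submitting via the acc23 CLI

The command below sources secret.env (which should define TOKEN) and submits out.csv along with the notebook dummy.ipynb: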

. ./secret.env && python3 -m acc23 submit -t "$TOKEN" out.csv dummy.ipynb

Troubleshooting

2023-04-26 Image corruption

data/images/CY60527_4_190006236104_2022_12_22_12_15_22.bmp appears to be corrupted: PIL raises an OSError when loading it. data/images/CY60527_4_190006236104_2022_12_22_12_11_20.bmp looks similar, so I simply overwrote the corrupted file with it:

cp data/images/CY60527_4_190006236104_2022_12_22_12_11_20.bmp data/images/CY60527_4_190006236104_2022_12_22_12_15_22.bmp

and called it a day.
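To look for other damaged files, a quick header check along these lines may help. This is a minimal sketch using only the standard library: it merely compares the file size declared in the BMP header with the size on disk, which is a heuristic and not what PIL checks, so it may well miss the kind of corruption described above. bmp_problem and find_suspicious_bmps are hypothetical helper names, not part of acc23.

```python
import struct
from pathlib import Path
from typing import Dict, Optional, Union


def bmp_problem(path: Union[str, Path]) -> Optional[str]:
    """Describe what looks wrong with a BMP file, or return None."""
    actual_size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        header = fh.read(14)  # the BMP file header is 14 bytes long
    if len(header) < 14:
        return "file shorter than the 14-byte BMP header"
    # Bytes 0-1: magic "BM"; bytes 2-5: file size, little-endian uint32
    magic, declared_size = struct.unpack("<2sI", header[:6])
    if magic != b"BM":
        return "missing 'BM' magic bytes"
    if actual_size < declared_size:
        return f"truncated: header declares {declared_size} bytes, file has {actual_size}"
    return None


def find_suspicious_bmps(image_dir: Union[str, Path]) -> Dict[str, str]:
    """Map the name of every suspicious .bmp file in image_dir to its problem."""
    suspicious = {}
    for path in sorted(Path(image_dir).glob("*.bmp")):
        problem = bmp_problem(path)
        if problem is not None:
            suspicious[path.name] = problem
    return suspicious


# Example call (the directory is the one used in the minimal example):
# print(find_suspicious_bmps("data/images"))
```

A file that passes this check can still fail to decode, so actually loading each image with PIL inside a try/except remains the more reliable test.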

Contributing

Dependencies

  • python3.10 or newer;
  • requirements.txt for runtime dependencies;
  • requirements.dev.txt for development dependencies;
  • make (optional).

To set up a virtual environment and install everything, simply run

virtualenv venv -p python3.10
. ./venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements.dev.txt

Documentation

Simply run

make docs

This generates the HTML documentation of the project; the index file should be at docs/index.html. To open it directly in your browser, run

make docs-browser

Code quality

Don't forget to run

make

to format the code with black, typecheck it with mypy, and check it against coding standards with pylint.

License: MIT License