This repository contains the code for the webinar The Anatomy of a Machine Learning Pipeline, presented at both ENBIS and the University of Milan-Bicocca, and based on the book The Pragmatic Programmer for Machine Learning published by Taylor & Francis.
- 16th September 2024 - Università degli Studi di Milano-Bicocca
- 10th January 2024 - European Network for Business and Industry Statistics
The Anatomy of a Machine Learning Pipeline
To use this repository, install the software dependencies listed below with your Linux distribution's package manager, brew for macOS, or Chocolatey for Windows. They are required to run the code and to reproduce the results.
- Docker 26+ - Docker documentation.
- Docker Compose 2.27+ - Docker Compose documentation.
- Python 3.10+ - Python documentation.
- Poetry 1.8+ - Poetry documentation.
- GNU Make 3.81+ - GNU Make documentation.
To verify that you have the correct versions of the software installed, run the following command (macOS and GNU/Linux users):
make check-deps
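For platforms where `make check-deps` is not available, a similar check can be sketched in Python. This is a minimal illustration, not the logic behind the Makefile target: the version commands and the helper names (`parse_version`, `meets_minimum`, `installed_version`) are assumptions, and only the minimum versions come from the list above.

```python
import re
import shutil
import subprocess

# Minimum versions from the dependency list above. The commands are
# assumptions about how each tool reports its version.
REQUIREMENTS = {
    "Docker": (["docker", "--version"], (26,)),
    "Docker Compose": (["docker", "compose", "version"], (2, 27)),
    "Python": (["python", "--version"], (3, 10)),
    "Poetry": (["poetry", "--version"], (1, 8)),
    "GNU Make": (["make", "--version"], (3, 81)),
}


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples after padding them to the same length."""
    width = max(len(found), len(minimum))
    padded_found = tuple(found) + (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def installed_version(command):
    """Run a version command and parse its output; None if unavailable."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    for name, (command, minimum) in REQUIREMENTS.items():
        found = installed_version(command)
        if found is None:
            status = "not found"
        elif meets_minimum(found, minimum):
            status = "ok"
        else:
            status = "too old"
        print(f"{name}: {status}")
```

Running the script prints one status line per tool; a tool that is missing or reports an unparseable version shows up as "not found".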
We suggest using asdf to manage the Python version; you can install it by following the asdf installation instructions.
asdf plugin add python
asdf install python 3.10.15
asdf plugin-add direnv
asdf direnv setup --shell bash --version latest
python --version
pip install poetry
poetry config virtualenvs.in-project true
poetry config virtualenvs.path .venv
git clone https://github.com/pragprogml/enbis-2024
cd enbis-2024
poetry install --no-root
poetry env info
cp .envrc.example .envrc
vim .envrc
cat .envrc
layout python
ROOT_DIR="/home/user/development/enbis-2024"
PARAMS="params-dev.yaml"
VIRTUAL_ENV=.venv
PATH=$VIRTUAL_ENV/bin:$PATH
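Once direnv has loaded these variables, Python code can read them from the environment. The sketch below shows one way this might look, assuming `ROOT_DIR` and `PARAMS` are exported to the process; `resolve_params_path` is a hypothetical helper and the fallback values are assumptions, not the repository's actual behaviour.

```python
import os
from pathlib import Path


def resolve_params_path(env=None):
    """Build the path to the parameters file from ROOT_DIR and PARAMS."""
    env = os.environ if env is None else env
    # Fallbacks are assumptions: the current directory and the
    # development parameters file named in the example .envrc.
    root = Path(env.get("ROOT_DIR", "."))
    params = env.get("PARAMS", "params-dev.yaml")
    return root / params


if __name__ == "__main__":
    print(resolve_params_path())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.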
dvc stage list
train Outputs models/best_model.pt
evaluate Outputs reports/yolo_metrics.png, reports/predicted_images.png
dvc dag
+-------+
| train |
+-------+
*
*
*
+----------+
| evaluate |
+----------+
dvc repro train
dvc repro evaluate
dvc repro --downstream evaluate # start from evaluate, without reproducing the upstream train stage
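The difference between reproducing a stage with and without `--downstream` can be illustrated with a toy model of the two-stage graph shown by `dvc dag`. This is a sketch of the stage selection only, not DVC's implementation; `PIPELINE`, `upstream`, and `downstream` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Stage -> stages it depends on, mirroring the `dvc dag` output.
PIPELINE = {"train": set(), "evaluate": {"train"}}


def _in_run_order(selected, pipeline):
    """Sort the selected stages so that dependencies come first."""
    order = TopologicalSorter(pipeline).static_order()
    return [stage for stage in order if stage in selected]


def upstream(target, pipeline=PIPELINE):
    """Target plus everything it depends on (plain `dvc repro <target>`)."""
    selected, stack = set(), [target]
    while stack:
        stage = stack.pop()
        if stage not in selected:
            selected.add(stage)
            stack.extend(pipeline[stage])
    return _in_run_order(selected, pipeline)


def downstream(target, pipeline=PIPELINE):
    """Target plus everything that depends on it (`--downstream`)."""
    selected = {target}
    changed = True
    while changed:
        changed = False
        for stage, deps in pipeline.items():
            if stage not in selected and deps & selected:
                selected.add(stage)
                changed = True
    return _in_run_order(selected, pipeline)


if __name__ == "__main__":
    print("repro evaluate:", upstream("evaluate"))
    print("repro --downstream evaluate:", downstream("evaluate"))
```

In this model, `upstream("evaluate")` also considers `train`, whereas `downstream("evaluate")` selects only `evaluate` because nothing depends on it.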
The output and artifacts of each training run are stored under runs/, while a simple evaluation report can be found in the reports/ directory.
- 00_preparations.ipynb
- 01_visualize.ipynb
- 01_visualize_iou.ipynb
- 02_training.ipynb
- 03_evaluate.ipynb
- 03_evaluate_onnx.ipynb
- 03_inference_api.ipynb
Update the YOLO settings in training.py:training_config to enable W&B or MLflow, or run:
yolo settings mlflow={True|False} wandb={True|False}
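When the toggles need to be set from a script, the same command line can be assembled and run from Python. The sketch below only wraps the `yolo settings` call shown above; `yolo_settings_command` is a hypothetical helper, and it assumes the `yolo` executable is on the `PATH`.

```python
import subprocess


def yolo_settings_command(mlflow, wandb):
    """Build the argument list for the `yolo settings` command shown above."""
    return [
        "yolo",
        "settings",
        f"mlflow={bool(mlflow)}",
        f"wandb={bool(wandb)}",
    ]


if __name__ == "__main__":
    # Example: enable MLflow logging and disable W&B.
    command = yolo_settings_command(mlflow=True, wandb=False)
    print(" ".join(command))
    subprocess.run(command, check=True)
```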
make run-demo
make run-api
make docker-build
make docker-run
make lint
The dataset we presented during the webinar is proprietary and cannot be shared or used outside of our organization. You might consider using a dataset that contains a single class instead, such as the one available at Synthetic Corrosion Dataset Computer Vision Project. Please let us know if you have any questions or need help finding a suitable dataset for your work.
Reach out to us via https://github.com/pragprogml.
- Marco Scutari, Ph.D. - Senior Researcher in Bayesian Networks and Graphical Models - Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)
- Mauro Malvestio - Founder & CTO @ DSCOVR
When citing PPML in academic papers and theses, please use this BibTeX entry:
@BOOK{ppml,
author = {M. Scutari and M. Malvestio},
title = {{The Pragmatic Programmer for Machine Learning: Engineering
Analytics and Data Science Solutions}},
publisher = {Chapman \& Hall},
year = {2023}
}