ai-engineering artificial-intelligence data-science data-version-control explainable-ai green-ai machine-learning machine-learning-pipeline uncertainty-estimation

d2m - Data to model

A machine learning pipeline for trustworthy and green models, enabling responsible AI:

Explainable AI, using SHAP, LIME or both.
Uncertainty estimation, using Bayesian dropout for neural networks.
Carbon emissions tracking and reporting, using CodeCarbon.
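To illustrate the idea behind dropout-based uncertainty estimation, here is a minimal standard-library sketch. It is not d2m's implementation: the tiny linear "model", the function names, and the default parameters are all assumptions made for illustration. The same input is passed through the model many times with dropout left on, and the spread of the outputs serves as an uncertainty estimate.

```python
import random
import statistics


def dropout_forward(weights, bias, features, drop_prob, rng):
    """One stochastic forward pass of a linear model with dropout on the inputs."""
    keep_prob = 1.0 - drop_prob
    total = bias
    for w, x in zip(weights, features):
        # Keep each input with probability keep_prob and rescale it
        # (inverted dropout) so the expected output is unchanged.
        if rng.random() < keep_prob:
            total += w * x / keep_prob
    return total


def predict_with_uncertainty(weights, bias, features, drop_prob=0.2, passes=200, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [
        dropout_forward(weights, bias, features, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical weights and input, chosen only for the example.
mean, std = predict_with_uncertainty([0.5, -1.0, 2.0], 0.1, [1.0, 2.0, 3.0])
print(f"prediction = {mean:.3f} +/- {std:.3f}")
```

In a real neural network the dropout layers sit between hidden layers, but the principle is the same: a wider spread across passes indicates a less certain prediction.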

d2m lets you easily create and evaluate machine learning models for tabular and time series data, with built-in data profiling and feature engineering.

Usage

Tested on:

Linux
macOS
Windows with WSL 2

Clone/download this repository.
Place your data files (CSV) in a folder named after your dataset (DATASET) inside assets/data/raw/, so that the path to the files is assets/data/raw/[DATASET]/.
Update params.yaml with the name of your dataset (DATASET), the target variable, and other configuration parameters.
Build the Docker image:

docker build -t d2m -f Dockerfile .

Run the container:

docker run -p 5000:5000 -it -v $(pwd)/assets:/usr/d2m/assets -v $(pwd)/.dvc:/usr/d2m/.dvc d2m

Open the website at localhost:5000 to use the graphical user interface.
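Before building the image, it can help to confirm that the raw data sits where the steps above expect it. The following is a small sketch, not part of d2m: the function name and the dataset name my_dataset are hypothetical, and the only thing taken from the instructions is the assets/data/raw/[DATASET]/ layout.

```python
from pathlib import Path


def find_raw_csv_files(dataset, root="."):
    """Return the CSV files under assets/data/raw/<dataset>/, sorted by name."""
    raw_dir = Path(root) / "assets" / "data" / "raw" / dataset
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"Expected dataset folder at {raw_dir}")
    csv_files = sorted(p for p in raw_dir.iterdir() if p.suffix.lower() == ".csv")
    if not csv_files:
        raise FileNotFoundError(f"No .csv files found in {raw_dir}")
    return csv_files


# "my_dataset" is a placeholder; use the dataset name set in params.yaml.
try:
    for path in find_raw_csv_files("my_dataset"):
        print(path)
except FileNotFoundError as err:
    print(f"Layout problem: {err}")
```

Run it from the repository root so that the relative path resolves against the same assets folder that is mounted into the container.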

Creating models on the command line

Copy params.yaml from the host to the container (find CONTAINER_NAME by running docker ps):

docker cp params.yaml [CONTAINER_NAME]:/usr/d2m/params.yaml

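Then, still on the host, run the pipeline inside the container (if you are already in the container's interactive session, run dvc repro directly instead):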

docker exec [CONTAINER_NAME] dvc repro
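The two commands above can also be scripted from the host. This is a hedged sketch using only the Python standard library: the helper names and the container name d2m_container are hypothetical, while the docker cp and docker exec invocations mirror the ones shown above. By default it only prints the commands; pass dry_run=False to execute them.

```python
import shlex
import subprocess


def build_commands(container_name, params_path="params.yaml"):
    """Build the docker cp and docker exec commands as argument lists."""
    copy_cmd = ["docker", "cp", params_path, f"{container_name}:/usr/d2m/params.yaml"]
    repro_cmd = ["docker", "exec", container_name, "dvc", "repro"]
    return [copy_cmd, repro_cmd]


def run_pipeline(container_name, params_path="params.yaml", dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_commands(container_name, params_path):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)


# Placeholder name; find the real one with `docker ps`.
run_pipeline("d2m_container")
```

Passing the arguments as lists avoids shell quoting problems if the path to params.yaml contains spaces.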

About

A machine learning pipeline that takes you from raw data to a fully trained machine learning model: from data to model (d2m).

https://ejhusom.github.io/d2m/

MIT License

Languages

Python 93.0%, HTML 4.3%, CSS 1.8%, Shell 0.7%, Dockerfile 0.2%