SHINTEKI / DTU-MLOps-Group7

Project repository for the DTU 02476 MLOps course, January 2023


MLOps for Image Classification

Authors:

Chuansheng Liu, Xindi Wu, Chongchong Li, Mouadh Sadani

Project Description

Overview

This repository is dedicated to implementing MLOps (Machine Learning Operations) for image classification tasks. The project combines the principles of DevOps with Machine Learning to streamline the development, deployment, and maintenance of image classification models.

Features

  • Image classification with state-of-the-art machine learning models.
  • Cloud deployment using FastAPI for easy accessibility.
  • Data drift detection to maintain model accuracy over time.
  • Comprehensive testing scripts for data and model integrity.
  • Containerization with Docker for consistent development and deployment environments.

What framework do we use?

We use the PyTorch Image Models (timm) framework to achieve our project goal. From the framework, we import and modify the model we need. The framework also provides many tools for data processing, tuning, and training.
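
The framework exposes its model zoo by name; below is a minimal sketch of how the available ResNeSt variants can be looked up (the concrete model name and hyperparameters used in the project are set in the configuration files under conf/):

import timm

# List the ResNeSt variants shipped with timm; any of these names can be
# passed to timm.create_model().
print(timm.list_models("resnest*"))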

What data do we use to run on?

We use ImageNet 1000 (mini), a compressed subset of the ImageNet dataset that covers all 1000 classes and contains 38.7k images. The ImageNet dataset is widely used for classification challenges and is useful for developing Computer Vision and Deep Learning algorithms.
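
Because the dataset follows the standard ImageNet folder layout (one sub-folder per class), it can be loaded with a plain torchvision pipeline. A minimal sketch, assuming the data has been pulled to data/raw/ (the project's actual loading logic lives in src/data/make_dataset.py):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Standard ImageNet-style preprocessing; the transforms actually used by the
# project are configured elsewhere.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Illustrative path; adjust to wherever dvc pull places the data.
train_set = datasets.ImageFolder("data/raw/imagenet-mini/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)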

What deep learning model do we use?

The model we use is ResNeSt, a ResNet variant that stacks several Split-Attention blocks (composed of feature-map group and split-attention operations). It is easy to work with, computationally efficient, and universally improves the learned feature representations, boosting performance across image classification tasks.
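
As a rough illustration of how such a model can be instantiated and used for single-image inference via timm (the exact ResNeSt variant, weights, and preprocessing used in the project come from the experiment configuration):

import timm
import torch
from PIL import Image
from timm.data import create_transform, resolve_data_config

# "resnest50d" is one of timm's ResNeSt variants; pretrained ImageNet weights
# are downloaded automatically on first use.
model = timm.create_model("resnest50d", pretrained=True).eval()
transform = create_transform(**resolve_data_config({}, model=model))

# "example.jpg" is a placeholder input image.
img = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    top5 = model(img).softmax(dim=-1).topk(5)
print(top5)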

Installation

  • Clone the repository:
git clone https://github.com/kristian-267/MLOps-for-Image-Classification.git
  • Configure Environment:
cd MLOps-for-Image-Classification
pip install -r requirements.txt
pip install -r requirements_tests.txt

or

make requirements
  • Download data:
dvc pull

or download data from: https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000

Usage

  • Train model:
python src/models/train_model.py

or

make train
  • Inference:
python src/models/predict_model.py

or

make predict
  • Run unit tests with coverage:
coverage run --source=./src -m pytest tests/

or

make tests
  • Serve the FastAPI inference app:
make api
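
Once the API is running, it can be queried from Python. This is a hedged sketch: the host, port, and the /predict route are assumptions, so check the app/ folder for the actual route and payload format:

import requests

# Hypothetical endpoint and payload; adapt to the routes defined in app/.
with open("example.jpg", "rb") as f:
    response = requests.post("http://localhost:8000/predict", files={"file": f})
print(response.json())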

Project Organization

├── LICENSE
│
├── Makefile           <- Makefile with commands like `make train`.
│
├── README.md          <- The top-level README for developers using this project.
│
├── app                <- FastAPI application for inference.
│
├── conf
│   ├── data           <- Configurations for dataset.
│   └── experiment     <- Configurations for training.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── model_store        <- Applications for local and cloud deployment.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
├── tests              <- Unit test code
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

This project is based on the cookiecutter data science project template.

Project Checklist

Week 1

  • Create a git repository
  • Make sure that all team members have write access to the github repository
  • Create a dedicated environment for your project to keep track of your packages
  • Create the initial file structure using cookiecutter
  • Fill out the make_dataset.py file such that it downloads whatever data you need and preprocesses it (if necessary)
  • Add a model file and a training script and get that running
  • Remember to fill out the requirements.txt file with whatever dependencies that you are using
  • Remember to comply with good coding practices (pep8) while doing the project
  • Do a bit of code typing and remember to document essential parts of your code
  • Setup version control for your data or part of your data
  • Construct one or multiple docker files for your code
  • Build the docker files locally and make sure they work as intended
  • Write one or multiple configurations files for your experiments
  • Use Hydra to load the configurations and manage your hyperparameters
  • When you have something that works somewhat, remember at some point to do some profiling and see if you can optimize your code
  • Use Weights & Biases to log training progress and other important metrics/artifacts in your code. Additionally, consider running a hyperparameter optimization sweep.
  • Use Pytorch-lightning (if applicable) to reduce the amount of boilerplate in your code

Week 2

  • Write unit tests related to the data part of your code
  • Write unit tests related to model construction and or model training
  • Calculate the coverage.
  • Get some continuous integration running on the github repository
  • Create a data storage in GCP Bucket for your data and preferably link this with your data version control setup
  • Create a trigger workflow for automatically building your docker images
  • Get your model training in GCP using either the Engine or Vertex AI
  • Create a FastAPI application that can do inference using your model
  • If applicable, consider deploying the model locally using torchserve
  • Deploy your model in GCP using either Functions or Run as the backend

Week 3

  • Check how robust your model is towards data drifting
  • Setup monitoring for the system telemetry of your deployed model
  • Setup monitoring for the performance of your deployed model
  • If applicable, play around with distributed data loading
  • If applicable, play around with distributed model training
  • Play around with quantization, compilation and pruning for your trained models to increase inference speed

License

The project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Credits to Chuansheng Liu, Xindi Wu, Chongchong Li, Mouadh Sadani. The project uses the external resources PyTorch Image Models, ImageNet 1000 (mini), ResNeSt, and SigNoz.
