KhileshChauhan / MLOps

A project-based course on the foundations of MLOps with a focus on intuition and application.

Home Page: https://madewithml.com

Applied ML Β· MLOps Β· Production
Join 30K+ developers in learning how to responsibly deliver value with ML.

πŸ”₯  Among the top MLOps repositories on GitHub


MLOps

Learn how to apply ML to build a production-grade product and deliver value.

If you need to refresh yourself on the foundations of machine learning, check out our Made With ML repository.

πŸ“¦  Purpose πŸ“  Scripting ♻️  Reproducibility
Product Packaging Git
System design Organization Pre-commit
Project Logging Versioning
πŸ”’  Data Styling Docker
Labeling Makefile πŸš€  Production
Preprocessing Documentation Dashboard
Exploratory data analysis πŸ“¦  Interfaces CI/CD workflows
Splitting Command-line Infrastructure
Augmentation RESTful API Monitoring
πŸ“ˆ  Modeling βœ…  Testing Feature store
Evaluation Code Pipelines
Experiment tracking Data Continual learning
Optimization Models

πŸ“†  New lessons every month!
Subscribe for our monthly updates on new content.


Directory structure

app/
β”œβ”€β”€ api.py           - FastAPI app
β”œβ”€β”€ gunicorn.py      - WSGI script
└── schemas.py       - API model schemas
config/
β”œβ”€β”€ config.py        - configuration setup
β”œβ”€β”€ params.json      - training parameters
└── test_params.py   - training test parameters
tagifai/
β”œβ”€β”€ data.py          - data processing components
β”œβ”€β”€ eval.py          - evaluation components
β”œβ”€β”€ main.py          - training/optimization pipelines
β”œβ”€β”€ models.py        - model architectures
β”œβ”€β”€ predict.py       - inference components
β”œβ”€β”€ train.py         - training components
└── utils.py         - supplementary utilities

Documentation for this application can be found here.

Workflows

  1. Set up environment.
make venv
source venv/bin/activate
  2. Get data.
# Download to data/
tagifai download-auxiliary-data

# or Pull from DVC
dvc init
dvc remote add -d storage stores/blob
dvc pull
  3. Compute features.
tagifai compute-features
  4. Optimize using the distributions specified in tagifai.main.objective. This also writes the best model's params to config/params.json.
tagifai optimize \
    --params-fp config/params.json \
    --study-name optimization \
    --num-trials 100

You can train on your own on-prem GPUs or on infrastructure from cloud providers (AWS, GCP, Azure, etc.), or check out the optimize.ipynb notebook for how to train on Google Colab and transfer the trained artifacts to your local machine.

  5. Train a model (and save all its artifacts) using params from config/params.json and publish metrics to model/performance.json. You can view the entire run's details inside experiments/{experiment_id}/{run_id} or via the API (GET /runs/{run_id}).
tagifai train-model \
    --params-fp config/params.json \
    --model-dir model \
    --experiment-name best \
    --run-name model
  6. Predict tags for an input sentence. It'll use the best model saved from train-model, but you can also specify a run-id to choose a specific model.

    • Command-line app

      tagifai predict-tags --text "Transfer learning with BERT"
    • FastAPI

      uvicorn app.api:app \
          --host 0.0.0.0 \
          --port 5000 \
          --reload \
          --reload-dir tagifai \
          --reload-dir app
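Under the hood, querying the FastAPI server is just an HTTP POST. A minimal stdlib client sketch, assuming a /predict route and a {"texts": [...]} payload shape (both assumptions — check app/api.py for the actual contract):

```python
# Hypothetical client for the prediction endpoint; the route and payload
# shape are assumptions based on the CLI (`tagifai predict-tags --text ...`).
import json
from urllib import request


def predict_tags(text: str, host: str = "http://localhost:5000") -> dict:
    """POST a sentence to the running FastAPI app and return its response."""
    payload = json.dumps({"texts": [{"text": text}]}).encode("utf-8")
    req = request.Request(
        f"{host}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the uvicorn server above to be running):
# predict_tags("Transfer learning with BERT")
```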
  7. View improvements. Once you've trained the best model with the current data version, hyperparameters, etc., you can view the performance difference:

tagifai diff
  8. Push versioned assets.
# Push
dvc add data/projects.json
dvc add data/tags.json
dvc add data/features.json
dvc add data/features.parquet
dvc push
  9. Commit to git. This will clean and update versioned assets (data, experiments), run tests, apply styling, etc.
git add .
git commit -m "<COMMIT_MESSAGE>"
git tag -a <TAG_NAME> -m "<TAG_MESSAGE>"
git push origin <BRANCH_NAME>

Commands

Environments

python -m pip install -e . --no-cache-dir  # prod
python -m pip install -e ".[test]" --no-cache-dir  # test
python -m pip install -e ".[docs]" --no-cache-dir  # docs
python -m pip install -e ".[dev]" --no-cache-dir  # dev
pre-commit install
pre-commit autoupdate
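The extras referenced above ([test], [docs], [dev]) are typically declared via extras_require in setup.py. A sketch with illustrative package lists (not the project's actual pins):

```python
# Sketch of the extras behind `pip install -e ".[test|docs|dev]"`.
# Package lists are assumptions for illustration, not the project's pins.
test_packages = ["pytest", "pytest-cov", "great-expectations"]
docs_packages = ["mkdocs", "mkdocstrings"]
dev_packages = test_packages + docs_packages + ["pre-commit", "black", "flake8", "isort"]

# Passed to setuptools.setup(..., extras_require=extras_require) in setup.py
extras_require = {
    "test": test_packages,
    "docs": docs_packages,
    "dev": dev_packages,
}
```

Making "dev" a superset of "test" and "docs" means one install covers everything a contributor needs.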

Docker

docker build -t tagifai:latest -f Dockerfile .
docker run -p 5000:5000 --name tagifai tagifai:latest

Application

uvicorn app.api:app --host 0.0.0.0 --port 5000 --reload --reload-dir tagifai --reload-dir app  # dev
gunicorn -c app/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app  # prod

Streamlit

streamlit run streamlit/app.py

MLflow

mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri stores/model/

Airflow

  1. Set up
AIRFLOW_VERSION=2.0.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow db init  # `airflow db reset` to reset everything
airflow users create \
    --username admin \
    --firstname Goku \
    --lastname Mohandas \
    --role Admin \
    --email goku@madewithml.com
  2. Run webserver
export AIRFLOW_HOME=${PWD}/airflow
airflow webserver --port 8080

  3. Run scheduler (in a separate terminal)

export AIRFLOW_HOME=${PWD}/airflow
airflow scheduler

Feature store

feast init --minimal --template local features
touch features/features.py
cd features
feast apply
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

MkDocs

python -m mkdocs serve

Testing

  • Great Expectations checkpoints (read more here)

    great_expectations checkpoint run projects
    great_expectations checkpoint run tags
  • Full coverage testing

    pytest tests --cov tagifai --cov app  # report in STDOUT
    pytest tests --cov tagifai --cov app --cov-report html  # report in htmlcov/
  • Testing only the non-training components

    pytest -m "not training"
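The "training" marker filtered out above would be declared on the slow tests like this (the test body is a placeholder; remember to register the marker in pytest.ini or pyproject.toml to avoid warnings):

```python
# How a test might opt in to the "training" marker that
# `pytest -m "not training"` filters out.
import pytest


@pytest.mark.training  # slow tests that actually fit a model
def test_training_loop():
    assert True  # stand-in for a real assertion on training output


# Register the marker, e.g. in pyproject.toml:
#   [tool.pytest.ini_options]
#   markers = ["training: tests that run model training"]
```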

JupyterLab

python -m ipykernel install --user --name=tagifai
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter lab

You can also run all notebooks on Google Colab.

FAQ

Who is this content for?

  • Software engineers looking to learn ML and become even better software engineers.
  • Data scientists who want to learn how to responsibly deliver value with ML.
  • College graduates looking to learn the practical skills they'll need for the industry.
  • Product Managers who want to develop a technical foundation for ML applications.

What is the structure?

Lessons will be released weekly and each one will include:

  • intuition: high level overview of the concepts that will be covered and how it all fits together.
  • code: simple code examples to illustrate the concept.
  • application: applying the concept to our specific task.
  • extensions: brief look at other tools and techniques that will be useful for different situations.

What makes this content unique?

  • hands-on: If you search production ML or MLOps online, you'll find great blog posts and tweets. But in order to really understand these concepts, you need to implement them. Unfortunately, you don’t see a lot of the inner workings of running production ML because of scale, proprietary content & expensive tools. However, Made With ML is free, open and live which makes it a perfect learning opportunity for the community.
  • intuition-first: We will never jump straight to code. In every lesson, we will develop intuition for the concepts and think about it from a product perspective.
  • software engineering: This course isn't just about ML. In fact, it's mostly about clean software engineering! We'll cover important concepts like versioning, testing, logging, etc. that make for a production-grade product.
  • focused yet holistic: For every concept, we'll not only cover what's most important for our specific task (this is the case study aspect) but we'll also cover related methods (this is the guide aspect) which may prove to be useful in other situations.

Who is the author?

  • I've deployed large scale ML systems at Apple as well as smaller systems with constraints at startups and want to share the common principles I've learned.
  • Connect with me on Twitter and LinkedIn

Why is this free?

While this content is for everyone, it's especially targeted towards people who don't have as much opportunity to learn. I believe that creativity and intelligence are randomly distributed while opportunities are siloed. I want to enable more people to create and contribute to innovation.


To cite this course, please use:
@misc{madewithml,
    author       = {Goku Mohandas},
    title        = {Made With ML MLOps Course},
    howpublished = {\url{https://madewithml.com/}},
    year         = {2021}
}

About

License: MIT License

