KhileshChauhan / MLOps

A project-based course on the foundations of MLOps with a focus on intuition and application.

Home Page: https://madewithml.com

Applied ML Β· MLOps Β· Production
Join 30K+ developers in learning how to responsibly deliver value with ML.

πŸ”₯  Among the top MLOps repositories on GitHub


MLOps

Learn how to apply ML to build a production-grade product and deliver value.

If you need to refresh yourself on the foundations of machine learning, check out our Made With ML repository.

πŸ“¦  Purpose πŸ“  Scripting ♻️  Reproducibility
Product Packaging Git
System design Organization Pre-commit
Project Logging Versioning
πŸ”’  Data Styling Docker
Labeling Makefile πŸš€  Production
Preprocessing Documentation Dashboard
Exploratory data analysis πŸ“¦  Interfaces CI/CD workflows
Splitting Command-line Infrastructure
Augmentation RESTful API Monitoring
πŸ“ˆ  Modeling βœ…  Testing Feature store
Evaluation Code Pipelines
Experiment tracking Data Continual learning
Optimization Models

πŸ“†  New lessons every month!
Subscribe for our monthly updates on new content.


Directory structure

app/
β”œβ”€β”€ api.py           - FastAPI app
β”œβ”€β”€ gunicorn.py      - WSGI script
└── schemas.py       - API model schemas
config/
β”œβ”€β”€ config.py        - configuration setup
β”œβ”€β”€ params.json      - training parameters
└── test_params.py   - training test parameters
tagifai/
β”œβ”€β”€ data.py          - data processing components
β”œβ”€β”€ eval.py          - evaluation components
β”œβ”€β”€ main.py          - training/optimization pipelines
β”œβ”€β”€ models.py        - model architectures
β”œβ”€β”€ predict.py       - inference components
β”œβ”€β”€ train.py         - training components
└── utils.py         - supplementary utilities

Documentation for this application can be found here.

Workflows

  1. Set up environment.
make venv
source venv/bin/activate
  2. Get data.
# Download to data/
tagifai download-auxiliary-data

# or Pull from DVC
dvc init
dvc remote add -d storage stores/blob
dvc pull
  3. Compute features.
tagifai compute-features
  4. Optimize using the distributions specified in tagifai.main.objective. This also writes the best model's params to config/params.json.
tagifai optimize \
    --params-fp config/params.json \
    --study-name optimization \
    --num-trials 100

You can train on your own on-prem GPUs or on infrastructure from cloud providers (AWS, GCP, Azure, etc.), or check out the optimize.ipynb notebook for how to train on Google Colab and transfer the trained artifacts to your local machine.

  5. Train a model (and save all its artifacts) using params from config/params.json and publish metrics to model/performance.json. You can view the entire run's details inside experiments/{experiment_id}/{run_id} or via the API (GET /runs/{run_id}).
tagifai train-model \
    --params-fp config/params.json \
    --model-dir model \
    --experiment-name best \
    --run-name model
  6. Predict tags for an input sentence. It'll use the best model saved from train-model, but you can also specify a run-id to choose a specific model.

    • Command-line app

      tagifai predict-tags --text "Transfer learning with BERT"
    • FastAPI

      uvicorn app.api:app \
          --host 0.0.0.0 \
          --port 5000 \
          --reload \
          --reload-dir tagifai \
          --reload-dir app
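Under the hood, querying the FastAPI server is just an HTTP POST. A minimal stdlib client sketch, assuming a /predict route and a {"texts": [...]} payload shape (both assumptions — check app/api.py for the actual contract):

```python
# Hypothetical client for the prediction endpoint; the route and payload
# shape are assumptions based on the CLI (`tagifai predict-tags --text ...`).
import json
from urllib import request


def predict_tags(text: str, host: str = "http://localhost:5000") -> dict:
    """POST a sentence to the running FastAPI app and return its response."""
    payload = json.dumps({"texts": [{"text": text}]}).encode("utf-8")
    req = request.Request(
        f"{host}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the uvicorn server above to be running):
# predict_tags("Transfer learning with BERT")
```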
  7. View improvements. Once you've trained the best model with the current data version, hyperparameters, etc., you can view the performance difference:

tagifai diff
  8. Push versioned assets.
# Push
dvc add data/projects.json
dvc add data/tags.json
dvc add data/features.json
dvc add data/features.parquet
dvc push
  9. Commit to git. This will clean and update versioned assets (data, experiments), run tests, apply styling, etc.
git add .
git commit -m "<COMMIT_MESSAGE>"
git tag -a <TAG_NAME> -m "<TAG_MESSAGE>"
git push origin <BRANCH_NAME>

Commands

Environments

python -m pip install -e . --no-cache-dir  # prod
python -m pip install -e ".[test]" --no-cache-dir  # test
python -m pip install -e ".[docs]" --no-cache-dir  # docs
python -m pip install -e ".[dev]" --no-cache-dir  # dev
pre-commit install
pre-commit autoupdate
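The extras referenced above ([test], [docs], [dev]) are typically declared via extras_require in setup.py. A sketch with illustrative package lists (not the project's actual pins):

```python
# Sketch of the extras behind `pip install -e ".[test|docs|dev]"`.
# Package lists are assumptions for illustration, not the project's pins.
test_packages = ["pytest", "pytest-cov", "great-expectations"]
docs_packages = ["mkdocs", "mkdocstrings"]
dev_packages = test_packages + docs_packages + ["pre-commit", "black", "flake8", "isort"]

# Passed to setuptools.setup(..., extras_require=extras_require) in setup.py
extras_require = {
    "test": test_packages,
    "docs": docs_packages,
    "dev": dev_packages,
}
```

Making "dev" a superset of "test" and "docs" means one install covers everything a contributor needs.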

Docker

docker build -t tagifai:latest -f Dockerfile .
docker run -p 5000:5000 --name tagifai tagifai:latest

Application

uvicorn app.api:app --host 0.0.0.0 --port 5000 --reload --reload-dir tagifai --reload-dir app  # dev
gunicorn -c app/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app  # prod

Streamlit

streamlit run streamlit/app.py

MLflow

mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri stores/model/

Airflow

  1. Set up
AIRFLOW_VERSION=2.0.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow db init  # `airflow db reset` to reset everything
airflow users create \
    --username admin \
    --firstname Goku \
    --lastname Mohandas \
    --role Admin \
    --email goku@madewithml.com
  2. Run webserver
export AIRFLOW_HOME=${PWD}/airflow
airflow webserver --port 8080

  3. Run scheduler (in a separate terminal)

export AIRFLOW_HOME=${PWD}/airflow
airflow scheduler

Feature store

feast init --minimal --template local features
touch features/features.py
cd features
feast apply
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

MkDocs

python -m mkdocs serve

Testing

  • Great Expectations checkpoints (read more here)

    great_expectations checkpoint run projects
    great_expectations checkpoint run tags
  • Full coverage testing

    pytest tests --cov tagifai --cov app  # report in STDOUT
    pytest tests --cov tagifai --cov app --cov-report html  # report in htmlcov/
  • Testing only the non-training components

    pytest -m "not training"
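The "training" marker filtered out above would be declared on the slow tests like this (the test body is a placeholder; remember to register the marker in pytest.ini or pyproject.toml to avoid warnings):

```python
# How a test might opt in to the "training" marker that
# `pytest -m "not training"` filters out.
import pytest


@pytest.mark.training  # slow tests that actually fit a model
def test_training_loop():
    assert True  # stand-in for a real assertion on training output


# Register the marker, e.g. in pyproject.toml:
#   [tool.pytest.ini_options]
#   markers = ["training: tests that run model training"]
```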

JupyterLab

python -m ipykernel install --user --name=tagifai
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter lab

You can also run all notebooks on Google Colab.

FAQ

Who is this content for?

  • Software engineers looking to learn ML and become even better software engineers.
  • Data scientists who want to learn how to responsibly deliver value with ML.
  • College graduates looking to learn the practical skills they'll need for the industry.
  • Product Managers who want to develop a technical foundation for ML applications.

What is the structure?

Lessons will be released weekly and each one will include:

  • intuition: high level overview of the concepts that will be covered and how it all fits together.
  • code: simple code examples to illustrate the concept.
  • application: applying the concept to our specific task.
  • extensions: brief look at other tools and techniques that will be useful for different situations.

What makes this content unique?

  • hands-on: If you search production ML or MLOps online, you'll find great blog posts and tweets. But in order to really understand these concepts, you need to implement them. Unfortunately, you don’t see a lot of the inner workings of running production ML because of scale, proprietary content & expensive tools. However, Made With ML is free, open and live which makes it a perfect learning opportunity for the community.
  • intuition-first: We will never jump straight to code. In every lesson, we will develop intuition for the concepts and think about it from a product perspective.
  • software engineering: This course isn't just about ML. In fact, it's mostly about clean software engineering! We'll cover important concepts like versioning, testing, logging, etc. that make for a production-grade product.
  • focused yet holistic: For every concept, we'll not only cover what's most important for our specific task (this is the case study aspect) but we'll also cover related methods (this is the guide aspect) which may prove to be useful in other situations.

Who is the author?

  • I've deployed large scale ML systems at Apple as well as smaller systems with constraints at startups and want to share the common principles I've learned.
  • Connect with me on Twitter and LinkedIn

Why is this free?

While this content is for everyone, it's especially targeted towards people who don't have as much opportunity to learn. I believe that creativity and intelligence are randomly distributed while opportunities are siloed. I want to enable more people to create and contribute to innovation.


To cite this course, please use:
@misc{madewithml,
    author       = {Goku Mohandas},
    title        = {Made With ML MLOps Course},
    howpublished = {\url{https://madewithml.com/}},
    year         = {2021}
}

About

License: MIT License

