amazingguni / fsdl-text-recognizer-2022-labs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

🥞 FSDL Text Recognizer 2022

Implements a text recognizer using a hypermodern 2022 DL stack:

  • Code quality assurance with pre-commit hooks, GitHub Actions, and pytest
  • Real data augmented with data synthesis
  • Image encoding via a CNN architecture
  • Decoding to text via a Transformer architecture
  • Model implementation in PyTorch
  • Model training and basic tracking and metrics via PyTorch Lightning and torchmetrics
  • Experiment tracking, hyperparameter tuning, and model versioning with W&B
  • Model packaging in TorchScript
  • Predictor backend containerization via Docker and deployment as a microservice on AWS Lambda
  • Basic load testing with Locust
  • Pure Python frontend web application in Gradio
  • Model monitoring with Gantry

Setup local environment

Setting up a machine that you can sit in front of or SSH into is easy.

Watch a walkthrough video here.

If you get stuck, it's better to at least get started with the labs on Colab, where setup is just a single click, rather than getting frustrated and burning out on annoying environment management, networking, and systems administration issues that aren't as relevant to making ML-powered products.

Summary

  • environment.yml specifies Python and optionally CUDA/CUDNN
  • make conda-update creates/updates a virtual environment
  • conda activate fsdl-text-recognizer-2022 activates the virtual environment
  • requirements/prod.in and requirements/dev.in specify core Python packages in that environment
  • make pip-tools resolves all other Python dependencies and installs them
  • export PYTHONPATH=.:$PYTHONPATH makes the current directory visible on your Python path -- add it to your ~/.bashrc and source ~/.bashrc

1. Check out the repo

git clone https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2022-labs.git
cd fsdl-text-recognizer-2022-labs

2. Set up the Python environment

We use conda for managing Python and CUDA versions, and pip-tools for managing Python package dependencies.

We add a Makefile for making setup dead-simple.

First: Install the Python + CUDA environment using Anaconda

Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux.

It is most closely associated with Python, but in fact it can manage more than just Python environments.

To install conda, follow instructions at https://conda.io/projects/conda/en/latest/user-guide/install/linux.html. We use Python 3.8.

Note that you will likely need to close and re-open your terminal. Afterwards, you should have ability to run the conda command in your terminal.

Run make conda-update to create an environment called fsdl-text-recognizer-2022, as defined in environment.yml. This environment will provide us with the right Python version as well as the CUDA and CUDNN libraries.

If you edit environment.yml, just run make conda-update again to get the latest changes.

Next, activate the conda environment.

conda activate fsdl-text-recognizer-2022

IMPORTANT: every time you work in this directory, make sure to start your session with conda activate fsdl-text-recognizer-2022.

Next: install Python packages

Next, install all necessary Python packages by running make pip-tools

Using pip-tools lets us do three nice things:

  1. Separate out dev from production dependencies (dev.in vs prod.in).
  2. Have a lockfile of exact versions for all dependencies (the auto-generated dev.txt and prod.txt).
  3. Allow us to easily deploy to targets that don't support the conda environment, like Colab.

Set PYTHONPATH

Last, run export PYTHONPATH=. before executing any commands later on, or you will get errors like this:

ModuleNotFoundError: No module named 'text_recognizer'

In order to not have to set PYTHONPATH in every terminal you open, just add that line as the last line of the ~/.bashrc file using a text editor of your choice (e.g. nano ~/.bashrc) or by concatenating with >>

echo "export PYTHONPATH=.:$PYTHONPATH" >> ~/.bashrc

About


Languages

Language:Jupyter Notebook 93.7%Language:Python 6.1%Language:Shell 0.1%Language:Makefile 0.1%Language:Dockerfile 0.0%