HemuManju / lightning-hydra-template

PyTorch Lightning + Hydra + Weights&Biases = starting template for well structured ML code ⚡🔥🔥🔥

PyTorch Lightning + Hydra cookiecutter template

A clean and simple template to kickstart your deep learning project 🚀⚡🔥

  • structures ML code the same way, so that work can easily be extended and replicated
  • allows for rapid experimentation by automating the pipeline with config files
  • extends the functionality of popular experiment loggers like Weights&Biases (mostly with dedicated callbacks)

This template tries to be as generic as possible - you should be able to easily modify behavior in train.py in case you need some unconventional configuration wiring.

Click on the Use this template button above to initialize a new repository.

Why Lightning + Hydra?

  • PyTorch Lightning provides great abstractions for well structured ML code and advanced features like checkpointing, gradient accumulation, distributed training, etc.
  • Hydra provides a convenient way to manage experiment configurations and advanced features like overriding any config parameter from the command line, scheduling execution of many runs, etc.

Some Notes

  • Warning: this template currently uses a development version of Hydra, which might be unstable (we are waiting until version 1.1 is released).
  • Based on deep-learning-project-template by the PyTorchLightning organization.
  • Suggestions are always welcome!

Features

  • Predefined folder structure
  • Modularity: all abstractions are split into different submodules
  • Automates the PyTorch Lightning training pipeline with little boilerplate, so it can be easily modified (see train.py)
  • All advantages of Hydra
    • Main config file contains default training configuration (see config.yaml)
    • Storing many experiment configurations in a convenient way (see project/configs/experiment)
    • Command line features (see #How to run for examples)
      • Override any config parameter from command line
      • Schedule execution of many experiments from command line
      • Sweep over hyperparameters from command line
    • Convenient logging of run history, ckpts, etc. (see #Logs)
    • Validating correctness of config with schemas (TODO)
  • Optional Weights&Biases utilities for experiment tracking
    • Callbacks (see wandb_callbacks.py)
      • Automatically store all code files and model checkpoints as artifacts in W&B cloud (a rough sketch of such a callback follows this list)
      • Generate confusion matrices and f1/precision/recall heatmaps
    • Hyperparameter search with Weights&Biases sweeps (execute_sweep.py) (TODO)
  • Example of inference with trained model (inference_example.py)
  • Built in requirements (requirements.txt)
  • Built in conda environment initialization (conda_env.yaml)
  • Built in python package setup (setup.py)
  • Example with MNIST digits classification (mnist_model.py, mnist_datamodule.py)
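
The W&B callback that stores code as artifacts (mentioned above, see wandb_callbacks.py) can be surprisingly small. Below is a rough sketch of such a callback; the class name, hook choice and artifact name are illustrative assumptions, not the actual implementation:

import glob

import wandb
from pytorch_lightning import Callback


class UploadCodeToWandbAsArtifact(Callback):
    """Hypothetical sketch: upload all *.py files as a W&B code artifact."""

    def __init__(self, code_dir: str = "src"):
        self.code_dir = code_dir

    def on_train_start(self, trainer, pl_module):
        # collect every python file under code_dir and log it to the active W&B run
        code = wandb.Artifact("project-source", type="code")
        for path in glob.glob(f"{self.code_dir}/**/*.py", recursive=True):
            code.add_file(path)
        wandb.run.log_artifact(code)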

Project structure

The directory structure of a new project looks like this:

├── project
│   ├── configs                 <- Hydra configuration files
│   │   ├── trainer                 <- Configurations of lightning trainers
│   │   ├── model                   <- Configurations of lightning models
│   │   ├── datamodule              <- Configurations of lightning datamodules
│   │   ├── callbacks               <- Configurations of lightning callbacks
│   │   ├── logger                  <- Configurations of lightning loggers
│   │   ├── seeds                   <- Configurations of seeds
│   │   ├── experiment              <- Configurations of experiments
│   │   │
│   │   └── config.yaml             <- Main project configuration file
│   │
│   ├── data                    <- Project data
│   │
│   ├── logs                    <- Logs generated by hydra and pytorch lightning loggers
│   │
│   ├── notebooks               <- Jupyter notebooks
│   │
│   ├── src
│   │   ├── architectures           <- PyTorch model architectures
│   │   ├── callbacks               <- PyTorch Lightning callbacks
│   │   ├── datamodules             <- PyTorch Lightning datamodules
│   │   ├── datasets                <- PyTorch datasets
│   │   ├── models                  <- PyTorch Lightning models
│   │   ├── transforms              <- Data transformations
│   │   └── utils                   <- Utility scripts
│   │       ├── inference_example.py    <- Example of inference with trained model
│   │       └── template_utils.py       <- Some extra template utilities
│   │
│   └── train.py                <- Train model with chosen experiment configuration
│
├── .gitignore
├── LICENSE
├── README.md
├── conda_env.yaml          <- File for installing conda environment
├── requirements.txt        <- File for installing python dependencies
└── setup.py                <- File for installing project as a package

Workflow

  1. Write your PyTorch Lightning model (see mnist_model.py for an example; a minimal sketch also follows this list)
  2. Write your PyTorch Lightning datamodule (see mnist_datamodule.py for an example)
  3. Write your experiment config, containing paths to your model and datamodule (see project/configs/experiment for examples)
  4. Run training with the chosen experiment config:
    python train.py +experiment=experiment_name.yaml
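
For orientation, a bare-bones LightningModule along the lines of step 1 could look like the sketch below (layer sizes and metric names are made up here; the actual mnist_model.py is more configurable):

import torch
from torch import nn
from torch.nn import functional as F
from pytorch_lightning import LightningModule


class LitModelMNIST(LightningModule):
    """Minimal illustrative model; not the template's actual implementation."""

    def __init__(self, lr: float = 0.001, lin1_size: int = 128):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, lin1_size),
            nn.ReLU(),
            nn.Linear(lin1_size, 10),
        )

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)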

Main project configuration file (config.yaml)

The main config contains the default training configuration.
It determines how the config is composed when you simply execute the command: python train.py

# to execute run with default training configuration simply run:
# python train.py


# specify here default training configuration
defaults:
    - trainer: default_trainer.yaml
    - model: mnist_model.yaml
    - datamodule: mnist_datamodule.yaml
    - seeds: default_seeds.yaml  # set this to null if you don't want to use seeds
    - callbacks: default_callbacks.yaml  # set this to null if you don't want to use callbacks
    - logger: null  # set logger here or use command line (e.g. `python train.py logger=wandb`)


# path to original working directory (the directory that `train.py` was executed from in command line)
# hydra hijacks working directory by changing it to the current log directory,
# so it's useful to have path to original working directory as a special variable
# read more here: https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
original_work_dir: ${hydra:runtime.cwd}


# path to folder with data
data_dir: ${original_work_dir}/data/


# output paths for hydra logs
hydra:
    run:
        dir: logs/runs/${now:%Y-%m-%d}/${now:%H-%M-%S}
    sweep:
        dir: logs/multiruns/${now:%Y-%m-%d_%H-%M-%S}
        subdir: ${hydra.job.num}
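
For context, train.py consumes this composed config roughly as follows. This is a simplified sketch that only assumes Hydra's standard instantiate API; the actual train.py also wires in callbacks, loggers and extra template utilities:

import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig
from pytorch_lightning import seed_everything


@hydra.main(config_path="configs", config_name="config")
def main(config: DictConfig):
    # config is already composed from the defaults list and any command line overrides
    if config.get("seeds"):
        seed_everything(config.seeds.pytorch_seed)

    datamodule = instantiate(config.datamodule)  # e.g. MNISTDataModule
    model = instantiate(config.model)            # e.g. LitModelMNIST
    trainer = instantiate(config.trainer)        # pytorch_lightning.Trainer

    trainer.fit(model=model, datamodule=datamodule)


if __name__ == "__main__":
    main()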

Experiment configuration (project/configs/experiment)

You can store many experiment configurations in this folder.
Example experiment configuration:

# to execute this experiment run:
# python train.py +experiment=exp_example_simple

defaults:
    - override /trainer: default_trainer.yaml
    - override /model: mnist_model.yaml
    - override /datamodule: mnist_datamodule.yaml
    - override /seeds: default_seeds.yaml
    - override /callbacks: default_callbacks.yaml
    - override /logger: null

# all parameters below will be merged with parameters from default configurations set above
# this allows you to overwrite only specified parameters

seeds:
    pytorch_seed: 12345

trainer:
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    lr: 0.001
    lin1_size: 128
    lin2_size: 256
    lin3_size: 64

datamodule:
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]

More advanced experiment configuration:

# to execute this experiment run:
# python train.py +experiment=exp_example_with_paths

defaults:
    - override /trainer: null
    - override /model: null
    - override /datamodule: null
    - override /seeds: null
    - override /callbacks: default_callbacks.yaml
    - override /logger: null

# we override default configurations with nulls to prevent them from loading at all
# instead we define all modules and their paths directly in this config,
# so everything is stored in one place for better readability

seeds:
    pytorch_seed: 12345

trainer:
    _target_: pytorch_lightning.Trainer
    min_epochs: 1
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    _target_: src.models.mnist_model.LitModelMNIST
    optimizer: adam
    lr: 0.001
    weight_decay: 0.000001
    architecture: SimpleDenseNet
    input_size: 784
    lin1_size: 256
    dropout1: 0.30
    lin2_size: 256
    dropout2: 0.25
    lin3_size: 128
    dropout3: 0.20
    output_size: 10

datamodule:
    _target_: src.datamodules.mnist_datamodule.MNISTDataModule
    data_dir: ${data_dir}
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]
    num_workers: 1
    pin_memory: False

Logs

By default, logs have the following structure:

├── logs
│   ├── runs                     # Folder for logs generated from single runs
│   │   ├── 2021-02-15              # Date of executing run
│   │   │   ├── 16-50-49                # Hour of executing run
│   │   │   │   ├── .hydra                  # Hydra logs
│   │   │   │   ├── wandb                   # Weights&Biases logs
│   │   │   │   ├── checkpoints             # Training checkpoints
│   │   │   │   └── ...                     # Any other thing saved during training
│   │   │   ├── ...
│   │   │   └── ...
│   │   ├── ...
│   │   └── ...
│   │
│   ├── multiruns               # Folder for logs generated from sweeps
│   │   ├── 2021-02-15_16-50-49     # Date and hour of executing sweep
│   │   │   ├── 0                       # Job number
│   │   │   │   ├── .hydra                  # Hydra logs
│   │   │   │   ├── wandb                   # Weights&Biases logs
│   │   │   │   ├── checkpoints             # Training checkpoints
│   │   │   │   └── ...                     # Any other thing saved during training
│   │   │   ├── 1
│   │   │   ├── 2
│   │   │   └── ...
│   │   ├── ...
│   │   └── ...
│   │




How to run

To start a new project, run:


cookiecutter https://github.com/HemuManju/lightning-hydra-template.git

Install dependencies

# install cookiecutter, either with pip
$ pip install cookiecutter

# or with conda
$ conda config --add channels conda-forge
$ conda install cookiecutter

# install python requirements
$ pip install -r requirements.txt

Next, you can train a model with the default configuration, without logging:

cd project_name
python train.py

Or you can train a model with a chosen logger like Weights&Biases:

# set project and entity names in 'project/configs/logger/wandb.yaml'
wandb:
    project: "your_project_name"
    entity: "your_wandb_team_name"
# train model with Weights&Biases
python train.py logger=wandb
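
The logger itself is configured in a regular Hydra config group file. A plausible sketch of project/configs/logger/wandb.yaml is shown below (the parameter values are placeholders, not the file verbatim):

# project/configs/logger/wandb.yaml (sketch)
wandb:
    _target_: pytorch_lightning.loggers.wandb.WandbLogger
    project: "your_project_name"
    entity: "your_wandb_team_name"
    job_type: "train"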

Or you can train a model with a chosen experiment config:

# experiment configurations are placed in 'project/configs/experiment' folder
python train.py +experiment=exp_example_simple

To execute all experiments from a folder, run:

# execute all experiments from folder `project/configs/experiment`
python train.py --multirun '+experiment=glob(*)'

You can override any parameter from command line like this:

python train.py trainer.max_epochs=20 model.lr=0.0005

To train on GPU:

python train.py trainer.gpus=1

Attach a callback set to the run:

# callback sets configurations are placed in 'project/configs/callbacks' folder
python train.py callbacks=default_callbacks
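
A callback set is again just a config group file. project/configs/callbacks/default_callbacks.yaml might contain something along these lines (the monitored metric and parameter values are assumptions):

# project/configs/callbacks/default_callbacks.yaml (sketch)
early_stopping:
    _target_: pytorch_lightning.callbacks.EarlyStopping
    monitor: "val/acc"
    patience: 5
    mode: "max"

model_checkpoint:
    _target_: pytorch_lightning.callbacks.ModelCheckpoint
    monitor: "val/acc"
    save_top_k: 1
    mode: "max"
    dirpath: "checkpoints/"
    filename: "best-checkpoint"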

Combining it all:

python train.py --multirun '+experiment=glob(*)' trainer.max_epochs=10 logger=wandb

To create a sweep over some hyperparameters, run:

# this will run 6 experiments one after the other,
# each with a different combination of batch_size and learning rate
python train.py --multirun datamodule.batch_size=32,64,128 model.lr=0.001,0.0005

Project based on the cookiecutter data science project template and the pytorch lightning + hydra template.

License: MIT License

