- Structures ML code the same way, so that work can easily be extended and replicated
- Allows rapid experimentation by automating the pipeline with config files
- Extends the functionality of popular experiment loggers like Weights&Biases (mostly with dedicated callbacks)
This template tries to be as generic as possible - you should be able to easily modify behavior in train.py in case you need some unconventional configuration wiring.
Click on the "Use this template" button above to initialize a new repository.
- PyTorch Lightning provides great abstractions for well structured ML code and advanced features like checkpointing, gradient accumulation, distributed training, etc.
- Hydra provides a convenient way to manage experiment configurations, with advanced features like overriding any config parameter from the command line, scheduling execution of many runs, etc.
*Warning: this template currently uses a development version of Hydra, which might be unstable (we are waiting for version 1.1 to be released).
*Based on the deep-learning-project-template by the PyTorchLightning organization.
*Suggestions are always welcome!
- Predefined folder structure
- Modularity: all abstractions are split into different submodules
- Automates PyTorch Lightning training pipeline with little boilerplate, so it can be easily modified (see train.py)
- All advantages of Hydra
- Main config file contains default training configuration (see config.yaml)
- Storing many experiment configurations in a convenient way (see project/configs/experiment)
- Command line features (see #How to run for examples)
- Override any config parameter from command line
- Schedule execution of many experiments from command line
- Sweep over hyperparameters from command line
- Convenient logging of run history, ckpts, etc. (see #Logs)
- Validating correctness of config with schemas (TODO)
- Optional Weights&Biases utilities for experiment tracking
- Callbacks (see wandb_callbacks.py)
- Automatically store all code files and model checkpoints as artifacts in W&B cloud
- Generate confusion matrices and f1/precision/recall heatmaps
- Hyperparameter search with Weights&Biases sweeps (execute_sweep.py) (TODO)
- Example of inference with trained model (inference_example.py)
- Built-in requirements (requirements.txt)
- Built-in conda environment initialization (conda_env.yaml)
- Built-in python package setup (setup.py)
- Example with MNIST digits classification (mnist_model.py, mnist_datamodule.py)
The directory structure of a new project looks like this:
```
├── project
│   ├── configs                  <- Hydra configuration files
│   │   ├── trainer              <- Configurations of Lightning trainers
│   │   ├── model                <- Configurations of Lightning models
│   │   ├── datamodule           <- Configurations of Lightning datamodules
│   │   ├── callbacks            <- Configurations of Lightning callbacks
│   │   ├── logger               <- Configurations of Lightning loggers
│   │   ├── seeds                <- Configurations of seeds
│   │   ├── experiment           <- Configurations of experiments
│   │   │
│   │   └── config.yaml          <- Main project configuration file
│   │
│   ├── data                     <- Project data
│   │
│   ├── logs                     <- Logs generated by Hydra and PyTorch Lightning loggers
│   │
│   ├── notebooks                <- Jupyter notebooks
│   │
│   ├── src
│   │   ├── architectures        <- PyTorch model architectures
│   │   ├── callbacks            <- PyTorch Lightning callbacks
│   │   ├── datamodules          <- PyTorch Lightning datamodules
│   │   ├── datasets             <- PyTorch datasets
│   │   ├── models               <- PyTorch Lightning models
│   │   ├── transforms           <- Data transformations
│   │   ├── utils                <- Utility scripts
│   │   ├── inference_example.py <- Example of inference with trained model
│   │   └── template_utils.py    <- Some extra template utilities
│   │
│   └── train.py                 <- Train model with chosen experiment configuration
│
├── .gitignore
├── LICENSE
├── README.md
├── conda_env.yaml               <- File for installing conda environment
├── requirements.txt             <- File for installing python dependencies
└── setup.py                     <- File for installing project as a package
```
- Write your PyTorch Lightning model (see mnist_model.py for example)
- Write your PyTorch Lightning datamodule (see mnist_datamodule.py for example)
- Write your experiment config, containing paths to your model and datamodule (see project/configs/experiment for examples)
- Run training with chosen experiment config:

```bash
python train.py +experiment=experiment_name.yaml
```
Main project configuration file (config.yaml)
The main config contains the default training configuration.
It determines how the config is composed when simply executing the command `python train.py`:
```yaml
# to execute run with default training configuration simply run:
# python train.py

# specify here default training configuration
defaults:
    - trainer: default_trainer.yaml
    - model: mnist_model.yaml
    - datamodule: mnist_datamodule.yaml
    - seeds: default_seeds.yaml          # set this to null if you don't want to use seeds
    - callbacks: default_callbacks.yaml  # set this to null if you don't want to use callbacks
    - logger: null                       # set logger here or use command line (e.g. `python train.py logger=wandb`)

# path to original working directory (the directory that `train.py` was executed from in command line)
# hydra hijacks the working directory by changing it to the current log directory,
# so it's useful to have the path to the original working directory as a special variable
# read more here: https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
original_work_dir: ${hydra:runtime.cwd}

# path to folder with data
data_dir: ${original_work_dir}/data/

# output paths for hydra logs
hydra:
    run:
        dir: logs/runs/${now:%Y-%m-%d}/${now:%H-%M-%S}
    sweep:
        dir: logs/multiruns/${now:%Y-%m-%d_%H-%M-%S}
        subdir: ${hydra.job.num}
```
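The `${now:...}` resolvers in the output paths are strftime-style timestamp patterns. As a quick illustration (plain Python, not Hydra's actual resolver machinery), here is what directory names the two patterns above produce for a fixed example timestamp:

```python
from datetime import datetime

# Hydra's ${now:...} resolver formats the current time with strftime-style
# patterns; we reproduce the two output-path patterns from the config above
# for a fixed example timestamp.
t = datetime(2021, 2, 15, 16, 50, 49)

run_dir = f"logs/runs/{t:%Y-%m-%d}/{t:%H-%M-%S}"
sweep_dir = f"logs/multiruns/{t:%Y-%m-%d_%H-%M-%S}"

print(run_dir)    # logs/runs/2021-02-15/16-50-49
print(sweep_dir)  # logs/multiruns/2021-02-15_16-50-49
```

These are exactly the date/hour folder names that show up in the logs structure described below.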
Experiment configuration (project/configs/experiment)
You can store many experiment configurations in this folder.
Example experiment configuration:
```yaml
# to execute this experiment run:
# python train.py +experiment=exp_example_simple

defaults:
    - override /trainer: default_trainer.yaml
    - override /model: mnist_model.yaml
    - override /datamodule: mnist_datamodule.yaml
    - override /seeds: default_seeds.yaml
    - override /callbacks: default_callbacks.yaml
    - override /logger: null

# all parameters below will be merged with parameters from default configurations set above
# this allows you to overwrite only specified parameters
seeds:
    pytorch_seed: 12345

trainer:
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    lr: 0.001
    lin1_size: 128
    lin2_size: 256
    lin3_size: 64

datamodule:
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]
```
More advanced experiment configuration:
```yaml
# to execute this experiment run:
# python train.py +experiment=exp_example_with_paths

defaults:
    - override /trainer: null
    - override /model: null
    - override /datamodule: null
    - override /seeds: null
    - override /callbacks: default_callbacks.yaml
    - override /logger: null

# we override default configurations with nulls to prevent them from loading at all
# instead we define all modules and their paths directly in this config,
# so everything is stored in one place for more readability
seeds:
    pytorch_seed: 12345

trainer:
    _target_: pytorch_lightning.Trainer
    min_epochs: 1
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    _target_: src.models.mnist_model.LitModelMNIST
    optimizer: adam
    lr: 0.001
    weight_decay: 0.000001
    architecture: SimpleDenseNet
    input_size: 784
    lin1_size: 256
    dropout1: 0.30
    lin2_size: 256
    dropout2: 0.25
    lin3_size: 128
    dropout3: 0.20
    output_size: 10

datamodule:
    _target_: src.datamodules.mnist_datamodule.MNISTDataModule
    data_dir: ${data_dir}
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]
    num_workers: 1
    pin_memory: False
```
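Each `_target_` key names a dotted import path; Hydra imports it and calls it with the remaining keys as keyword arguments (via `hydra.utils.instantiate`). A rough stdlib-only sketch of that mechanism, using `collections.Counter` as a stand-in target since the template's real targets need the project installed:

```python
import importlib

def instantiate(config: dict):
    """Rough sketch of what hydra.utils.instantiate does with a _target_ key:
    import the dotted path, then call it with the remaining keys as kwargs.
    (The real implementation also handles nesting, positional args, etc.)"""
    module_path, _, class_name = config["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return cls(**kwargs)

# stand-in target from the standard library; the template would instead use
# e.g. src.datamodules.mnist_datamodule.MNISTDataModule with its kwargs
counter = instantiate({"_target_": "collections.Counter", "a": 2, "b": 3})
```

This is why the null-override style above works: with the defaults disabled, the experiment config alone carries every `_target_` and its arguments.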
By default, logs have the following structure:
```
├── logs
│   ├── runs                     # Folder for logs generated from single runs
│   │   ├── 2021-02-15           # Date of executing run
│   │   │   ├── 16-50-49         # Hour of executing run
│   │   │   │   ├── .hydra       # Hydra logs
│   │   │   │   ├── wandb        # Weights&Biases logs
│   │   │   │   ├── checkpoints  # Training checkpoints
│   │   │   │   └── ...          # Any other thing saved during training
│   │   │   ├── ...
│   │   │   └── ...
│   │   ├── ...
│   │   └── ...
│   │
│   └── multiruns                # Folder for logs generated from sweeps
│       ├── 2021-02-15_16-50-49  # Date and hour of executing sweep
│       │   ├── 0                # Job number
│       │   │   ├── .hydra       # Hydra logs
│       │   │   ├── wandb        # Weights&Biases logs
│       │   │   ├── checkpoints  # Training checkpoints
│       │   │   └── ...          # Any other thing saved during training
│       │   ├── 1
│       │   ├── 2
│       │   └── ...
│       ├── ...
│       └── ...
```
First, install cookiecutter and initialize a new project from this template:

```bash
# install cookiecutter with pip
pip install cookiecutter

# or install it with conda
conda config --add channels conda-forge
conda install cookiecutter

# initialize a new project from this template
cookiecutter https://github.com/HemuManju/lightning-hydra-template.git

# install python dependencies
pip install -r requirements.txt
```
Next, you can train the model with the default configuration, without logging:

```bash
cd project_name
python train.py
```
Or you can train the model with a chosen logger like Weights&Biases:

```yaml
# set project and entity names in 'project/configs/logger/wandb.yaml'
wandb:
    project: "your_project_name"
    entity: "your_wandb_team_name"
```

```bash
# train model with Weights&Biases
python train.py logger=wandb
```
Or you can train the model with a chosen experiment config:

```bash
# experiment configurations are placed in the 'project/configs/experiment' folder
python train.py +experiment=exp_example_simple
```
To execute all experiments from a folder, run:

```bash
# execute all experiments from the `project/configs/experiment` folder
python train.py --multirun '+experiment=glob(*)'
```
You can override any config parameter from the command line like this:

```bash
python train.py trainer.max_epochs=20 model.lr=0.0005
```
To train on GPU:

```bash
python train.py trainer.gpus=1
```
Attach a set of callbacks to the run:

```bash
# callback set configurations are placed in the 'project/configs/callbacks' folder
python train.py callbacks=default_callbacks
```
Combining it all:

```bash
python train.py --multirun '+experiment=glob(*)' trainer.max_epochs=10 logger=wandb
```
To create a sweep over some hyperparameters, run:

```bash
# this will run 6 experiments one after the other,
# each with a different combination of batch_size and learning rate
python train.py --multirun datamodule.batch_size=32,64,128 model.lr=0.001,0.0005
```
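With `--multirun`, each comma-separated override expands into its list of values, and one job is launched per combination. A small sketch in plain Python (not Hydra's actual override parser) of how the 6 combinations above arise:

```python
from itertools import product

# comma-separated override values from the command line above
batch_sizes = [32, 64, 128]
learning_rates = [0.001, 0.0005]

# --multirun launches one job per element of the Cartesian product,
# so 3 batch sizes x 2 learning rates = 6 jobs
jobs = [
    {"datamodule.batch_size": bs, "model.lr": lr}
    for bs, lr in product(batch_sizes, learning_rates)
]

print(len(jobs))  # 6
```

Each job's outputs land in its own numbered subdirectory under `logs/multiruns/`, as shown in the logs structure above.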
This project is based on the cookiecutter data science project template and the PyTorch Lightning + Hydra template.