Image classification is a fundamental computer vision task that serves as the backbone for solving many other computer vision problems.
- Distributed data-parallel training (PyTorch DistributedDataParallel)
  - Run multiple processes on different machines and multiple GPUs
- Three different initialization schemes are supported.
- Hyperparameters are given as a YAML file for each scenario.
- Folders named with the time of execution are created for the following outputs:
  - Training and test logs as text
  - Initial and final checkpoints
  - Training and test statistics as NumPy variables
  - TensorBoard logs
- Four different learning rate schedulers are supported.
- All training and validation metrics are saved for further analysis.
- TensorBoard events are logged in a separate folder, categorized by dataset and architecture.
- Every run command is provided as a bash file.
An example for training a fully connected network on the MNIST dataset is the following:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$RANDOM main.py \
-a fc --dataname mnist --config mnist_fc_train --logterminal
An example for training a ResNet50 network on the ImageNet1K dataset is the following:
python -m torch.distributed.launch --nproc_per_node=5 --master_port=$RANDOM main.py \
-a resnet50 --dataname imagenet --config imagenet_resnet50_train
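
In the commands above, `torch.distributed.launch` starts `--nproc_per_node` copies of `main.py`. Depending on the PyTorch version, each copy receives its local rank as a `--local_rank` (or `--local-rank`) argument, as a `LOCAL_RANK` environment variable, or both, along with `RANK` and `WORLD_SIZE` in the environment. How `main.py` actually reads these values is not shown here; the helper below, `read_dist_settings`, is a hypothetical sketch that accepts either form and falls back to single-process defaults.

```python
import argparse
import os


def read_dist_settings(argv=None, env=None):
    """Collect the rank information provided by the PyTorch launcher.

    Hypothetical helper: defaults describe a single-process run, so a
    script using it would also work without the launcher.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Older launchers pass --local_rank; newer ones pass --local-rank.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    # Ignore the script's own options (-a, --dataname, ...).
    args, _unknown = parser.parse_known_args(argv)

    local_rank = args.local_rank
    if local_rank is None:
        local_rank = int(env.get("LOCAL_RANK", 0))
    return {
        "local_rank": local_rank,
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }
```

With `--nproc_per_node=5` on a single machine, the five processes would see a world size of 5 and local ranks 0 through 4, which is typically what selects the GPU each process uses.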