ViT Training Benchmark with ColossalAI

This is the NUS CS5260 course project, hosted at https://github.com/KoalaYuFeng/vit_train_benchmark_with_Colossalai. This README describes the model, the dataset, how to run the code, and the experiment results.

Model Used in the Experiment

In this repository, we use pretrained Vision Transformer (ViT) weights loaded from HuggingFace. We adapt the ViT training code to ColossalAI through its Booster API, which is configured with a chosen plugin. Each plugin corresponds to a specific training strategy. This example supports the following plugins:

TorchDDPPlugin (DDP)
LowLevelZeroPlugin (Zero1/Zero2)
GeminiPlugin (Gemini)
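
The relationship between plugin names and the Booster can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names `plugin_spec` and `build_booster` are hypothetical, the plugin names are taken from the benchmark table, and constructing every plugin with its default arguments is an assumption. It also assumes a ColossalAI version that exposes `colossalai.booster`.

```python
# Sketch only: maps the plugin names used in the benchmark table to
# ColossalAI plugin classes. Helper names and default constructor
# arguments are assumptions made for illustration.

PLUGIN_CLASSES = {
    "torch_ddp": "TorchDDPPlugin",
    "torch_ddp_fp16": "TorchDDPPlugin",
    "low_level_zero": "LowLevelZeroPlugin",
    "gemini": "GeminiPlugin",
}


def plugin_spec(name):
    """Return (plugin class name, extra Booster kwargs) for a plugin name."""
    if name not in PLUGIN_CLASSES:
        raise ValueError(f"unknown plugin: {name!r}")
    # The fp16 variant reuses the DDP plugin and turns on mixed precision.
    booster_kwargs = {"mixed_precision": "fp16"} if name == "torch_ddp_fp16" else {}
    return PLUGIN_CLASSES[name], booster_kwargs


def build_booster(name):
    """Create a Booster for the given plugin name (requires ColossalAI)."""
    # Imported lazily so plugin_spec() works without ColossalAI installed.
    from colossalai.booster import Booster
    import colossalai.booster.plugin as plugins

    class_name, booster_kwargs = plugin_spec(name)
    plugin = getattr(plugins, class_name)()  # default arguments: an assumption
    return Booster(plugin=plugin, **booster_kwargs)
```

In a distributed run launched through ColossalAI, the returned booster would then wrap the model and optimizer via `booster.boost(...)`.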

Dataset Employed

We use the BeansDataset from HuggingFace.

Instructions on How to Run the Code

First, make sure the installed PyTorch version matches your CUDA version. For example, with CUDA 11.7 we installed torch 1.13.0.
Then install the dependencies listed in requirements.txt:
```
pip install -r requirements.txt
```
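
To check that the installed PyTorch build lines up with the expected CUDA version, a small script along these lines may help. It is a sketch: the helper names are hypothetical, and the default of `11.7` simply mirrors the setup described above.

```python
# Sketch only: reports the installed torch version and the CUDA version it
# was built against, compared on major.minor. Helper names are hypothetical.

def cuda_major_minor(version):
    """Turn a version string such as '11.7.1' into the tuple (11, 7)."""
    return tuple(int(part) for part in version.split(".")[:2])


def torch_cuda_report(expected_cuda="11.7"):
    """Return the torch version, its CUDA build version, and a match flag."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": None, "matches": False}
    built = torch.version.cuda  # None for CPU-only builds
    matches = built is not None and (
        cuda_major_minor(built) == cuda_major_minor(expected_cuda)
    )
    return {"torch": torch.__version__, "cuda": built, "matches": matches}


if __name__ == "__main__":
    print(torch_cuda_report())
```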

Clone the ColossalAI repository from GitHub:

```
git clone --recursive https://github.com/hpcaitech/ColossalAI.git
```

Navigate to the ViT example directory:
```
cd ColossalAI/examples/images/vit
```

Run the script:

```
bash run_demo.sh       # train ViT
bash run_benchmark.sh  # benchmark ViT
```

Experiment Results

Training Accuracy

| Epoch | Average Loss | Accuracy |
|-------|--------------|----------|
| 1 | 1.1607 | 85.94% |
| 2 | 0.2364 | 97.66% |
| 3 | 0.2099 | 98.44% |

Benchmark Results

The benchmarking was conducted using different plugins and batch sizes. The results are summarized in the table below:

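| Plugin | Batch Size per GPU | Throughput (samples/sec) | Maximum Memory Usage per GPU |
|--------|--------------------|--------------------------|------------------------------|
| `torch_ddp` | 8 | 43.7168 | 1.80 GB |
| `torch_ddp_fp16` | 8 | 60.1283 | 1.91 GB |
| `low_level_zero` | 8 | 47.1534 | 1.65 GB |
| `gemini` | 8 | 28.0425 | 663.17 MB |
| `torch_ddp` | 32 | 66.7630 | 2.34 GB |
| `torch_ddp_fp16` | 32 | 153.6898 | 2.25 GB |
| `low_level_zero` | 32 | 143.5798 | 1.66 GB |
| `gemini` | 32 | 110.6582 | 663.17 MB |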
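
One way to read these numbers is to express each plugin's throughput relative to plain `torch_ddp` at the same batch size. The sketch below does that with the values copied from the table; the function name `relative_throughput` is hypothetical, and the choice of `torch_ddp` as the baseline is an assumption made for illustration.

```python
# Sketch only: throughput values (samples/sec) copied from the table above,
# keyed by (plugin name, batch size per GPU).
THROUGHPUT = {
    ("torch_ddp", 8): 43.7168,
    ("torch_ddp_fp16", 8): 60.1283,
    ("low_level_zero", 8): 47.1534,
    ("gemini", 8): 28.0425,
    ("torch_ddp", 32): 66.7630,
    ("torch_ddp_fp16", 32): 153.6898,
    ("low_level_zero", 32): 143.5798,
    ("gemini", 32): 110.6582,
}


def relative_throughput(results, baseline="torch_ddp"):
    """Divide each throughput by the baseline plugin's at the same batch size."""
    return {
        (plugin, batch): value / results[(baseline, batch)]
        for (plugin, batch), value in results.items()
    }


if __name__ == "__main__":
    ratios = relative_throughput(THROUGHPUT)
    # Print grouped by batch size, then by plugin name.
    for (plugin, batch) in sorted(ratios, key=lambda key: (key[1], key[0])):
        print(f"{plugin:>15}  batch={batch:<3} {ratios[(plugin, batch)]:.2f}x")
```

On these numbers, `torch_ddp_fp16` reaches roughly 2.3 times the throughput of plain `torch_ddp` at batch size 32, while `gemini` trades throughput at batch size 8 for the lowest memory footprint.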

For more detailed configurations and complete benchmark results, please refer to the log file in the repository.
