[CUDA OOM] reproduce ViT-B-16-quickgelu on V100 32G
hbchen121 opened this issue · comments
Cheng Nan commented
I tried to reproduce ViT-B-16-quickgelu on V100 32G with the same configuration, but I hit OOM at batch_size=512. At batch_size=256, memory usage is only 26/32 GB.
Do you know why that is?
Cheng Nan commented
I fixed this error by enabling "grad_checkpointing".
Hu Xu commented
Thanks. Yes, we use gradient checkpointing to train on 64 V100 GPUs; with better or more GPUs, you can turn gradient checkpointing off to speed up training.
MetaCLIP/run_configs_fullcc.py
Line 39 in ea88021
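For readers unfamiliar with the trick: gradient checkpointing drops intermediate activations during the forward pass and recomputes them during backward, trading extra compute for a much smaller activation footprint. Below is a minimal, hypothetical PyTorch sketch of the pattern (the `Block`/`Tower` names and sizes are illustrative, not MetaCLIP's actual code; the real flag lives in the config linked above):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Stand-in for one transformer block (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.mlp(x)


class Tower(nn.Module):
    """Stack of blocks with an optional grad-checkpointing switch."""

    def __init__(self, dim: int = 64, depth: int = 4, grad_checkpointing: bool = False):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.grad_checkpointing = grad_checkpointing

    def forward(self, x):
        for blk in self.blocks:
            if self.grad_checkpointing and self.training:
                # Activations inside `blk` are freed after the forward pass
                # and recomputed during backward, lowering peak memory.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x


model = Tower(grad_checkpointing=True).train()
out = model(torch.randn(2, 8, 64, requires_grad=True))
out.sum().backward()  # gradients flow through the recomputed activations
```

With checkpointing on, only the block inputs are kept alive during the forward pass, which is why batch_size=512 can fit where it otherwise OOMs; the cost is roughly one extra forward pass per step.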