[CUDA OOM] reproduce ViT-B-16-quickgelu on V100 32G
hbchen121 opened this issue · comments
Cheng Nan commented
I tried to reproduce ViT-B-16-quickgelu on V100 32G with the same configuration, but I hit OOM at batch_size=512. At batch_size=256, memory usage is only 26/32 GB.
Do you know why that is?
Cheng Nan commented
I fixed this error by enabling "grad_checkpointing".
Hu Xu commented
Thanks. Yes, we use gradient checkpointing to train on 64 V100 GPUs; with better or more GPUs, you can turn gradient checkpointing off to speed up training.
MetaCLIP/run_configs_fullcc.py
Line 39 in ea88021
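For readers unfamiliar with the trick: gradient checkpointing drops intermediate activations during the forward pass and recomputes them during backward, trading extra compute for a much smaller activation footprint. Below is a minimal, hypothetical PyTorch sketch of the pattern (the `Block`/`Tower` names and sizes are illustrative, not MetaCLIP's actual code; the real flag lives in the config linked above):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Stand-in for one transformer block (illustrative only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.mlp(x)


class Tower(nn.Module):
    """Stack of blocks with an optional grad-checkpointing switch."""

    def __init__(self, dim: int = 64, depth: int = 4, grad_checkpointing: bool = False):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.grad_checkpointing = grad_checkpointing

    def forward(self, x):
        for blk in self.blocks:
            if self.grad_checkpointing and self.training:
                # Activations inside `blk` are freed after the forward pass
                # and recomputed during backward, lowering peak memory.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x


model = Tower(grad_checkpointing=True).train()
out = model(torch.randn(2, 8, 64, requires_grad=True))
out.sum().backward()  # gradients flow through the recomputed activations
```

With checkpointing on, only the block inputs are kept alive during the forward pass, which is why batch_size=512 can fit where it otherwise OOMs; the cost is roughly one extra forward pass per step.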