facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

[CUDA OOM] reproduce ViT-B-16-quickgelu on V100 32G

hbchen121 opened this issue · comments

I tried to reproduce ViT-B-16-quickgelu on a 32 GB V100 with the same configuration. Why do I get a CUDA OOM at batch_size=512, when memory usage is only 26/32 GB at batch_size=256?

Do you know why that is?

I fixed this error by enabling `grad_checkpointing`.

Thanks. Yes, we train with gradient checkpointing on 64 V100 GPUs; if you want to turn gradient checkpointing off to speed up training, you need better or more GPUs.

`grad_checkpointing=True`
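For context, gradient checkpointing trades compute for memory: instead of storing every intermediate activation for the backward pass, it recomputes them segment by segment, which is why it resolves the OOM at larger batch sizes. A minimal PyTorch sketch of the idea (not MetaCLIP's actual training code; the toy model and layer sizes here are arbitrary):

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of layers whose activations would normally all be kept in
# memory until the backward pass.
model = torch.nn.Sequential(*[torch.nn.Linear(256, 256) for _ in range(8)])
x = torch.randn(4, 256, requires_grad=True)

# checkpoint_sequential splits the model into segments; only the segment
# boundary activations are stored, and the rest are recomputed during
# backward, cutting peak activation memory at the cost of extra compute.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```

In OpenCLIP-style training code, the `grad_checkpointing` flag applies this same recomputation trick inside the transformer blocks.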