Possible to run on 8 x 24GB 3090?
hobodrifterdavid opened this issue
This model looks amazing, thank you! We have a machine with 8 x 3090 (192GB total), I tried to run the examples, but I get:
building GPT2 model ...
RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 3; 23.70 GiB total capacity; 22.48 GiB already allocated; 70.56 MiB free; 22.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
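The error message suggests tuning `max_split_size_mb`. As a hedged sketch (the script name `generate.py` is a placeholder, not from this thread), that would look like the following; note it only mitigates fragmentation and cannot make a model that needs ~200 GB fit in 192 GB of total VRAM:

```shell
# Allocator hint from the error message: cap the size of cached blocks
# that PyTorch will split, to reduce fragmentation.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Hypothetical inference script -- substitute the repo's actual entry point.
python generate.py
```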
For someone who is not an expert with PyTorch, do you perhaps have a suggestion?
We would try to make a conversation partner for language learning (add TTS, translation, NLP etc.) for our project: https://dev.languagereactor.com/
Regards, David :)
Shouldn't it be 100B × sizeof(double) or 100B × sizeof(float)?
The weights are bfloat16, which is 16 bits (2 bytes) per parameter, so you need at least 200 GB just to load them, plus some extra for activations during inference.
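A quick back-of-envelope check of the arithmetic above (a sketch; the parameter count and dtypes are the ones discussed in this thread):

```python
# Bytes per parameter for the dtypes mentioned in the thread.
BYTES_PER_PARAM = {"float64": 8, "float32": 4, "bfloat16": 2}

def weights_gb(n_params: float, dtype: str) -> float:
    """GB needed just to hold the weights in the given dtype (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

n = 100e9  # ~100B parameters
print(weights_gb(n, "bfloat16"))  # 200.0 GB -- more than 8 x 24 GB = 192 GB
print(weights_gb(n, "float32"))   # 400.0 GB if you upcast to float32
```

So even in the model's native bfloat16, the weights alone exceed the 192 GB of the 8 x 3090 machine before any inference overhead is counted.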
Maybe a silly question: would it help to put a 9th card (9 x 24GB) in the machine? I have one extra.