yandex / YaLM-100B

Pretrained language model with 100B parameters

Possible to run on 8 x 24GB 3090?

hobodrifterdavid opened this issue · comments

This model looks amazing, thank you! We have a machine with 8 x 3090 (192 GB total). I tried to run the examples, but I get:

building GPT2 model ...

RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 3; 23.70 GiB total capacity; 22.48 GiB already allocated; 70.56 MiB free; 22.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
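The error message suggests setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`. For completeness, a hedged sketch of how that is typically set (it can reduce fragmentation-related OOMs, but it cannot help when total capacity is genuinely insufficient, as turns out to be the case below):

```shell
# Set the allocator option before launching the script.
# The value 128 is an illustrative choice, not a recommendation from the repo.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
```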

Perhaps you have a suggestion for someone who is not an expert with PyTorch?

We would try to make a conversation partner for language learning (add TTS, translation, NLP etc.) for our project: https://dev.languagereactor.com/

Regards, David :)

Shouldn't it be 100B x sizeof(double) or x sizeof(float)?

The weights are bfloat16, which is 16 bits (2 bytes) per parameter, so you need at least 100B x 2 bytes = 200 GB just to load them, plus some extra for activations during inference.
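The arithmetic above can be sketched as follows (assuming exactly 100B parameters at 2 bytes each, and counting only the weights, not activations or framework overhead):

```python
# Back-of-the-envelope memory estimate for loading YaLM-100B weights.
PARAMS = 100e9            # 100 billion parameters
BYTES_PER_PARAM = 2       # bfloat16 = 16 bits = 2 bytes

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9   # GB needed for weights alone

num_gpus = 8
per_gpu_gb = 24           # RTX 3090
available_gb = num_gpus * per_gpu_gb

print(f"weights: {weights_gb:.0f} GB, available: {available_gb} GB")
# Weights alone (200 GB) already exceed 8 x 24 GB = 192 GB,
# which matches the CUDA OOM seen when building the model.
```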

Maybe a silly question: would it help to put a 9th card (9 x 24GB) in the machine? I have one extra.