qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ

CUDA out of memory on flan-ul2

sigmareaver opened this issue · comments

Tested on an RTX 4090, using the command:
python t5.py ../full-models/flan-ul2 c4 --wbits 4 --act-order --groupsize 128 --save ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt
What is the memory requirement for quantizing a 20B model? I thought it should only need one layer at a time on the GPU?
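My mental model of the sequential pass is roughly the sketch below: the whole fp16 model stays on the CPU and only the block currently being quantized is moved to the GPU, so peak GPU memory should be about one transformer block plus the calibration activations. This is not the actual t5_sequential code; quantize_block is just a placeholder for the per-layer GPTQ step, and the encoder/decoder attribute names assume the Hugging Face T5/UL2 layout.

```python
import torch

def quantize_block(block, calib_inputs):
    # Hypothetical placeholder for the per-block GPTQ step: run the calibration
    # batches through `block`, accumulate Hessian statistics, and quantize its
    # Linear weights in place. Not the repo's actual implementation.
    ...

def sequential_quantize(model, calib_inputs, dev="cuda"):
    # Keep the full fp16 model on the CPU; only the block currently being
    # quantized lives on the GPU.
    model.cpu()
    for stack in (model.encoder.block, model.decoder.block):  # T5/UL2 layout in HF Transformers
        for block in stack:
            block.to(dev)                 # one block's weights on the GPU
            quantize_block(block, calib_inputs)
            block.cpu()                   # offload before touching the next block
            torch.cuda.empty_cache()
    return model
```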

I was able to quantize it by using --nsamples 256 and modifying the part of t5_sequential that applies the final layer norm and dropout so that it runs on the CPU.
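
For anyone hitting the same OOM, the change was along these lines (a rough sketch using Hugging Face's T5Stack attribute names, not a verbatim diff of t5.py):

```python
def apply_final_norm_on_cpu(decoder_stack, hidden_states, dev="cuda"):
    # Run the final layer norm and dropout on the CPU instead of the GPU,
    # then move the result back. Attribute names (final_layer_norm, dropout)
    # follow Hugging Face's T5Stack; the actual lines changed in t5_sequential
    # may look different.
    final_layer_norm = decoder_stack.final_layer_norm.cpu()
    dropout = decoder_stack.dropout.cpu()

    hidden_states = final_layer_norm(hidden_states.cpu())
    hidden_states = dropout(hidden_states)
    return hidden_states.to(dev)
```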