rmihaylov / falcontune

Tune any FALCON in 4-bit

RuntimeError: CUDA error: an illegal memory access was encountered

gpravi opened this issue

commented
falcontune generate \
    --interactive \
    --model falcon-40b-instruct-4bit \
    --weights gptq_model-4bit--1g.safetensors \
    --max_new_tokens=50 \
    --use_cache \
    --do_sample \
    --instruction "Who was the first person on the moon?"

...
RuntimeError: CUDA error: an illegal memory access was encountered

I encountered the above error while trying to generate on a multi-GPU machine.

commented

So for now I am just using 1 GPU, with the following:
export CUDA_VISIBLE_DEVICES=1
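
For anyone who prefers not to export the variable for the whole shell session, here is a minimal sketch of the same single-GPU restriction applied to one command only. It assumes a POSIX shell; `run_on_gpu` is a hypothetical helper name used for illustration, not part of falcontune. Note that with only one device visible, the process sees that GPU as device 0 regardless of its physical index.

```shell
# Hypothetical helper (not part of falcontune): run one command with only
# a single GPU visible to CUDA, leaving the rest of the session untouched.
run_on_gpu() {
    gpu_index="$1"   # physical GPU index, e.g. 1 as in the comment above
    shift            # the remaining arguments are the command to run
    env CUDA_VISIBLE_DEVICES="$gpu_index" "$@"
}

# Quick check that the child process sees the restricted value.
run_on_gpu 1 sh -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'

# Usage with the command from this issue (commented out; requires falcontune):
# run_on_gpu 1 falcontune generate --interactive \
#     --model falcon-40b-instruct-4bit \
#     --weights gptq_model-4bit--1g.safetensors \
#     --max_new_tokens=50 --use_cache --do_sample \
#     --instruction "Who was the first person on the moon?"
```

This only sidesteps the multi-GPU code path; it does not address whatever triggers the illegal memory access when more than one GPU is visible.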

Same here: with two 3090s, falcontune crashes with the same CUDA illegal memory access error.