MilaNLProc / simple-generation

A Python package to run inference with HuggingFace language and vision-language checkpoints, wrapping many convenient features.


RuntimeError: CUDA error: Unspecified Launch Failure During Generation

donya-rooein opened this issue · comments

When attempting to generate text with a CUDA-enabled model, a CUDA "unspecified launch failure" error occurs. The error halts the generation process, leading to incomplete or failed batches.

checkpoint = "meta-llama/Llama-2-70b-chat-hf"
responses = generator(texts, apply_chat_template=True, skip_prompt=True, batch_size="auto", temperature=0.0, max_new_tokens=256)

output:
Generation:   0%|          | 0/96 [00:00<?, ?it/s]
Error CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Generation failed. Skipping batch.
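As the error message suggests, setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the Python stack trace points at the operation that actually failed. A minimal debugging sketch (the variable must be set before torch initializes CUDA, e.g. at the very top of the script):

import os

# Force synchronous CUDA kernel launches so the reported stack trace is accurate.
# Set this before importing torch, i.e. before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after the env var so the setting takes effect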

Hello Donya! This looks like an error related to DDP. How are you executing the script? Can you also show me the run command? Is it a python ..., accelerate ..., or torchrun ... invocation?

Also, I notice you are loading the model without any quantization or dtype specified. That means you are loading the weights at full precision and are probably running out of memory (OOM). Can you try adding, for example, torch_dtype=torch.bfloat16 to the loader?
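For reference, a minimal sketch of the suggested fix, assuming the generator is a simple_generation SimpleGenerator and that extra loader keyword arguments such as torch_dtype are forwarded to the underlying from_pretrained call (the exact constructor options may differ across versions):

import torch
from simple_generation import SimpleGenerator

checkpoint = "meta-llama/Llama-2-70b-chat-hf"
texts = ["Hello, how are you?"]  # example prompts

# Load the weights in bfloat16 (half the memory of fp32) to reduce the chance
# of exhausting GPU memory with a 70B-parameter checkpoint.
generator = SimpleGenerator(checkpoint, torch_dtype=torch.bfloat16)

responses = generator(
    texts,
    apply_chat_template=True,
    skip_prompt=True,
    batch_size="auto",
    temperature=0.0,
    max_new_tokens=256,
)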

Hello @donya-rooein, did any of the fixes work for you?