light-and-ray / sd-webui-replacer

A tab for sd-webui for replacing objects in pictures or videos using a detection prompt

RuntimeError: CUDA error: an illegal memory access was encountered

mike2505 opened this issue

I am trying to launch several webui instances, each with Replacer installed, to work around the problems with multi-GPU support. I plan to put a reverse proxy in front of them that automatically forwards each request to a free instance. I have 8 RTX 4090 GPUs, rented from vast.ai.
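For readers who want to try the same setup, the "forward to a free instance" idea can be sketched as a small pool that tracks in-flight requests per backend and picks the least busy one. This is only a minimal sketch, not the proxy used in this issue: the `InstancePool` class, the `BACKENDS` list, and the port numbers are hypothetical names chosen for illustration.

```python
import threading
from contextlib import contextmanager


class InstancePool:
    """Tracks in-flight requests per backend and hands out the least busy one."""

    def __init__(self, backends):
        self._lock = threading.Lock()
        # Number of requests currently being served by each backend.
        self._in_flight = {backend: 0 for backend in backends}

    @contextmanager
    def acquire(self):
        # Pick the backend with the fewest in-flight requests.
        # Ties go to the backend listed first.
        with self._lock:
            backend = min(self._in_flight, key=self._in_flight.get)
            self._in_flight[backend] += 1
        try:
            yield backend
        finally:
            # Release the slot even if the forwarded request raised.
            with self._lock:
                self._in_flight[backend] -= 1


# Hypothetical layout: one webui instance per GPU on consecutive ports.
BACKENDS = [f"http://127.0.0.1:{7860 + i}" for i in range(8)]
pool = InstancePool(BACKENDS)
```

A proxy handler would wrap each forwarded request in `with pool.acquire() as backend:` and send the request to that URL. Counting in-flight requests instead of keeping a strict busy flag means that requests are spread evenly when all instances are occupied, rather than being rejected.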

Everything works fine with one instance, but when I run several, every instance except the first one fails with this error:

torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Each GPU has 24 GB of VRAM and usage never even passes the 10 GB mark, so how can this be an OOM?

I am attaching the nvidia-smi output (image) and the log (out.log).

I just tested with only one instance running, with device-id=1, and I still get the same error. The same happens with any device ID except 0.
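Since only device 0 works, one general way to test whether device indexing is the cause is to pin each process to a single GPU with the `CUDA_VISIBLE_DEVICES` environment variable, so that the process sees its GPU as device 0. The sketch below builds the environment and command line for one instance. It rests on assumptions: the `build_instance` helper is hypothetical, the webui is assumed to start through `launch.py` with a `--port` flag, and nothing in this thread confirms that this approach avoids the error. Setting `debug=True` also sets `CUDA_LAUNCH_BLOCKING=1`, as the error message suggests, so that the stack trace points at the failing call.

```python
import os
import sys


def build_instance(gpu_index, base_port=7860, debug=False):
    """Return (env, cmd) for one webui process pinned to a single GPU.

    Hypothetical helper: the launch.py entry point and --port flag are assumptions.
    """
    env = dict(os.environ)
    # The process will only see this one GPU, exposed to it as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    if debug:
        # Report CUDA errors synchronously, at the call that caused them.
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    cmd = [sys.executable, "launch.py", "--port", str(base_port + gpu_index)]
    return env, cmd
```

Each pair can then be passed to `subprocess.Popen(cmd, env=env, cwd=...)` with the webui directory as the working directory. With this pinning, no `--device-id` argument is passed, because each process has exactly one visible device.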

I think it's connected to the Segment Anything extension. It uses 3 different models which are not part of sd-webui. Maybe they are moved between devices incorrectly on multi-GPU systems. Ask about it there, but I think in your case you will need to explore the code yourself.

Also try different SAM models; they use different code. Maybe one of them will work.

If someone has the same problem, here is the answer: continue-revolution/sd-webui-segment-anything#201 (comment)