AkariAsai / self-rag

This repository contains the original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Home Page: https://selfrag.github.io/


OOM issue when running the quick start code on an 80 GB GPU

chanchimin opened this issue

Hello, I appreciate the effort you’ve put into your work!

I've been trying to run your quick start code, but I've hit an Out of Memory (OOM) error despite having an 80 GB GPU at my disposal. I was under the impression that a 7B model would fit comfortably in 80 GB of GPU memory, so I'm unsure why I'm still facing this OOM error. Could you possibly shed some light on this issue? Thanks!

from vllm import LLM, SamplingParams
model = LLM("selfrag/selfrag_llama2_7b", download_dir=MY_DIR, dtype="half")
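For context, vLLM by default pre-allocates a large fraction of the GPU's memory for its KV cache, so the process footprint is much larger than the 7B weights alone. A minimal sketch of capping that allocation via the gpu_memory_utilization argument (the 0.5 fraction here is illustrative, not a recommended value):

from vllm import LLM

# Cap vLLM's pre-allocation at ~50% of GPU memory instead of the ~90% default.
# MY_DIR is the same local cache directory as in the snippet above.
model = LLM(
    "selfrag/selfrag_llama2_7b",
    download_dir=MY_DIR,
    dtype="half",
    gpu_memory_utilization=0.5,
)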

And by the way, could you tell me the typical memory usage when executing this code snippet?

Thank you for your interest! During inference with the 7B model, we use a single GPU with 24 GB of memory, so I'm not sure why you got an OOM error. Could you try a different 7B model, e.g., Llama-2-7b-hf? If you still get the same OOM error, it may come from the vLLM side, and it might be better to ask in their GitHub issues!
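To make that sanity check concrete, a minimal sketch of loading a stock Llama-2 checkpoint with the same settings (meta-llama/Llama-2-7b-hf is the Hugging Face model id; MY_DIR as above):

from vllm import LLM

# If this stock 7B checkpoint also OOMs with identical settings, the issue
# is likely in vLLM or the environment rather than the Self-RAG checkpoint.
model = LLM("meta-llama/Llama-2-7b-hf", download_dir=MY_DIR, dtype="half")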

I'm closing this issue now, as I'm not sure whether it comes from the Self-RAG model checkpoints themselves, but feel free to reopen it!

Thank you for clarifying. I no longer encounter the OOM issue; it may have been because another process was occupying GPU memory and I did not notice.
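For anyone hitting the same symptom, a quick way to confirm whether another process is already holding GPU memory before loading the model (nvidia-smi shows the same information per process; torch.cuda.mem_get_info is a standard PyTorch call returning free and total bytes):

import torch

# Report free vs. total memory on the current GPU; a large gap before loading
# anything suggests another process is occupying the device.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1e9:.1f} GB / total: {total_bytes / 1e9:.1f} GB")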