AkariAsai / self-rag

This repository contains the original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Home Page: https://selfrag.github.io/


OOM issue when running the quick start code on an 80 GB GPU

chanchimin opened this issue

Hello, I appreciate the effort you’ve put into your work!

I've been trying to run your quick start code, but I've hit an Out of Memory (OOM) error despite having an 80 GB GPU at my disposal. I was under the impression that a 7B model would fit comfortably in 80 GB of GPU memory, so I'm unsure why I'm still facing this OOM error. Could you possibly shed some light on this issue? Thanks!

from vllm import LLM, SamplingParams
model = LLM("selfrag/selfrag_llama2_7b", download_dir=MY_DIR, dtype="half")
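For context, vLLM by default pre-allocates a large fraction of the GPU's memory for its KV cache, so the process footprint is much larger than the 7B weights alone. A minimal sketch of capping that allocation via the gpu_memory_utilization argument (the 0.5 fraction here is illustrative, not a recommended value):

from vllm import LLM

# Cap vLLM's pre-allocation at ~50% of GPU memory instead of the ~90% default.
# MY_DIR is the same local cache directory as in the snippet above.
model = LLM(
    "selfrag/selfrag_llama2_7b",
    download_dir=MY_DIR,
    dtype="half",
    gpu_memory_utilization=0.5,
)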

And by the way, could you tell me the typical memory usage when executing this code snippet?

Thank you for your interest! During inference with the 7B model, we use a single GPU with 24 GB of memory, so I'm not sure why you got an OOM error. Could you try a different 7B model, e.g., Llama-2-7b-hf? If you still get the same OOM error, it may come from the vLLM side, and it might be better to ask in their GitHub issues!
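To make that sanity check concrete, a minimal sketch of loading a stock Llama-2 checkpoint with the same settings (meta-llama/Llama-2-7b-hf is the Hugging Face model id; MY_DIR as above):

from vllm import LLM

# If this stock 7B checkpoint also OOMs with identical settings, the issue
# is likely in vLLM or the environment rather than the Self-RAG checkpoint.
model = LLM("meta-llama/Llama-2-7b-hf", download_dir=MY_DIR, dtype="half")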

I'm closing this issue now, as I'm not sure whether it comes from the Self-RAG model checkpoints themselves, but feel free to reopen it!

Thank you for clarifying. I no longer encounter the OOM issue; it may have been because another process was occupying GPU memory and I did not notice.
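For anyone hitting the same symptom, a quick way to confirm whether another process is already holding GPU memory before loading the model (nvidia-smi shows the same information per process; torch.cuda.mem_get_info is a standard PyTorch call returning free and total bytes):

import torch

# Report free vs. total memory on the current GPU; a large gap before loading
# anything suggests another process is occupying the device.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1e9:.1f} GB / total: {total_bytes / 1e9:.1f} GB")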