philschmid / easyllm

Home Page: https://philschmid.github.io/easyllm/

OverloadedError: Model is overloaded

farshidbalan opened this issue

I am using the meta-llama/Llama-2-70b-chat-hf model on a data frame with 3000 rows, each containing a 500-token text. But after about 10 rows are processed, I get the following error:

```
in call_llama2_api(self, messages)
     79 def call_llama2_api(self, messages):
     80     huggingface.prompt_builder = "llama2"
---> 81     response = huggingface.ChatCompletion.create(
     82         model="meta-llama/Llama-2-70b-chat-hf",
     83         messages=messages,

/usr/local/lib/python3.10/dist-packages/easyllm/clients/huggingface.py in create(messages, model, temperature, top_p, top_k, n, max_tokens, stop, stream, frequency_penalty, debug)
    205     generated_tokens = 0
    206     for _i in range(request.n):
--> 207         res = client.text_generation(
    208             prompt,
    209             details=True,

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_client.py in text_generation(self, prompt, details, stream, model, do_sample, max_new_tokens, best_of, repetition_penalty, return_full_text, seed, stop_sequences, temperature, top_k, top_p, truncate, typical_p, watermark, decoder_input_details)
   1063             decoder_input_details=decoder_input_details,
   1064         )
-> 1065         raise_text_generation_error(e)
   1066
   1067     # Parse output

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_text_generation.py in raise_text_generation_error(http_error)
    472         raise IncompleteGenerationError(message) from http_error
    473     if error_type == "overloaded":
--> 474         raise OverloadedError(message) from http_error
    475     if error_type == "validation":
    476         raise ValidationError(message) from http_error

OverloadedError: Model is overloaded
```

Is there any way to fix this problem, such as increasing the rate limit?
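One interim workaround, independent of any rate-limit change, is to retry with exponential backoff whenever the endpoint reports it is overloaded. A minimal sketch; the `OverloadedError` import path is taken from the traceback above and may move between huggingface_hub versions:

```python
import time

from easyllm.clients import huggingface
# Import path as shown in the traceback; check your huggingface_hub version.
from huggingface_hub.inference._text_generation import OverloadedError

huggingface.prompt_builder = "llama2"

def call_llama2_with_retry(messages, max_retries=5, base_delay=2.0):
    """Call the chat endpoint, backing off when the model is overloaded."""
    for attempt in range(max_retries):
        try:
            return huggingface.ChatCompletion.create(
                model="meta-llama/Llama-2-70b-chat-hf",
                messages=messages,
            )
        except OverloadedError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Wait 2s, 4s, 8s, ... before retrying.
            time.sleep(base_delay * 2**attempt)
```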

What happens if you give it 3000 rows with 250 max tokens? Is it the same thing?
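For reference, the generation length is capped through the `max_tokens` argument of `create` (its signature appears in the traceback above); a minimal sketch:

```python
response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=messages,
    max_tokens=250,  # cap the number of generated tokens per request
)
```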

This might be an out-of-memory (OOM) or endpoint problem. I think it would be helpful if you posted your system's RAM and VRAM specs and usage.
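A quick way to collect those numbers, assuming psutil is installed and an NVIDIA GPU with nvidia-smi on the PATH (if you only call the hosted endpoint, just the RAM side applies):

```python
import subprocess

import psutil

# System RAM: currently used vs. total.
ram = psutil.virtual_memory()
print(f"RAM: {ram.used / 1e9:.1f} / {ram.total / 1e9:.1f} GB used")

# GPU VRAM via nvidia-smi (NVIDIA GPUs only).
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True, check=False,
).stdout)
```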