bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Home Page: https://bentoml.com


Runtime error about concurrency

meanwo opened this issue

I used Apache Benchmark (ab) to test how many API calls OpenLLM can handle at the same time.

I set the concurrency to 4 users and the total number of requests to 40 (10 requests per user).
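For reference, the test is roughly equivalent to the script below. This is only a sketch: it assumes OpenLLM is serving on the default port 3000, uses the /v1/generate endpoint shown in the traceback, and the payload shape is hypothetical.

from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:3000/v1/generate"  # assumption: default port and the /v1/generate endpoint
PAYLOAD = {"prompt": "Hello", "llm_config": {"max_new_tokens": 64}}  # hypothetical payload shape

def call(_):
    # POST one generation request and return the HTTP status code.
    resp = requests.post(URL, json=PAYLOAD, timeout=120)
    return resp.status_code

# 4 concurrent workers, 40 requests in total, mirroring the ab settings above.
with ThreadPoolExecutor(max_workers=4) as pool:
    statuses = list(pool.map(call, range(40)))

print({code: statuses.count(code) for code in set(statuses)})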

Of the 40 requests, 35 returned 200 (success) and the remaining 5 returned runtime errors.

This is the error log:

File "/usr/local/lib/python3.8/dist-packages/openllm_core/_schemas.py", line 165, in from_runner
structured = orjson.loads(data)
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 116, in generate_iterator
generated = GenerationOutput.from_runner(out).with_options(prompt=prompt)
File "/usr/local/lib/python3.8/dist-packages/openllm_core/_schemas.py", line 167, in from_runner
raise ValueError(f'Failed to parse JSON from SSE message: {data!r}') from e
ValueError: Failed to parse JSON from SSE message: 'Service Busy'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
File "/usr/local/lib/python3.8/dist-packages/openllm/_service.py", line 23, in generate_v1
return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 55, in generate
async for result in self.generate_iterator(
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 125, in generate_iterator
raise RuntimeError(f'Exception caught during generation: {err}') from err
RuntimeError: Exception caught during generation: Failed to parse JSON from SSE message: 'Service Busy'
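From the traceback, the runner appears to reply with the literal text 'Service Busy' once it is overloaded, and that reply is not valid JSON, so the parse in openllm_core/_schemas.py fails. The direct cause can be reproduced on its own:

import orjson

# 'Service Busy' is plain text, not JSON, so parsing it fails exactly as in the log above.
orjson.loads(b"Service Busy")
# orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)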

Do I have to use a load balancer like nginx or gobetween to solve this?
Or is there no way to solve this problem within OpenLLM itself?
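The only client-side workaround I can think of is to retry busy requests. A minimal sketch, assuming the same endpoint and payload as above:

import time

import requests

def generate_with_retry(url, payload, retries=3, backoff=1.0):
    # Retry the request a few times when the server reports it is busy, with a simple linear backoff.
    resp = None
    for attempt in range(retries):
        resp = requests.post(url, json=payload, timeout=120)
        if resp.status_code == 200:
            return resp.json()
        time.sleep(backoff * (attempt + 1))
    resp.raise_for_status()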