Runtime error about concurrency
meanwo opened this issue · comments
meanwo commented
I used Apache Benchmark to test how many concurrent API calls OpenLLM can handle.
I set 4 concurrent users and 40 requests in total (10 requests per user).
Of the 40 requests, 35 returned 200 (success) and the remaining 5 returned runtime errors.
This is the error log:
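For reference, the same load pattern (4 concurrent workers, 10 requests each, 40 total) can be reproduced with `ab -n 40 -c 4 <endpoint>` or with a small Python sketch like the one below. The request itself is stubbed out here; to reproduce against a running server you would replace the stub with a real call (e.g. a `requests.post` to the generate endpoint).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def run_load_test(send_request, users=4, requests_per_user=10):
    """Fire `users` concurrent workers, each issuing `requests_per_user`
    sequential calls, and tally the status codes they return."""
    def worker(_):
        return [send_request() for _ in range(requests_per_user)]
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = [code for batch in pool.map(worker, range(users)) for code in batch]
    return Counter(results)

# Stub that always "succeeds"; swap in a real HTTP call to reproduce the issue.
print(run_load_test(lambda: 200))  # Counter({200: 40})
```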
File "/usr/local/lib/python3.8/dist-packages/openllm_core/_schemas.py", line 165, in from_runner
structured = orjson.loads(data)
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 116, in generate_iterator
generated = GenerationOutput.from_runner(out).with_options(prompt=prompt)
File "/usr/local/lib/python3.8/dist-packages/openllm_core/_schemas.py", line 167, in from_runner
raise ValueError(f'Failed to parse JSON from SSE message: {data!r}') from e
ValueError: Failed to parse JSON from SSE message: 'Service Busy'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
output = await api.func(*args)
File "/usr/local/lib/python3.8/dist-packages/openllm/_service.py", line 23, in generate_v1
return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 55, in generate
async for result in self.generate_iterator(
File "/usr/local/lib/python3.8/dist-packages/openllm/_llm.py", line 125, in generate_iterator
raise RuntimeError(f'Exception caught during generation: {err}') from err
RuntimeError: Exception caught during generation: Failed to parse JSON from SSE message: 'Service Busy'
Do I have to use a load balancer like nginx or gobetween to solve this?
Or can this problem be solved within OpenLLM itself?
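Not an authoritative fix, but judging from the traceback the runner returns a literal `Service Busy` payload under load, which surfaces client-side as a `RuntimeError`. Until that is addressed server-side, one workaround is to retry those failures with exponential backoff. The sketch below assumes a caller-supplied `call_generate` function (a hypothetical stand-in for whatever makes the actual API call); it is not part of the OpenLLM API.

```python
import time

def generate_with_retry(call_generate, max_retries=3, backoff=0.5):
    """Retry `call_generate` when it fails with a 'Service Busy' RuntimeError,
    sleeping with exponential backoff between attempts; re-raise anything else."""
    for attempt in range(max_retries + 1):
        try:
            return call_generate()
        except RuntimeError as err:
            # Only retry the busy error seen in the log, and give up
            # after the final attempt.
            if 'Service Busy' not in str(err) or attempt == max_retries:
                raise
            time.sleep(backoff * 2 ** attempt)

# Usage (hypothetical client call):
#   generate_with_retry(lambda: client.generate(prompt))
```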