lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.


vLLM worker does not release semaphore

jbding opened this issue · comments

When I used vllm_worker to deploy the Vicuna model, the --limit-worker-concurrency setting was 3. After running for a while, I found that the model stopped responding. From the log, I saw that the semaphore was never released: each failing request permanently consumed a permit, and after three failures the semaphore value reached 0.
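The leak pattern can be reproduced without vLLM or FastChat at all: if the permit is acquired before streaming starts and the release only happens after the stream completes successfully, any exception raised inside the generator (here, the SamplingParams validation error) skips the release. A minimal standalone sketch with hypothetical handler names (not FastChat's actual code), contrasting the leaky shape with a try/finally fix:

```python
import asyncio


async def demo():
    # Equivalent of --limit-worker-concurrency 3
    sem = asyncio.Semaphore(3)

    async def generate_stream(params):
        # Mimics SamplingParams validation failing inside the streaming
        # generator, e.g. "max_tokens must be at least 1, got -862".
        if params["max_tokens"] < 1:
            raise ValueError("max_tokens must be at least 1")
        yield "chunk"

    async def handle_leaky(params):
        # Acquire, then rely on a step that runs only after successful
        # streaming to release -- that step never runs when the generator raises.
        await sem.acquire()
        try:
            return [c async for c in generate_stream(params)]
        except ValueError:
            return None  # error reported to the client, but sem is never released

    async def handle_fixed(params):
        await sem.acquire()
        try:
            try:
                return [c async for c in generate_stream(params)]
            except ValueError:
                return None
        finally:
            sem.release()  # released on success *and* on failure

    for _ in range(3):
        await handle_leaky({"max_tokens": -862})
    leaked = sem.locked()  # all three permits gone: worker looks busy forever

    sem = asyncio.Semaphore(3)
    for _ in range(3):
        await handle_fixed({"max_tokens": -862})
    healthy = not sem.locked()  # permits survived the failing requests

    return leaked, healthy


leaked, healthy = asyncio.run(demo())
print(leaked, healthy)  # True True
```

This matches the heartbeat lines below, where Semaphore(value=...) drops from 2 to 1 to 0 after each traceback and never recovers.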

Here is the log:

2024-05-10 05:56:32 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 05:56:32 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in call
2024-05-10 05:56:32 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:56:32 | ERROR | stderr | await func()
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 05:56:32 | ERROR | stderr | message = await receive()
2024-05-10 05:56:32 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 05:56:32 | ERROR | stderr | await self.message_event.wait()
2024-05-10 05:56:32 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 05:56:32 | ERROR | stderr | await fut
2024-05-10 05:56:32 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe55b141750
2024-05-10 05:56:32 | ERROR | stderr |
2024-05-10 05:56:32 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 05:56:32 | ERROR | stderr |
2024-05-10 05:56:32 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 05:56:32 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-05-10 05:56:32 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
2024-05-10 05:56:32 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in call
2024-05-10 05:56:32 | ERROR | stderr | | await super().call(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in call
2024-05-10 05:56:32 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in call
2024-05-10 05:56:32 | ERROR | stderr | | raise exc
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in call
2024-05-10 05:56:32 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in call
2024-05-10 05:56:32 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | raise exc
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in call
2024-05-10 05:56:32 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 05:56:32 | ERROR | stderr | | await route.handle(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 05:56:32 | ERROR | stderr | | await self.app(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 05:56:32 | ERROR | stderr | | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | raise exc
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:56:32 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 05:56:32 | ERROR | stderr | | await response(scope, receive, send)
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in call
2024-05-10 05:56:32 | ERROR | stderr | | async with anyio.create_task_group() as task_group:
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in aexit
2024-05-10 05:56:32 | ERROR | stderr | | raise BaseExceptionGroup(
2024-05-10 05:56:32 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 05:56:32 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 05:56:32 | ERROR | stderr | | Traceback (most recent call last):
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:56:32 | ERROR | stderr | | await func()
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 05:56:32 | ERROR | stderr | | async for chunk in self.body_iterator:
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 05:56:32 | ERROR | stderr | | sampling_params = SamplingParams(
2024-05-10 05:56:32 | ERROR | stderr | | ^^^^^^^^^^^^^^^
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in init
2024-05-10 05:56:32 | ERROR | stderr | | self._verify_args()
2024-05-10 05:56:32 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 05:56:32 | ERROR | stderr | | raise ValueError(
2024-05-10 05:56:32 | ERROR | stderr | | ValueError: max_tokens must be at least 1, got -862.
2024-05-10 05:56:32 | ERROR | stderr | +------------------------------------
2024-05-10 05:56:43 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=2, locked=False). call_ct: 1220. worker_id: 42c39e7a.
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46612 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46614 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:56:47 async_llm_engine.py:371] Received request 18dbe2d2c72c4cfe9fea1922bd4e8b84: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:47 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:56:47 async_llm_engine.py:111] Finished request 18dbe2d2c72c4cfe9fea1922bd4e8b84.
INFO 05-10 05:56:47 async_llm_engine.py:134] Aborted request 18dbe2d2c72c4cfe9fea1922bd4e8b84.
2024-05-10 05:56:47 | INFO | stdout | INFO: 127.0.0.1:46616 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:56:48 | INFO | stdout | INFO: 127.0.0.1:46708 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:56:48 | INFO | stdout | INFO: 127.0.0.1:46710 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:56:48 async_llm_engine.py:371] Received request 594e5b0d350c4a5b8401814198fc447e: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:49 async_llm_engine.py:111] Finished request 594e5b0d350c4a5b8401814198fc447e.
INFO 05-10 05:56:49 async_llm_engine.py:134] Aborted request 594e5b0d350c4a5b8401814198fc447e.
2024-05-10 05:56:49 | INFO | stdout | INFO: 127.0.0.1:46712 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:56:54 | INFO | stdout | INFO: 127.0.0.1:46930 - "POST /worker_generate_stream HTTP/1.1" 200 OK
INFO 05-10 05:56:54 async_llm_engine.py:371] Received request 9b3295689edf474ba87e1ff73acf28a4: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: 你好 ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.7, top_p=1.0, top_k=-1.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:56:55 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:56:56 async_llm_engine.py:111] Finished request 9b3295689edf474ba87e1ff73acf28a4.
INFO 05-10 05:56:56 async_llm_engine.py:134] Aborted request 9b3295689edf474ba87e1ff73acf28a4.
2024-05-10 05:57:26 | INFO | stdout | INFO: 127.0.0.1:47946 - "POST /worker_generate_stream HTTP/1.1" 200 OK
INFO 05-10 05:57:26 async_llm_engine.py:371] Received request f274e61e2013461f9ff211e905272eb7: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: 你好 ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.7, top_p=1.0, top_k=-1.0, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=512, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:57:26 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:57:27 async_llm_engine.py:111] Finished request f274e61e2013461f9ff211e905272eb7.
INFO 05-10 05:57:27 async_llm_engine.py:134] Aborted request f274e61e2013461f9ff211e905272eb7.
2024-05-10 05:57:28 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=2, locked=False). call_ct: 1224. worker_id: 42c39e7a.
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49372 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49374 - "POST /count_token HTTP/1.1" 200 OK
2024-05-10 05:58:10 | INFO | stdout | INFO: 127.0.0.1:49378 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-05-10 05:58:10 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 05:58:10 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in call
2024-05-10 05:58:10 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:58:10 | ERROR | stderr | await func()
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 05:58:10 | ERROR | stderr | message = await receive()
2024-05-10 05:58:10 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 05:58:10 | ERROR | stderr | await self.message_event.wait()
2024-05-10 05:58:10 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 05:58:10 | ERROR | stderr | await fut
2024-05-10 05:58:10 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe560585710
2024-05-10 05:58:10 | ERROR | stderr |
2024-05-10 05:58:10 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 05:58:10 | ERROR | stderr |
2024-05-10 05:58:10 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 05:58:10 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-05-10 05:58:10 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
2024-05-10 05:58:10 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in call
2024-05-10 05:58:10 | ERROR | stderr | | await super().call(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in call
2024-05-10 05:58:10 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in call
2024-05-10 05:58:10 | ERROR | stderr | | raise exc
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in call
2024-05-10 05:58:10 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in call
2024-05-10 05:58:10 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | raise exc
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in call
2024-05-10 05:58:10 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 05:58:10 | ERROR | stderr | | await route.handle(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 05:58:10 | ERROR | stderr | | await self.app(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 05:58:10 | ERROR | stderr | | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | raise exc
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 05:58:10 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 05:58:10 | ERROR | stderr | | await response(scope, receive, send)
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in call
2024-05-10 05:58:10 | ERROR | stderr | | async with anyio.create_task_group() as task_group:
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in aexit
2024-05-10 05:58:10 | ERROR | stderr | | raise BaseExceptionGroup(
2024-05-10 05:58:10 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 05:58:10 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 05:58:10 | ERROR | stderr | | Traceback (most recent call last):
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 05:58:10 | ERROR | stderr | | await func()
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 05:58:10 | ERROR | stderr | | async for chunk in self.body_iterator:
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 05:58:10 | ERROR | stderr | | sampling_params = SamplingParams(
2024-05-10 05:58:10 | ERROR | stderr | | ^^^^^^^^^^^^^^^
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in init
2024-05-10 05:58:10 | ERROR | stderr | | self._verify_args()
2024-05-10 05:58:10 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 05:58:10 | ERROR | stderr | | raise ValueError(
2024-05-10 05:58:10 | ERROR | stderr | | ValueError: max_tokens must be at least 1, got -763.
2024-05-10 05:58:10 | ERROR | stderr | +------------------------------------
2024-05-10 05:58:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1225. worker_id: 42c39e7a.
2024-05-10 05:58:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1225. worker_id: 42c39e7a.
2024-05-10 05:59:19 | INFO | stdout | INFO: 127.0.0.1:51574 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 05:59:19 | INFO | stdout | INFO: 127.0.0.1:51576 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 05:59:19 async_llm_engine.py:371] Received request 0ea08a7a94b44c0783dd435e387725d3: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=1e-08, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=4048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 05:59:19 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 05:59:20 async_llm_engine.py:111] Finished request 0ea08a7a94b44c0783dd435e387725d3.
INFO 05-10 05:59:20 async_llm_engine.py:134] Aborted request 0ea08a7a94b44c0783dd435e387725d3.
2024-05-10 05:59:20 | INFO | stdout | INFO: 127.0.0.1:51578 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 05:59:44 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:00:29 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:01:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=1, locked=False). call_ct: 1226. worker_id: 42c39e7a.
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55740 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55742 - "POST /count_token HTTP/1.1" 200 OK
INFO 05-10 06:01:30 async_llm_engine.py:371] Received request a22b98b85b564eaaad458ba20d1addaf: prompt: "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: If you are available, please return OK. ASSISTANT:", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[''], ignore_eos=False, max_tokens=2048, logprobs=None, prompt_logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-10 06:01:30 llm_engine.py:624] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.1%, CPU KV cache usage: 0.0%
INFO 05-10 06:01:30 async_llm_engine.py:111] Finished request a22b98b85b564eaaad458ba20d1addaf.
INFO 05-10 06:01:30 async_llm_engine.py:134] Aborted request a22b98b85b564eaaad458ba20d1addaf.
2024-05-10 06:01:30 | INFO | stdout | INFO: 127.0.0.1:55744 - "POST /worker_generate HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56292 - "POST /model_details HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56294 - "POST /count_token HTTP/1.1" 200 OK
2024-05-10 06:01:46 | INFO | stdout | INFO: 127.0.0.1:56298 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-05-10 06:01:46 | ERROR | stderr | ERROR: Exception in ASGI application
2024-05-10 06:01:46 | ERROR | stderr | Traceback (most recent call last):
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 265, in call
2024-05-10 06:01:46 | ERROR | stderr | await wrap(partial(self.listen_for_disconnect, receive))
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 06:01:46 | ERROR | stderr | await func()
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
2024-05-10 06:01:46 | ERROR | stderr | message = await receive()
2024-05-10 06:01:46 | ERROR | stderr | ^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
2024-05-10 06:01:46 | ERROR | stderr | await self.message_event.wait()
2024-05-10 06:01:46 | ERROR | stderr | File "/home/dingjb/miniconda3/lib/python3.11/asyncio/locks.py", line 213, in wait
2024-05-10 06:01:46 | ERROR | stderr | await fut
2024-05-10 06:01:46 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fe560586f10
2024-05-10 06:01:46 | ERROR | stderr |
2024-05-10 06:01:46 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-05-10 06:01:46 | ERROR | stderr |
2024-05-10 06:01:46 | ERROR | stderr | + Exception Group Traceback (most recent call last):
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
2024-05-10 06:01:46 | ERROR | stderr | | result = await app( # type: ignore[func-returns-value]
2024-05-10 06:01:46 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
2024-05-10 06:01:46 | ERROR | stderr | | return await self.app(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in call
2024-05-10 06:01:46 | ERROR | stderr | | await super().call(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/applications.py", line 123, in call
2024-05-10 06:01:46 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in call
2024-05-10 06:01:46 | ERROR | stderr | | raise exc
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in call
2024-05-10 06:01:46 | ERROR | stderr | | await self.app(scope, receive, _send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in call
2024-05-10 06:01:46 | ERROR | stderr | | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | raise exc
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 756, in call
2024-05-10 06:01:46 | ERROR | stderr | | await self.middleware_stack(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
2024-05-10 06:01:46 | ERROR | stderr | | await route.handle(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
2024-05-10 06:01:46 | ERROR | stderr | | await self.app(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
2024-05-10 06:01:46 | ERROR | stderr | | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | raise exc
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
2024-05-10 06:01:46 | ERROR | stderr | | await app(scope, receive, sender)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
2024-05-10 06:01:46 | ERROR | stderr | | await response(scope, receive, send)
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 258, in call
2024-05-10 06:01:46 | ERROR | stderr | | async with anyio.create_task_group() as task_group:
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in aexit
2024-05-10 06:01:46 | ERROR | stderr | | raise BaseExceptionGroup(
2024-05-10 06:01:46 | ERROR | stderr | | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-05-10 06:01:46 | ERROR | stderr | +-+---------------- 1 ----------------
2024-05-10 06:01:46 | ERROR | stderr | | Traceback (most recent call last):
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
2024-05-10 06:01:46 | ERROR | stderr | | await func()
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
2024-05-10 06:01:46 | ERROR | stderr | | async for chunk in self.body_iterator:
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/fastchat/serve/vllm_worker.py", line 99, in generate_stream
2024-05-10 06:01:46 | ERROR | stderr | | sampling_params = SamplingParams(
2024-05-10 06:01:46 | ERROR | stderr | | ^^^^^^^^^^^^^^^
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 118, in init
2024-05-10 06:01:46 | ERROR | stderr | | self._verify_args()
2024-05-10 06:01:46 | ERROR | stderr | | File "/home/dingjb/miniconda3/lib/python3.11/site-packages/vllm/sampling_params.py", line 148, in _verify_args
2024-05-10 06:01:46 | ERROR | stderr | | raise ValueError(
2024-05-10 06:01:46 | ERROR | stderr | | ValueError: max_tokens must be at least 1, got -763.
2024-05-10 06:01:46 | ERROR | stderr | +------------------------------------
2024-05-10 06:01:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:02:44 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:03:29 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:04:14 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.
2024-05-10 06:04:59 | INFO | model_worker | Send heart beat. Models: ['vicuna-13b-v1.5']. Semaphore: Semaphore(value=0, locked=True). call_ct: 1228. worker_id: 42c39e7a.