Inference on trained Mistral-7B fails often
aliasaria opened this issue
Log attached:
INFO: 127.0.0.1:59469 - "GET /server/worker_start?model_name=TransformerLab-mlx/MLX-Mistral-7B-Instruct-v0.2-1709360046_rivet&model_filename=/Users/timk/.transformerlab/workspace/models/MLX-Mistral-7B-Instruct-v0.2-1709360046_rivet&adaptor=&engine=mlx_server&experiment_id=1&parameters={%22inferenceEngine%22:%22mlx_server%22} HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /model/get_conversation_template?model=TransformerLab-mlx/MLX-Mistral-7B-Instruct-v0.2-1709360046_rivet HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /model/get_conversation_template?model=MLX-Mistral-7B-Instruct-v0.2-1709360046_rivet HTTP/1.1" 200 OK
INFO: 127.0.0.1:59401 - "GET /experiment/1/get_conversations HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "POST /v1/chat/count_tokens HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59469 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "POST /v1/chat/count_tokens HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/responses.py", line 277, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/responses.py", line 273, in wrap
await func()
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/responses.py", line 250, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 538, in receive
await self.message_event.wait()
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/asyncio/locks.py", line 213, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 2a5487510
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
| return await self.app(scope, receive, send)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/fastapi/applications.py", line 289, in __call__
| await super().__call__(scope, receive, send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
| raise exc
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
| await self.app(scope, receive, _send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
| raise exc
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
| await self.app(scope, receive, sender)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
| raise e
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
| await self.app(scope, receive, send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
| await route.handle(scope, receive, send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
| await self.app(scope, receive, send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/routing.py", line 69, in app
| await response(scope, receive, send)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/responses.py", line 270, in __call__
| async with anyio.create_task_group() as task_group:
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 664, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
| yield
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_async/http11.py", line 209, in _receive_event
| event = self._h11_state.next_event()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/h11/_connection.py", line 469, in next_event
| event = self._extract_next_receive_event()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/h11/_connection.py", line 419, in _extract_next_receive_event
| event = self._reader.read_eof() # type: ignore[attr-defined]
| ^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/h11/_readers.py", line 204, in read_eof
| raise RemoteProtocolError(
| h11._util.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpx/_transports/default.py", line 66, in map_httpcore_exceptions
| yield
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpx/_transports/default.py", line 249, in __aiter__
| async for part in self._httpcore_stream:
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 361, in __aiter__
| async for part in self._stream:
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_async/http11.py", line 337, in __aiter__
| raise exc
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_async/http11.py", line 329, in __aiter__
| async for chunk in self._connection._receive_response_body(**kwargs):
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_async/http11.py", line 198, in _receive_response_body
| event = await self._receive_event(timeout=timeout)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_async/http11.py", line 208, in _receive_event
| with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/contextlib.py", line 158, in __exit__
| self.gen.throw(typ, value, traceback)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
| raise to_exc(exc) from exc
| httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/responses.py", line 273, in wrap
| await func()
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/starlette/responses.py", line 262, in stream_response
| async for chunk in self.body_iterator:
| File "/Users/timk/.transformerlab/src/transformerlab/fastchat_openai_api.py", line 435, in chat_completion_stream_generator
| async for content in generate_completion_stream(gen_params):
| File "/Users/timk/.transformerlab/src/transformerlab/fastchat_openai_api.py", line 591, in generate_completion_stream
| async for raw_chunk in response.aiter_raw():
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpx/_models.py", line 990, in aiter_raw
| async for raw_stream_bytes in self.stream:
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpx/_client.py", line 146, in __aiter__
| async for chunk in self._stream:
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpx/_transports/default.py", line 248, in __aiter__
| with map_httpcore_exceptions():
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/contextlib.py", line 158, in __exit__
| self.gen.throw(typ, value, traceback)
| File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/site-packages/httpx/_transports/default.py", line 83, in map_httpcore_exceptions
| raise mapped_exc(message) from exc
| httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
+------------------------------------
INFO: 127.0.0.1:59514 - "POST /api/v1/token_check HTTP/1.1" 400 Bad Request
INFO: 127.0.0.1:59514 - "POST /experiment/1/save_conversation HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /experiment/1/get_conversations HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "POST /v1/chat/count_tokens HTTP/1.1" 400 Bad Request
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59555 - "GET /server/info HTTP/1.1" 200 OK
INFO: 127.0.0.1:59514 - "GET /server/worker_healthz HTTP/1.1" 200 OK
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [28131]
FASTAPI LIFESPAN: Complete
Exception ignored in atexit callback: <function cleanup_at_exit at 0x29ca38040>
Traceback (most recent call last):
File "/Users/timk/.transformerlab/src/api.py", line 321, in cleanup_at_exit
worker_process.kill()
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/asyncio/subprocess.py", line 146, in kill
self._transport.kill()
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/asyncio/base_subprocess.py", line 153, in kill
self._check_proc()
File "/Users/timk/miniconda3/envs/transformerlab/lib/python3.11/asyncio/base_subprocess.py", line 142, in _check_proc
raise ProcessLookupError()
ProcessLookupError:
🔴 Quitting spawned controller.
🔴 Quitting spawned workers.
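In the traceback, the API's OpenAI-compatible proxy (`generate_completion_stream` in `fastchat_openai_api.py`) relays the worker's streamed reply with `response.aiter_raw()`. When the worker closed the connection mid-stream, `httpx.RemoteProtocolError` escaped the `StreamingResponse` body iterator and surfaced as an ASGI `ExceptionGroup`. That does not explain why the worker dropped the connection, but the proxy could degrade more gracefully. Below is a minimal sketch of one option, assuming an OpenAI-style SSE stream: catch the transport error inside the generator and emit a final error event plus `data: [DONE]`. The helper `relay_stream`, the error payload shape, and the fallback to `ConnectionError` when httpx is not installed are hypothetical and are not TransformerLab's code.

```python
import asyncio
import codecs
import json
from typing import AsyncIterator, List, Tuple, Type

try:
    import httpx  # only used to choose the default error types

    DEFAULT_ERRORS: Tuple[Type[BaseException], ...] = (
        httpx.RemoteProtocolError,
        ConnectionError,
    )
except ImportError:  # keep the sketch runnable without httpx
    DEFAULT_ERRORS = (ConnectionError,)


async def relay_stream(
    upstream: AsyncIterator[bytes],
    errors: Tuple[Type[BaseException], ...] = DEFAULT_ERRORS,
) -> AsyncIterator[str]:
    """Hypothetical helper: forward worker chunks, and turn a dropped
    connection into a final SSE error event instead of an exception."""
    # Incremental decoder so a multi-byte UTF-8 character split across
    # two network chunks is not corrupted.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        async for raw in upstream:
            text = decoder.decode(raw)
            if text:
                yield text
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail
    except errors as exc:
        # The worker closed the connection mid-stream (for example, it crashed).
        # Tell the client, then end the stream cleanly. The payload shape is an
        # assumption, not a format defined by TransformerLab or FastChat.
        payload = {"error": {"message": f"worker stream ended early: {exc}"}}
        yield f"data: {json.dumps(payload)}\n\n"
        yield "data: [DONE]\n\n"


async def _flaky_worker() -> AsyncIterator[bytes]:
    # Simulates a worker that sends one chunk and then drops the connection.
    yield b'data: {"text": "Hello"}\n\n'
    raise ConnectionError("peer closed connection without sending complete message body")


async def main() -> List[str]:
    return [chunk async for chunk in relay_stream(_flaky_worker())]


if __name__ == "__main__":
    for chunk in asyncio.run(main()):
        print(repr(chunk))
```

In a FastAPI handler, a generator like this could be passed to `StreamingResponse(..., media_type="text/event-stream")`, so the client sees an explicit error event rather than a truncated response.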