mzbac / mlx-llm-server

For inferring and serving local LLMs using the MLX framework

code 501, message Unsupported method ('GET')

cleesmith opened this issue · comments

pip install mlx-llm-server

Works fine via curl.

But some apps make a request to get a list of available models, the same way OpenAI's API does.
This causes an issue, and most such apps respond with a message similar to:
No Models found
This could mean that the connection is not configured correctly or that the vendor did not return any models.
If applicable, make sure that CORS is enabled on the vendor's side.

mlx-llm-server --model "mistralai/Mistral-7B-Instruct-v0.2"
Fetching 11 files: 100%|██████████████████████████████████████████| 11/11 [00:00<00:00, 225060.21it/s]
Starting httpd at 127.0.0.1 on port 8080...
127.0.0.1 - - [01/Mar/2024 15:50:45] "POST /v1/chat/completions HTTP/1.1" 200 -
127.0.0.1 - - [01/Mar/2024 15:52:09] "POST /v1/chat/completions HTTP/1.1" 200 -
127.0.0.1 - - [01/Mar/2024 15:56:01] "OPTIONS /api/tags HTTP/1.1" 204 -
127.0.0.1 - - [01/Mar/2024 15:56:01] code 501, message Unsupported method ('GET')
127.0.0.1 - - [01/Mar/2024 15:56:01] "GET /api/tags HTTP/1.1" 501 -

Thoughts?
Thanks

It looks like the app you are using requires a GET /api/tags endpoint, but I couldn't find that endpoint anywhere in OpenAI's API specification. So I suspect it is something specific to the app.

I could not find that endpoint in OpenAI's API docs either, but it is a common request that I see in the logs for Ollama and LM Studio. Once the apps I use have a value from that GET request, it is only used as a label for the model in use.
After that, all of the requests go to the POST /v1/chat/completions endpoint.

Perhaps I can clone this repo just to handle this GET request and return any/some model name.

My use case is mostly for NovelCrafter and a few RAG setups, which connect to a local chat/inference model such as Mistral, Mixtral, or Westlake. This saves me money compared to the OpenAI API or Claude, which get expensive.

I do want to use my new MacBook more, and with MLX ... so thanks for your work.

Hey there, just to weigh in on the conversation, since this is an issue caused by my app: the tags endpoint is specific to Ollama, not OpenAI. However, is there a way to add support for the OpenAI models endpoint? My app relies on the server returning a list of the model(s) people can call. Even if it's just the one being loaded on start, that's fine.

I think it should be easy to add a models list endpoint: https://platform.openai.com/docs/api-reference/models/list
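
Roughly, something like this should do it (just a sketch, not the actual code in this repo; the handler name, the CORS header, and the hard-coded model name are all illustrative):

import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder for the model loaded at startup; in the real server this would
# come from the --model argument instead of being hard-coded here.
_model_name = "mistralai/Mistral-7B-Instruct-v0.2"

class ModelsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/v1/models":
            self.send_error(404)
            return
        # Response shape per the OpenAI models-list endpoint linked above.
        body = json.dumps({
            "object": "list",
            "data": [{
                "id": _model_name,
                "object": "model",
                "created": int(time.time()),
                "owned_by": "user",
            }],
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Browser-based apps also need CORS headers on this endpoint.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), ModelsHandler).serve_forever()

The id just needs to echo whatever model was passed on the command line, since the apps only use it as a label.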

Do you mean by me cloning the repo, or will you be adding this? Either way is fine, thanks.

Yes, I can add it if I have some time tomorrow.

I copied inaccurate logs before; it should have been the OpenAI endpoint and not Ollama's, like this:

mlx-llm-server --model "mistralai/Mistral-7B-Instruct-v0.2"

Fetching 11 files: 100%|███████████████████████████████████| 11/11 [00:00<00:00, 56471.66it/s]
Starting httpd at 127.0.0.1 on port 8080...
127.0.0.1 - - [02/Mar/2024 09:11:26] "OPTIONS /v1/models HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 09:11:26] code 501, message Unsupported method ('GET')
127.0.0.1 - - [02/Mar/2024 09:11:26] "GET /v1/models HTTP/1.1" 501 -

... this GET, as you guys pointed out, is in the OpenAI docs/specs.
Thanks for doing this; let me know when it's ready, as I will be glad to test it out too.
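
In case it helps, this is the check I have in mind once it's in (the Authorization header is the same dummy value I use for chat/completions):

curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer no-key"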

Would you try running pip install -U mlx-llm-server to see if the updated version fixes the issue?

It did fix the GET request and it now returns the correct model id/name, but then this happens:

mlx-llm-server --model "mistralai/Mistral-7B-Instruct-v0.2"

Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<00:00, 193043.28it/s]
Starting httpd at 127.0.0.1 on port 8080...
127.0.0.1 - - [02/Mar/2024 16:09:10] "OPTIONS /v1/models HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 16:09:10] "GET /v1/models HTTP/1.1" 200 -
127.0.0.1 - - [02/Mar/2024 16:09:10] "OPTIONS /v1/models HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 16:09:10] "GET /v1/models HTTP/1.1" 200 -
127.0.0.1 - - [02/Mar/2024 16:11:23] "OPTIONS /v1/chat/completions HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 16:11:23] "POST /v1/chat/completions HTTP/1.1" 200 -
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 55758)
Traceback (most recent call last):
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
 self.process_request(request, client_address)
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 347, in process_request
 self.finish_request(request, client_address)
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 360, in finish_request
 self.RequestHandlerClass(request, client_address, self)
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 747, in __init__
 self.handle()
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/http/server.py", line 433, in handle
 self.handle_one_request()
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/http/server.py", line 421, in handle_one_request
 method()
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/mlx_llm_server/app.py", line 206, in do_POST
 response = self.handle_post_request(post_data)
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/mlx_llm_server/app.py", line 217, in handle_post_request
 prompt = _tokenizer.apply_chat_template(
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1742, in apply_chat_template
 rendered = compiled_template.render(
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/jinja2/environment.py", line 1301, in render
 self.environment.handle_exception()
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/jinja2/environment.py", line 936, in handle_exception
 raise rewrite_traceback_stack(source=source)
 File "<template>", line 1, in top-level template code
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
 return __context.call(__obj, *args, **kwargs)
 File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1776, in raise_exception
 raise TemplateError(message)
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
----------------------------------------

Perhaps this is related to the NovelCrafter app, so more research is needed, as this still works properly:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "mistral",
    "stop": ["<|im_end|>"],
    "messages": [
      {
        "role": "user",
        "content": "hi, who are you?"
      }
    ]
  }'
{"id": "chatcmpl-33dbad95-d0b4-4a53-b1b4-5a41915ca421", "object": "chat.completion", "created": 1709414336, "model": "mistral", "system_fingerprint": "fp_e721a08c-73ce-40f8-913e-2417e204f144", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! I'm an AI language model, designed to help answer questions and assist with various tasks. I don't have a physical form or identity, but I'm here to help you in any way I can. How can I assist you today?</s>"}, "logprobs": null, "finish_reason": null}], "usage": {"prompt_tokens": 14, "completion_tokens": 54, "total_tokens": 68}}

But thank you for your efforts, and here's hoping the added GET will be helpful to others.

The error occurred because the Mistral chat template doesn't support system prompts. However, this error shouldn't cause the request to fail; it should just be a warning. If you try another model that supports system prompts, the error will disappear.
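
If you need to stay on Mistral in the meantime, one workaround is to fold the system message into the first user message before it reaches the chat template. A rough sketch of the idea (standalone, using a Hugging Face tokenizer; this is not code from the server, just an illustration):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "system", "content": "You are a concise writing assistant."},
    {"role": "user", "content": "hi, who are you?"},
]

# Mistral-Instruct's template rejects the "system" role, so merge it into the
# first user message and drop the system entry before applying the template.
if messages and messages[0]["role"] == "system" and len(messages) > 1 and messages[1]["role"] == "user":
    messages[1]["content"] = messages[0]["content"] + "\n\n" + messages[1]["content"]
    messages = messages[1:]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)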