ollama / ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Home Page:https://ollama.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

support image with url when chat with vison model

dickens88 opened this issue · comments

Hi thanks the Ollama team that made this helpful tool.

recently, chatgpt-4o api also support vision mode. they provide two method for uploading an image to the chat. one is encode the image to base64 that Ollama also use the same way. And another way is with url. with the url, chatgpt chat platform can automatically download the image and add to the chat.

Here is some descriptions about this functions in offical API docs

Managing images
The Chat Completions API, unlike the Assistants API, is not stateful. That means you have to manage the messages (including images) you pass to the model yourself. If you want to pass the same image to the model multiple times, you will have to pass the image each time you make a request to the API.
For long running conversations, we suggest passing images via URL's instead of base64. The latency of the model can also be improved by downsizing your images ahead of time to be less than the maximum size they are expected them to be. For low res mode, we expect a 512px x 512px image. For high res mode, the short side of the image should be less than 768px and the long side should be less than 2,000px.
After an image has been processed by the model, it is deleted from OpenAI servers and not retained. We do not use data uploaded via the OpenAI API to train our models.

Is it possible to support this way in Ollama?

There's a PR for it - #2506, but no update has been done since it was opened in February.

There's a PR for it - #2506, but no update has been done since it was opened in February.

I'm not going to update the PR until the ollama team acknowledge it. Not wasting my time.