[Feature] Add GPT Vision Models

Question

[Feature] Add GPT Vision Models

sebiweise opened this issue 8 months ago · comments

Sebastian Goscinski commented 8 months ago

Describe the bug
Add GPT 4 Vision https://platform.openai.com/docs/guides/vision

Jonathan Mohrbacher · Answer 1 · Mon Jan 22 2024 22:53:29 GMT+0800 (China Standard Time)

Any opinions on what this feature should look like?

I imagine that we agree that if the gpt-4-vision model is selected, we show an "upload images" icon on the left in the message bar.

But beyond that?

The API accepts both image URLs and base64 encoded images. Should we present this choice to the user?

The API permits that the image's detail be set to low/high/auto. Should we present this choice to the user?

If the user uploads an image (rather than provides a URL), should we use supabase storage (S3-equivalent) to hold onto it? The gpt-4-vision docs recommend ...

For long running conversations, we suggest passing images via URL's instead of base64. The latency of the model can also be improved by downsizing your images ahead of time to be less than the maximum size they are expected them to be. For low res mode, we expect a 512px x 512px image. For high res mode, the short side of the image should be less than 768px and the long side should be less than 2,000px.

Just glancing at supabase storage, it looks like it could offer both URLs and resizing, which would be advantageous in long running conversations.