Arize-ai / openinference

OpenTelemetry Instrumentation for AI Observability

Home Page: https://arize-ai.github.io/openinference/

🗺️ Vision / multi-modal

mikeldking opened this issue

GPT-4o introduces a new message content type that can contain images, encoded either as a URL or as a base64 string.

Example:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

https://platform.openai.com/docs/guides/vision
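
The same content shape also accepts base64-encoded images supplied as data URLs. A minimal sketch, assuming a local boardwalk.jpg file:

import base64

from openai import OpenAI

client = OpenAI()

# Read a local image (hypothetical path) and embed it as a base64 data URL.
with open("boardwalk.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])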

Milestone 1

  • Vision support in the Python instrumentations for llama-index, openai, gemini, and langchain
  • Eliminate performance degradation from base64-encoded payloads by allowing users to opt out (see the config sketch after this list)
  • Preliminary set of config flags to mask inputs and outputs that could contain sensitive information
  • Create examples
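
One possible shape for the opt-out and masking flags above (field names are illustrative, not a settled API):

from dataclasses import dataclass

# Hypothetical config sketch for the Milestone 1 flags; names are illustrative
# only and not part of any released openinference API.
@dataclass(frozen=True)
class TraceConfig:
    hide_inputs: bool = False              # mask input values that may be sensitive
    hide_outputs: bool = False             # mask output values that may be sensitive
    hide_input_images: bool = False        # drop image payloads from traced input messages
    base64_image_max_length: int = 32_000  # skip base64 images larger than this

config = TraceConfig(hide_input_images=True)
# An instrumentor could then be wired up with this config, e.g.
# OpenAIInstrumentor().instrument(config=config)  # hypothetical wiring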

Milestone N

  • Image synthesis APIs such as DALL-E
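
For reference, the kind of image synthesis call a later milestone would instrument (standard OpenAI Python SDK usage; the prompt and size are illustrative):

from openai import OpenAI

client = OpenAI()

# DALL-E 3 image generation.
response = client.images.generate(
    model="dall-e-3",
    prompt="a photo of a nature boardwalk in Madison, Wisconsin",
    size="1024x1024",
    n=1,
)

print(response.data[0].url)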

Tracing

Instrumentation

Testing

Image tracing

Context Attributes

Config

Suppress Tracing

UI / Javascript

Testing

Documentation

Evals

Example vLLM client that should also support vision:

import base64

import filetype
import httpx

# VLM_MODEL, VLLM_URL, VLLM_HEALTHCHECK, VLLM_READY_TIMEOUT, ALLOWED_IMAGE_TYPES,
# and wait_for_ready are assumed to be defined elsewhere in the module.


class VLMClient:
    def __init__(self, vlm_model: str = VLM_MODEL, vllm_url: str = VLLM_URL):
        self._vlm_model = vlm_model
        self._vllm_client = httpx.AsyncClient(base_url=vllm_url)

        if VLLM_HEALTHCHECK:
            wait_for_ready(
                server_url=vllm_url,
                wait_seconds=VLLM_READY_TIMEOUT,
                health_endpoint="health",
            )

    @property
    def vlm_model(self) -> str:
        return self._vlm_model

    async def __call__(
        self,
        prompt: str,
        image_bytes: bytes | None = None,
        image_filetype: filetype.Type | None = None,
        max_tokens: int = 10,
    ) -> str:
        # Assemble the message content
        message_content: list[dict[str, str | dict]] = [
            {
                "type": "text",
                "text": prompt,
            }
        ]

        if image_bytes is not None:
            if image_filetype is None:
                image_filetype = filetype.guess(image_bytes)

            if image_filetype is None:
                raise ValueError("Could not determine image filetype")

            if image_filetype not in ALLOWED_IMAGE_TYPES:
                raise ValueError(
                    f"Image type {image_filetype} is not supported. Allowed types: {ALLOWED_IMAGE_TYPES}"
                )

            image_b64 = base64.b64encode(image_bytes).decode("utf-8")
            message_content.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{image_filetype.mime};base64,{image_b64}",
                    },
                }
            )

        # Put together the request payload
        payload = {
            "model": self.vlm_model,
            "messages": [{"role": "user", "content": message_content}],
            "max_tokens": max_tokens,
            # "logprobs": True,
            # "top_logprobs": 1,
        }

        response = await self._vllm_client.post("/v1/chat/completions", json=payload)
        response.raise_for_status()
        response_json = response.json()
        response_text: str = (
            response_json.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
        )

        return response_text
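
A hypothetical usage of the client above, assuming VLM_MODEL and VLLM_URL point at a running vLLM server with a vision-language model loaded:

import asyncio

async def main() -> None:
    client = VLMClient()
    # boardwalk.jpg is an illustrative local file path.
    with open("boardwalk.jpg", "rb") as f:
        answer = await client("What's in this image?", image_bytes=f.read(), max_tokens=50)
    print(answer)

asyncio.run(main())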