simonw / llm

Access large language models from the command-line

Home Page: https://llm.datasette.io

Upgrade for compatibility with OpenAI 1.0 library

simonw opened this issue

Currently:

Successfully installed openai-1.0.1
$ llm -m gpt-4-turbo 'hi'
Error: module 'openai' has no attribute 'ChatCompletion'

openai/openai-python#631 has upgrade instructions.

You can run:

openai migrate

And it uses Grit and some transforms to try and upgrade your code.

Here's the diff it generated:

diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index ee88695..31d6a3f 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -4,6 +4,9 @@ from llm.utils import dicts_to_table_string
 import click
 import datetime
 import openai
+from openai import OpenAI
+
+client = OpenAI()
 import os
 
 try:
@@ -22,7 +25,9 @@ if os.environ.get("LLM_OPENAI_SHOW_RESPONSES"):
         click.echo(response.text, err=True)
         return response
 
-    openai.requestssession = requests.Session()
+    raise Exception(
+        "The 'openai.requestssession' option isn't read in the client API. You will need to pass it when you instantiate the client, e.g. 'OpenAI(requestssession=requests.Session())'"
+    )
     openai.requestssession.hooks["response"].append(log_response)
 
 
@@ -88,7 +93,7 @@ class Ada002(EmbeddingModel):
     batch_size = 100  # Maybe this should be 2048
 
     def embed_batch(self, items: Iterable[Union[str, bytes]]) -> Iterator[List[float]]:
-        results = openai.Embedding.create(
+        results = client.embeddings.create(
             input=items, model="text-embedding-ada-002", api_key=self.get_key()
         )["data"]
         return ([float(r) for r in result["embedding"]] for result in results)
@@ -202,7 +207,8 @@ class Chat(Model):
             default=None,
         )
         seed: Optional[int] = Field(
-            description="Integer seed to attempt to sample deterministically", default=None
+            description="Integer seed to attempt to sample deterministically",
+            default=None,
         )
 
         @field_validator("logit_bias")
@@ -276,7 +282,7 @@ class Chat(Model):
         response._prompt_json = {"messages": messages}
         kwargs = self.build_kwargs(prompt)
         if stream:
-            completion = openai.ChatCompletion.create(
+            completion = client.chat.completions.create(
                 model=self.model_name or self.model_id,
                 messages=messages,
                 stream=True,
@@ -290,7 +296,7 @@ class Chat(Model):
                     yield content
             response.response_json = combine_chunks(chunks)
         else:
-            completion = openai.ChatCompletion.create(
+            completion = client.chat.completions.create(
                 model=self.model_name or self.model_id,
                 messages=messages,
                 stream=False,
@@ -352,7 +358,7 @@ class Completion(Chat):
         response._prompt_json = {"messages": messages}
         kwargs = self.build_kwargs(prompt)
         if stream:
-            completion = openai.Completion.create(
+            completion = client.completions.create(
                 model=self.model_name or self.model_id,
                 prompt="\n".join(messages),
                 stream=True,
@@ -366,7 +372,7 @@ class Completion(Chat):
                     yield content
             response.response_json = combine_chunks(chunks)
         else:
-            completion = openai.Completion.create(
+            completion = client.completions.create(
                 model=self.model_name or self.model_id,
                 prompt="\n".join(messages),
                 stream=False,
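
One thing the migration tool punts on is the LLM_OPENAI_SHOW_RESPONSES logging hook: it just replaces the openai.requestssession assignment with an exception. As a hedged sketch (not something the tool generated, and assuming the 1.0 client's http_client parameter plus httpx event hooks), equivalent response logging could look something like this:

import os
import click
import httpx
from openai import OpenAI

def log_response(response):
    # httpx responses are streamed lazily; read the body before echoing it
    response.read()
    click.echo(response.text, err=True)

http_client = None
if os.environ.get("LLM_OPENAI_SHOW_RESPONSES"):
    http_client = httpx.Client(event_hooks={"response": [log_response]})

client = OpenAI(http_client=http_client)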

I'm nervous that this upgrade might break compatibility with other servers that imitate the OpenAI API - see https://llm.datasette.io/en/stable/other-models.html#openai-compatible-models

Any news on this? It's getting tougher and tougher to use several openai versions at the same time :)

I've used the OpenAI Python library 1.0+ with several different API proxies, trying both the "legacy" module-level calls via response = openai.chat.completions.create() and the new client = OpenAI() approach. Backwards compatibility seems to be fine; I haven't run into any trouble. The legacy method is a very straightforward code change.

The custom endpoints are set up slightly differently between the two new options. See below.

"Legacy" but still new in 1.0+:

openai.api_key = "<key>"
openai.base_url = "<base_url>"  # this includes the /v1/ slug
openai.default_headers = {
    "header": "<key>",
    ...
}

followed by a call that looks almost identical to the old one:

response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"{system_prompt}"},
        {"role": "user", "content": f"{message}"},
    ],
    temperature=0.5,
    max_tokens=512,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stream=False,
)
print(response.choices[0].message.content)

Or the all-new way:

client = OpenAI(
    api_key="<key>",
    base_url="<base_url>",
    http_client=httpx.Client(
        headers={
            "header": "<key>",
            ...
        }
    )
)

followed by:

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"{system_prompt}"},
        {"role": "user", "content": f"{message}"},
    ],
    temperature=0.5,
    max_tokens=512,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stream=False,
)
print(response.choices[0].message.content)
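
(Worth noting, if I remember correctly: openai>=1.0 also accepts a default_headers= argument directly on OpenAI(...), so building an httpx.Client purely to set headers isn't strictly necessary; the http_client route is mainly useful when you need more control over the transport, e.g. proxies.)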

Hope this helps in your code conversion.

Maybe a simpler way would be to just use litellm.

They make a Python package that can call lots of different backends. I think it would be better for llm to use litellm to handle API calls than to keep relying on people creating and maintaining plugins for every backend.

They also support caching, embeddings, async calls, etc.
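
For anyone curious what that looks like in code, here's a minimal sketch (the model names are just illustrative, and it assumes the relevant provider key such as OPENAI_API_KEY is set in the environment):

from litellm import completion

# The same call shape works across backends; litellm routes based on the
# model string (e.g. "gpt-3.5-turbo" for OpenAI, "ollama/mixtral" for a
# local Ollama server).
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi"}],
)
print(response.choices[0].message.content)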

I've started using litellm proxies in combination with llm, the combination is very promising (but doesn't always work; see: #377).

It would be great if the llm docs made it more obvious how to use litellm; it's actually very easy once you know how: just add something like this to your extra-openai-models.yaml:

- model_name: ollama/mixtral
  model_id: litellm-mixtral
  api_base: "http://0.0.0.0:8000"
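
With that in place (and assuming the litellm proxy is listening on port 8000), llm -m litellm-mixtral 'hi' should route the prompt through the proxy.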

It's a bit annoying if you want to expose a lot of models this way; some kind of pattern-based passthrough would be nice.

I wonder if I could get this to work with BOTH versions of the OpenAI library? That way llm won't break if someone installs it alongside something that requires the previous library version.

I've already done this for Pydantic, and I've started doing it in some of my other projects for Datasette itself, e.g. in https://github.com/simonw/datasette-cluster-map/blob/0a7e14528ba60dc059e88b5ea0bd7d57f206382f/.github/workflows/test.yml#L5-L11

I've been worrying too much about whether I can be sure the YAML-configured OpenAI extensions will definitely work.

Instead, I think I should do the upgrade such that both dependency versions work... and tell people that if they run into trouble they should install openai<1.0 to fix their problem, and file bugs for me to look into a better fix.
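
For illustration only (not necessarily how the actual change is structured), a hedged sketch of one common pattern for supporting both library versions at once:

import openai

try:
    from openai import OpenAI  # only exists in openai>=1.0

    def chat_create(api_key, **kwargs):
        client = OpenAI(api_key=api_key)
        return client.chat.completions.create(**kwargs)

except ImportError:
    # openai<1.0 falls back to the old module-level API
    def chat_create(api_key, **kwargs):
        return openai.ChatCompletion.create(api_key=api_key, **kwargs)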

Moving this work to a PR:

OK, I've landed this change. I plan to ship it as a release tomorrow, but it could definitely benefit from a few people other than me kicking the tires on it!

You can run the latest development version like this:

python -m venv /tmp/llm-venv
/tmp/llm-venv/bin/pip install https://codeload.github.com/simonw/llm/zip/refs/heads/main
/tmp/llm-venv/bin/llm 'say hello'