jxnl / instructor

structured outputs for llms

Home Page: https://python.useinstructor.com/


TOOLS mode fails on retries

samgregson opened this issue · comments

  • This is actually a bug report.
  • I am not getting good LLM results.
  • I have tried asking for help in the community on Discord or in discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (please specify)

Describe the bug
Retries in the default TOOLS mode return null.

To Reproduce
I used an example from the docs, updated to Pydantic v2:

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS)

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name):
        if name.islower():
            raise ValueError("name must be uppercase")
        return name

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract `jason is 12`"},
    ],
    max_retries=2,
)

print(response.model_dump_json(indent=2))

Instead of a valid object, OpenAI returns null for the tool-call arguments on the retry, which raises an "Invalid JSON" error.
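For reference, the validator itself fires as expected when exercised locally with plain Pydantic, independent of any API call (a minimal sketch of the same schema):

```python
from pydantic import BaseModel, ValidationError, field_validator

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name: str) -> str:
        if name.islower():
            raise ValueError("name must be uppercase")
        return name

# Lowercase input raises a ValidationError, which is what triggers
# instructor's retry with the error message fed back to the model.
try:
    UserDetail(name="jason", age=12)
except ValidationError:
    print("validation failed")  # reached

# Uppercase input passes.
print(UserDetail(name="JASON", age=12).name)  # JASON
```

So the retry loop is being triggered correctly; the failure is in how the model responds to the error feedback.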

Expected behavior
When using JSON mode, things work as expected and I get

{
    "name": "JASON",
    "age": 12
}

Screenshots
Taken from Langsmith tracing:

TOOLS mode:
(screenshot)

JSON mode:
(screenshot)

Interesting, I wonder if GPT-3.5 is regressing.

can you try

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name):
        if name.islower():
            raise ValueError("name must be uppercase, please correct this")
        return name

i.e., just making the error message a little more explicit about how to fix the response.

Nope, sorry, I tried it with a bunch of prompts and a bunch of models, even this:

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name):
        if name.islower():
            raise ValueError("name must be UPPERCASE. Use the tool again but modify the name argument to 'JASON'.")
        return name

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_model=UserDetail,
    temperature=0.01,
    messages=[
        {"role": "user", "content": "Extract `jason is 12`"},
    ],
    max_retries=2,
)

I might try making the instruction come from a "user" role rather than a "tool" role, while still using TOOLS mode.

What are the downsides to JSON mode, btw, if it works better for me?

if it works it works!

ok, maybe there's something deeper going on right now

OK a small experiment suggests that passing the validation error back manually as a "user" role works better, when still using TOOLS mode.

import instructor
from instructor.exceptions import InstructorRetryException
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletion

client = instructor.from_openai(AsyncOpenAI(), mode=instructor.Mode.TOOLS)

async def create_with_user_role_retries(model, temperature, messages, response_model, max_retries=2):
    response = None
    for _ in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model=model,
                temperature=temperature,
                messages=messages.copy(),
                response_model=response_model,
                max_retries=0,
            )
            break
        except InstructorRetryException as e:
            completion: ChatCompletion = e.last_completion
            if client.mode == instructor.Mode.TOOLS:
                response_json = completion.choices[0].message.tool_calls[0].function.arguments
            else:
                response_json = completion.choices[0].message.content
            # Replay the failed output and the validation error as plain
            # assistant/user turns instead of a "tool" message.
            messages.append({"role": "assistant", "content": response_json})
            messages.append({"role": "user", "content": str(e)})
    if response is None:
        response = response_model.model_validate_json(response_json)
    return response

If you have a moment, can you check whether our retry message in retry.py uses the user or assistant role? I'm surprised; maybe we need to improve the prompts.

For Mode.TOOLS you use the tool role to pass back the validation error. That's the obvious choice, I think.

I wonder if the models are fine-tuned to just summarise the function output rather than expecting errors there.
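To make the role difference concrete, here is a sketch of the two retry shapes being compared in this thread; the `tool_call_id` and error text are made-up values for illustration:

```python
# Hypothetical values, for illustration only.
tool_call_id = "call_abc123"
error_text = "name must be uppercase"
bad_arguments = '{"name": "jason", "age": 12}'

# Tool-role feedback: the error rides along as the "result" of the
# failed tool call, tied to it via tool_call_id (the approach the
# thread says TOOLS mode uses).
tool_role_retry = {
    "role": "tool",
    "tool_call_id": tool_call_id,
    "content": f"Validation error: {error_text}. Please try again.",
}

# User-role feedback: the workaround above instead replays the bad
# output and the error as ordinary assistant/user turns.
user_role_retry = [
    {"role": "assistant", "content": bad_arguments},
    {"role": "user", "content": f"Validation error: {error_text}"},
]

print(tool_role_retry["role"])      # tool
print(user_role_retry[1]["role"])   # user
```

The experiment above suggests models follow the correction more reliably when it arrives as a user turn, which would fit the hunch that they treat tool-role content as output to summarise rather than as an instruction to act on.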