jxnl / instructor

structured outputs for llms

Home Page: https://python.useinstructor.com/


TOOLS mode fails on retries

samgregson opened this issue · comments

  • This is actually a bug report.
  • I am not getting good LLM results.
  • I have tried asking for help in the community on Discord or in discussions and have not received a response.
  • I have tried searching the documentation and have not found an answer.

What Model are you using?

  • gpt-3.5-turbo
  • gpt-4-turbo
  • gpt-4
  • Other (please specify)

Describe the bug
Retries in the default TOOLS mode return null.

To Reproduce
I used an example from the docs, updated to Pydantic v2:

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS)

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name):
        if name.islower():
            raise ValueError("name must be uppercase")
        return name

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract `jason is 12`"},
    ],
    max_retries=2,
)

print(response.model_dump_json(indent=2))

Instead of a valid object, OpenAI returns null for the tool-call arguments on the retry, which raises an "Invalid JSON" error.
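For reference, the validator itself fires as expected when exercised locally with plain Pydantic, independent of any API call (a minimal sketch of the same schema):

```python
from pydantic import BaseModel, ValidationError, field_validator

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name: str) -> str:
        if name.islower():
            raise ValueError("name must be uppercase")
        return name

# Lowercase input raises a ValidationError, which is what triggers
# instructor's retry with the error message fed back to the model.
try:
    UserDetail(name="jason", age=12)
except ValidationError:
    print("validation failed")  # reached

# Uppercase input passes.
print(UserDetail(name="JASON", age=12).name)  # JASON
```

So the retry loop is being triggered correctly; the failure is in how the model responds to the error feedback.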

Expected behavior
When using JSON mode, things work as expected and I get

{
    "name": "JASON",
    "age": 12
}

Screenshots
Taken from Langsmith tracing:

TOOLS mode:
(screenshot)

JSON mode:
(screenshot)

Interesting, I wonder if GPT-3.5 is regressing.

can you try

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name):
        if name.islower():
            raise ValueError("name must be uppercase, please correct this")
        return name

i.e., just making the error message a little more explicit about how to fix the response.

Nope, sorry, I tried it with a bunch of prompts and a bunch of models, even this:

class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, name):
        if name.islower():
            raise ValueError("name must be UPPERCASE. Use the tool again but modify the name argument to 'JASON'.")
        return name

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_model=UserDetail,
    temperature=0.01,
    messages=[
        {"role": "user", "content": "Extract `jason is 12`"},
    ],
    max_retries=2,
)

I might try making the instruction come from a "user" role rather than a "tool" role, while still using TOOLS mode.

What are the downsides to JSON mode, btw, if it works better for me?

if it works it works!

ok, maybe there's something deeper going on right now

OK a small experiment suggests that passing the validation error back manually as a "user" role works better, when still using TOOLS mode.

import instructor
from instructor.exceptions import InstructorRetryException
from openai import AsyncOpenAI
from openai.types.chat import ChatCompletion

client = instructor.from_openai(AsyncOpenAI(), mode=instructor.Mode.TOOLS)

async def create_with_user_role_retries(model, temperature, messages, response_model, max_retries=2):
    response = None
    for _ in range(max_retries):
        try:
            response = await client.chat.completions.create(
                model=model,
                temperature=temperature,
                messages=messages.copy(),
                response_model=response_model,
                max_retries=0,
            )
            break
        except InstructorRetryException as e:
            completion: ChatCompletion = e.last_completion
            if client.mode == instructor.Mode.TOOLS:
                response_json = completion.choices[0].message.tool_calls[0].function.arguments
            else:
                response_json = completion.choices[0].message.content
            # Replay the failed output and the validation error as plain
            # assistant/user turns instead of a "tool" message.
            messages.append({"role": "assistant", "content": response_json})
            messages.append({"role": "user", "content": str(e)})
    if response is None:
        response = response_model.model_validate_json(response_json)
    return response

If you have a moment, can you check whether our retry message in retry.py uses the user or assistant role? I'm surprised; maybe we need to improve the prompts.

For Mode.TOOLS you use the tool role to pass back the validation error. That's the obvious choice, I think.

I wonder if the models are fine-tuned to just summarise the function output rather than expecting errors there.
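To make the role difference concrete, here is a sketch of the two retry shapes being compared in this thread; the `tool_call_id` and error text are made-up values for illustration:

```python
# Hypothetical values, for illustration only.
tool_call_id = "call_abc123"
error_text = "name must be uppercase"
bad_arguments = '{"name": "jason", "age": 12}'

# Tool-role feedback: the error rides along as the "result" of the
# failed tool call, tied to it via tool_call_id (the approach the
# thread says TOOLS mode uses).
tool_role_retry = {
    "role": "tool",
    "tool_call_id": tool_call_id,
    "content": f"Validation error: {error_text}. Please try again.",
}

# User-role feedback: the workaround above instead replays the bad
# output and the error as ordinary assistant/user turns.
user_role_retry = [
    {"role": "assistant", "content": bad_arguments},
    {"role": "user", "content": f"Validation error: {error_text}"},
]

print(tool_role_retry["role"])      # tool
print(user_role_retry[1]["role"])   # user
```

The experiment above suggests models follow the correction more reliably when it arrives as a user turn, which would fit the hunch that they treat tool-role content as output to summarise rather than as an instruction to act on.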