promptfoo / promptfoo

Test your prompts, agents, and RAGs. Redteaming, pentesting, vulnerability scanning for LLMs. Improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Home Page: https://www.promptfoo.dev/


Python assertions not working with external.py

romaintoub opened this issue

Hello!
Very sorry to hammer you with these issues. The config with Vertex isn't really working because of... VertexAI, so I'd rather use my own custom grader for now.

Initially I was using llm-rubric with a custom prompt, which worked pretty well, but now I want to reuse this prompt in a Python assertion, something like a defaultTest I can define as a provider, e.g.:

def main(args):
    llm = VertexAI()
    prompt = "you are an evaluator..."
    chain = prompt | llm
    res = chain.invoke({"question": args[0], "target": args[1]})
    # res should be something like: {"pass": True, "score": 0.9, "reason": "blabla"}
    processed_res = process(res)  # make sure to return a dictionary with pass, score, and reason
    return processed_res
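
For reference, I'm pointing the assertion at this script from the config along these lines (just a sketch based on the docs; external.py is the file name from this issue's title):

defaultTest:
  assert:
    - type: python
      value: file://external.py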

I tried the function you provided in the docs, e.g.:

import json
import sys

def main():
    if len(sys.argv) >= 3:
        output = sys.argv[1]
        context = json.loads(sys.argv[2])
    else:
        raise ValueError("Model output and context are expected from promptfoo.")
    success = {'pass': True, 'score': 1.0, 'reason': 'this is btf'}
    return success

print(main())

and got this error:
[error screenshot]

Do you know where this comes from? Is there a workaround here to test the first function?

No need to apologize, sorry you've had trouble getting things up and running.

For the custom Python assert, try wrapping it in print(json.dumps(main())). It's on my to-do list to convert this to a nicer API, such as implementing a function call_assert that can return a native Python object.
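
Applied to the docs snippet above, that looks like this (the fix is just the last line: promptfoo parses the script's stdout as JSON, and print(main()) emits a Python dict repr with single quotes and True, which isn't valid JSON):

import json
import sys

def main():
    if len(sys.argv) >= 3:
        output = sys.argv[1]
        context = json.loads(sys.argv[2])
    else:
        raise ValueError("Model output and context are expected from promptfoo.")
    return {'pass': True, 'score': 1.0, 'reason': 'this is btf'}

# json.dumps emits valid JSON (double quotes, lowercase true) on stdout,
# which is what promptfoo parses.
print(json.dumps(main()))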

Side note, is 0.49.2 still causing problems for you with Vertex AI?

Yes, I am still getting the same weird results from llm-rubric, but I suspect the VertexAI models, and more specifically gemini-pro, are the reason I have issues. This is why I'd like to bring my own model and run the pre- and post-processing to make sure I get what I want.

Thanks for the feedback, will try the json.dumps().

@typpo I'm facing the same issue when using a custom Python assertion for my test cases as described in the docs: https://www.promptfoo.dev/docs/configuration/expected-outputs/python

I have confirmed using debug statements that the JSON formed is correct, but I still get the same error as @romaintoub.
I am on the latest version of promptfoo, i.e. v0.50.0, and am using json.dumps() to dump the dictionary.

Attaching the screenshot.

[Screenshot 2024-04-02 at 1:12:29 PM]

@reallyinvincible Hard to tell without more debug info. It looks like what's happening is that your assertion is outputting something other than JSON. Perhaps there are some other print statements/stdout elsewhere in the code?
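
In the meantime, one general Python pattern (not promptfoo-specific) is to route any debug output to stderr, so stdout stays pure JSON:

import json
import sys

def main():
    result = {'pass': True, 'score': 1.0, 'reason': 'ok'}
    # Debug output goes to stderr; stdout is reserved for the JSON payload.
    print(f"debug: computed result {result}", file=sys.stderr)
    return result

print(json.dumps(main()))  # stdout now contains only valid JSON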

Two next steps for me:

  1. This PR will improve debugging issues like this in the future: #620
  2. I am going to switch python assertions to a native integration (similar to python providers) so that you can just return the value and not worry about print statements and stdout (see the sketch below).
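
A rough sketch of what that could look like, modeled on the python provider convention (the function name get_assert is an assumption here, not a confirmed API):

# my_assert.py
def get_assert(output, context):
    # Return the grading result as a native Python object; no
    # print()/json.dumps() round-trip through stdout needed.
    passed = 'expected phrase' in output
    return {
        'pass': passed,
        'score': 1.0 if passed else 0.0,
        'reason': 'checked for expected phrase in model output',
    }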

@typpo That really sounds like a more elegant solution. Is it possible to get an ETA on the next release?

@reallyinvincible PR here #638, so likely today or tomorrow

The change is released in 0.51.0.