llm-rubric with VertexAI Gemini Pro
romaintoub opened this issue
Hey @typpo
I've made it work with the Python assertion, but it only works about 80% of the time: since it relies on stdout from the main function, it is very sensitive to any error or bug. So I want to go back to the first option that was working initially, and I think I have some good info on why it's not behaving as expected. My config:
# The LLM output will be graded by gemini-pro
defaultTest:
  options:
    provider: vertex:gemini-pro
    config:
      temperature: 0
      maxOutputTokens: 1024
    rubricPrompt:
      - role: system
        content: >-
          You are evaluating the answer...
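For context, this grading provider is the one used by the `llm-rubric` assertions in my tests; a test case looks roughly like this (the rubric text below is just a placeholder, not my actual rubric):

```yaml
tests:
  - vars:
      question: Why the sky is blue?
    assert:
      - type: llm-rubric
        value: Explains that blue light is scattered more because it travels as shorter waves
```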
And when I ran it in debug mode, I found this:
Cache is disabled.
Computing file hash for script evaluator/providers/eval_llm_chain.py
Running python script evaluator/providers/eval_llm_chain.py with scriptPath providers/eval_llm_chain.py and args: Why the sky is blue?
[object Object]
[object Object]
Python script evaluator/providers/eval_llm_chain.py returned: Importing module eval_llm_chain from evaluator/providers ...
{"type": "final_result", "data": {"output": "I can't answer that question."}}
Coerced JSON prompt to Gemini format: [{"role":"system","content":"You are evaluating the answer of an assistant.\n ## Your turn!\n Target: Sunlight reaches Earth's atmosphere and is scattered in all directions by all the gases and particles in the air. Blue light is scattered more than the other colors because it travels as shorter, smaller waves. This is why we see a blue sky most of the time.\n Assistant: I can't answer that question."},{"role":"user","content":"Output: I can't answer that question."}]
Preparing to call Google Vertex API (Gemini) with body: {"contents":{"role":"user","parts":{"text":"You are evaluating the answer of an assistant.\n same thing "}},"generationConfig":{}}
Gemini API response:
**
[
{"candidates":[{"content":{"role":"model","parts":[{"text":"{"}]}}]},
{"candidates":[{"content":{"role":"model","parts":[{"text":" \"pass\": false, \"score\": 0.0, \"reason\": \"The target response should not talk about off-topic discussions"}]},"safetyRatings":[{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE","probabilityScore":0.24095316,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.14977993},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE","probabilityScore":0.091220066,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.13139598},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE","probabilityScore":0.25870034,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.15140383},{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE","probabilityScore":0.091220066,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.07544843}]}]},
{"candidates":[{"content":{"role":"model","parts":[{"text":".\"} similar answer to the target response.\"\n}"}]},"finishReason":"STOP","safetyRatings":[{"category":"HARM_CATEGORY_HATE_SPEECH","probability":"NEGLIGIBLE","probabilityScore":0.20291664,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.12656909},{"category":"HARM_CATEGORY_DANGEROUS_CONTENT","probability":"NEGLIGIBLE","probabilityScore":0.058024753,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.13523208},{"category":"HARM_CATEGORY_HARASSMENT","probability":"NEGLIGIBLE","probabilityScore":0.24220563,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.16013464},{"category":"HARM_CATEGORY_SEXUALLY_EXPLICIT","probability":"NEGLIGIBLE","probabilityScore":0.05623635,"severity":"HARM_SEVERITY_NEGLIGIBLE","severityScore":0.06816437}]}],
"usageMetadata":{"promptTokenCount":482,"candidatesTokenCount":40,"totalTokenCount":522}}
]
**
Eval #1 complete (1 of 1)
And the error I have in prompt view is:
llm-rubric produced malformed response: { "pass": false, "score": 0.0, "reason": "The target response should not talk about off-topic discussions."} similar answer to the target response."}
It looks like the response from Gemini is broken into 3 different parts, and the last part is not necessary and causes the parsing issue, as it adds some text after the dictionary.
Do you know if this could come from the VertexAI API and be fixed here?
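For reference, here is a quick sketch of my own (not promptfoo code) that joins the text parts of the three streamed chunks above; it reproduces exactly the string that fails to parse:

```python
import json

# Text parts of the three streamed chunks shown in the debug log above.
parts = [
    "{",
    ' "pass": false, "score": 0.0, "reason": "The target response should not talk about off-topic discussions',
    '."} similar answer to the target response."\n}',
]

text = "".join(parts)
print(text)

try:
    json.loads(text)
except json.JSONDecodeError as err:
    # Fails with "Extra data" because of the trailing text after the closing brace.
    print("malformed:", err)
```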
Also, should the Gemini config be integrated into the generationConfig attribute in the body? Normally I use gemini-1.0-pro with temperature = 0.
Just pushed two changes that should help address this problem.
- Try changing your config to:

      config:
        generationConfig:
          temperature: 0

  Google nests their `temperature` and similar under the `generationConfig` key. I may make a change to automatically correct this (because I'm sure you're not the only one), but as a temporary workaround try adding `generationConfig`.

- Updated `llm-rubric` to be more resilient to JSON responses with additional text before or after.
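Just to illustrate what that "more resilient" parsing needs to handle, here is a rough sketch (my own, not the actual promptfoo implementation) that pulls the first balanced JSON object out of a grader response surrounded by extra text:

```python
import json

def extract_json_object(text: str) -> dict:
    """Return the first balanced top-level JSON object found in `text`,
    ignoring any extra text before or after it."""
    start = text.index("{")
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("no balanced JSON object found")

raw = '{ "pass": false, "score": 0.0, "reason": "off-topic"} similar answer to the target response."}'
print(extract_json_object(raw))
# -> {'pass': False, 'score': 0.0, 'reason': 'off-topic'}
```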
thanks! I will test it out
it works now, thanks a lot!