Docs request: where does histogram come from?
jamesbraza opened this issue
I have two basic assertions, plus a Python assertion with three possible scores: 0, 0.1, and 1.
```yaml
providers:
  - openai:chat:gpt-4-0613
  - openai:chat:gpt-4-turbo-2024-04-09
  - anthropic:messages:claude-3-sonnet-20240229
defaultTest:
  assert:
    - description: was answered
      type: not-icontains
      value: cannot answer
    - description: has sentences
      type: javascript
      value: output.length > 20
    - description: check value
      type: python
      value: file://assert.py
```
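For context, here is a hypothetical sketch of what an `assert.py` like mine could look like. The `get_assert(output, context)` entry point is what promptfoo calls for a `file://` Python assertion; the keyword and thresholds below are made up for illustration, just to show where a 0/0.1/1 score could come from:

```python
# assert.py -- hypothetical sketch of a custom Python assertion with three
# possible scores: 0, 0.1, or 1. (My real file differs; this only
# illustrates the scoring shape.)
def get_assert(output: str, context) -> float:
    if "expected keyword" in output:  # fully correct answer
        return 1.0
    if len(output) > 0:               # partial credit for a non-empty answer
        return 0.1
    return 0.0                        # empty output scores zero
```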
At the top of my promptfoo view, I see bins around 0.6 and 0.7, which isn't quite making sense to me:
The request: could a short description be added so that this figure is easier to understand?
- Since I have three different model providers, is that where Prompt 1 (red), Prompt 2 (blue), and Prompt 3 (green) come from?
- Why does the histogram show scores of 0.6 and 0.7? Is that some kind of sum of multiple assertions' scores?
I now understand that I have three assertions:
- Two binary ones: can be score 0 or 1
- One custom assertion: can be score 0, 0.1, or 1
I realized the histogram plots the mean score: 0.7 = (1 + 1 + 0.1) / 3.
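The calculation above can be sketched in a couple of lines (my understanding of what the histogram plots, not promptfoo's actual code):

```python
# One test case where the two binary assertions passed (score 1 each)
# and the custom Python assertion scored 0.1.
scores = [1, 1, 0.1]

# The histogram bin is the mean of the per-assertion scores.
mean_score = sum(scores) / len(scores)
print(mean_score)  # → 0.7
```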
That being said, I still think promptfoo could add a little info bubble or hover-over tooltip that explains this.
Feel free to close this out if there's no interest.