promptfoo / promptfoo

Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

Home Page: https://www.promptfoo.dev/

Docs request: where does the histogram come from?

jamesbraza opened this issue

I have a Python assertion with three possible scores (0, 0.1, and 1), plus two basic assertions.

```yaml
providers:
  - openai:chat:gpt-4-0613
  - openai:chat:gpt-4-turbo-2024-04-09
  - anthropic:messages:claude-3-sonnet-20240229
defaultTest:
  assert:
    - description: was answered
      type: not-icontains
      value: cannot answer
    - description: has sentences
      type: javascript
      value: output.length > 20
    - description: check value
      type: python
      value: file://assert.py
```
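
(For context: `assert.py` itself isn't shown in this issue. Below is a minimal sketch of what such a file might look like, assuming promptfoo's `get_assert(output, context)` entry point, where a returned float is used as the assertion's score. The specific checks are hypothetical.)

```python
# assert.py -- hypothetical sketch; the real file isn't shown in the issue.
# promptfoo calls get_assert(output, context) for python assertions, and a
# float return value is used directly as the assertion's score.

def get_assert(output: str, context) -> float:
    """Return one of the three possible scores: 0, 0.1, or 1."""
    if "expected keyword" not in output:  # hypothetical hard-failure check
        return 0.0
    if len(output) < 100:                 # hypothetical partial-credit check
        return 0.1
    return 1.0
```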

At the top of my promptfoo view, I see bins around 0.6 and 0.7, which doesn't quite make sense to me:

[screenshot of histogram]

The request is: can a short description be added so that this figure is easy to understand?

  • I have three different model providers; is that where Prompt 1 (red), Prompt 2 (blue), and Prompt 3 (green) come from?
  • Why does the histogram show scores of 0.6 and 0.7? Is that like a sum of multiple assertions' scores?

I now understand that I have three assertions:

  • Two binary ones: each can score 0 or 1
  • One custom assertion: can score 0, 0.1, or 1

I realized the histogram plots the mean score per test: 0.7 = (1 + 1 + 0.1) / 3. See the sketch below.
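
In other words, each test's score is the mean of its assertion scores, equally weighted by default. A minimal sketch of that arithmetic, using the assertion names from the config above (promptfoo also supports a per-assertion `weight`, which would turn this into a weighted average):

```python
# Sketch of how a test score of 0.7 arises from the three assertions
# above (default weight of 1 per assertion assumed).
assertion_scores = {
    "was answered": 1.0,   # not-icontains passed
    "has sentences": 1.0,  # javascript assertion passed
    "check value": 0.1,    # partial credit from assert.py
}

mean_score = sum(assertion_scores.values()) / len(assertion_scores)
print(f"{mean_score:.1f}")  # 0.7
```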

That being said, I still think promptfoo could add a little info bubble or hover-over tooltip that explains this.

Feel free to close this out if uninterested.