confident-ai / deepeval

The LLM Evaluation Framework

Home Page: https://docs.confident-ai.com/


Selective record failure instead of complete job failure

TheDominus opened this issue

Is your feature request related to a problem? Please describe.
LLMs sometimes fail to generate valid JSON output for certain records in a dataset, and DeepEval's evaluation functions abort the entire run when one such instance is encountered. Instead of failing the whole job, DeepEval should skip the offending record, evaluate the metrics for the remaining records, and ideally return the failure reason at the record level.

Describe the solution you'd like
Questions [1, 2, 3]
Contexts [1, 2, 3]
Responses [1, 2, 3]

Evaluate - Faithfulness

Output should be like this.

Output - [0.76, NaN ("failed because the LLM could not produce proper JSON"), 0.98]

Here, the LLM did not work as expected for the 2nd record, so its score is NaN (with the failure reason attached) instead of the whole job returning no data.
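For illustration, here is a minimal workaround sketch of the requested per-record behavior, built on DeepEval's per-test-case `metric.measure()` API. The `questions`/`contexts`/`responses` lists are hypothetical stand-ins for the dataset above, and the try/except wrapping is the proposed behavior, not something DeepEval does today:

```python
import math

from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Hypothetical dataset matching the example above.
questions = ["q1", "q2", "q3"]
contexts = [["c1"], ["c2"], ["c3"]]
responses = ["r1", "r2", "r3"]

metric = FaithfulnessMetric()
scores = []
for question, context, response in zip(questions, contexts, responses):
    test_case = LLMTestCase(
        input=question,
        actual_output=response,
        retrieval_context=context,
    )
    try:
        metric.measure(test_case)
        scores.append(metric.score)
    except Exception as exc:
        # Record NaN plus the failure reason instead of aborting the job.
        scores.append((math.nan, f"failed: {exc}"))

print(scores)  # e.g. [0.76, (nan, "failed: ..."), 0.98]
```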

Describe alternatives you've considered
Ragas provides similar functionality.


@TheDominus Do you explicitly want NaN, or, as you described, just to continue the job while ignoring errors? If so, we already have the option to ignore errors: https://docs.confident-ai.com/docs/evaluation-test-cases#evaluate-test-cases-in-bulk

Or, if you are using `deepeval test run`:
https://docs.confident-ai.com/docs/evaluation-introduction#ignore-errors
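For the bulk path, a minimal sketch of what the first linked docs page describes, assuming the option is exposed as an `ignore_errors` keyword argument on `evaluate()` (check the docs for the exact name):

```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

test_cases = [
    LLMTestCase(
        input="q1",
        actual_output="r1",
        retrieval_context=["c1"],
    ),
    # ... one test case per record
]

# Per the linked docs, ignoring errors skips records whose metric
# run fails instead of aborting the whole evaluation.
evaluate(
    test_cases=test_cases,
    metrics=[FaithfulnessMetric()],
    ignore_errors=True,
)
```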

Thanks for this. This will help.