confident-ai / deepeval

The LLM Evaluation Framework

Home Page: https://docs.confident-ai.com/


Selective record failure instead of complete job failure

TheDominus opened this issue

Is your feature request related to a problem? Please describe.
LLMs sometimes fail to generate valid JSON output for certain records in a dataset, and DeepEval's evaluation functions abort the entire run when one such instance is encountered. Instead of failing the whole job, DeepEval should skip the offending record, evaluate the metrics for the remaining records, and ideally return the failure reason at the record level.

Describe the solution you'd like
Questions [1, 2, 3]
Contexts [1, 2, 3]
Responses [1, 2, 3]

Evaluate - Faithfulness

Output should be like this.

Output - [0.76, NaN ("failed because the LLM could not produce proper JSON"), 0.98]

Here, the LLM did not work as expected for the 2nd record, so its score is NaN (with the failure reason attached) instead of the whole job returning no data.
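For illustration, here is a minimal workaround sketch of the requested per-record behavior, built on DeepEval's per-test-case `metric.measure()` API. The `questions`/`contexts`/`responses` lists are hypothetical stand-ins for the dataset above, and the try/except wrapping is the proposed behavior, not something DeepEval does today:

```python
import math

from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Hypothetical dataset matching the example above.
questions = ["q1", "q2", "q3"]
contexts = [["c1"], ["c2"], ["c3"]]
responses = ["r1", "r2", "r3"]

metric = FaithfulnessMetric()
scores = []
for question, context, response in zip(questions, contexts, responses):
    test_case = LLMTestCase(
        input=question,
        actual_output=response,
        retrieval_context=context,
    )
    try:
        metric.measure(test_case)
        scores.append(metric.score)
    except Exception as exc:
        # Record NaN plus the failure reason instead of aborting the job.
        scores.append((math.nan, f"failed: {exc}"))

print(scores)  # e.g. [0.76, (nan, "failed: ..."), 0.98]
```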

Describe alternatives you've considered
Ragas provides similar functionality.


@TheDominus Do you explicitly want NaN, or, as you described, just to continue the job while ignoring errors? If so, we already have the option to ignore errors: https://docs.confident-ai.com/docs/evaluation-test-cases#evaluate-test-cases-in-bulk

Or, if you are using `deepeval test run`:
https://docs.confident-ai.com/docs/evaluation-introduction#ignore-errors
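For the bulk path, a minimal sketch of what the first linked docs page describes, assuming the option is exposed as an `ignore_errors` keyword argument on `evaluate()` (check the docs for the exact name):

```python
from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

test_cases = [
    LLMTestCase(
        input="q1",
        actual_output="r1",
        retrieval_context=["c1"],
    ),
    # ... one test case per record
]

# Per the linked docs, ignoring errors skips records whose metric
# run fails instead of aborting the whole evaluation.
evaluate(
    test_cases=test_cases,
    metrics=[FaithfulnessMetric()],
    ignore_errors=True,
)
```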

Thanks for this. This will help.