Selective records failure instead of Complete Job Failure
TheDominus opened this issue · comments
Is your feature request related to a problem? Please describe.
LLMs sometimes fail to generate valid JSON output for certain records in a dataset. Deepeval's evaluation functions abort the whole evaluation as soon as one such record is encountered. Instead of failing the job, Deepeval should skip that record, evaluate the metrics for the rest of the records, and ideally return the failure reason at the record level.
Describe the solution you'd like
Questions [1,2,3]
Contexts [1,2,3]
Responses [1,2,3]
Evaluate - Faithfulness
The output should look like this:
Output - [.76, NaN ("failed because LLM could not give proper JSON"), .98]
Here, the LLM did not work as expected for the 2nd record, so its result is NaN with a reason attached, rather than the job returning no data at all.
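To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It does not use Deepeval's API: `score_records` and `fake_faithfulness` are hypothetical names, and the `judge_reply` field is an assumed stand-in for the raw text an LLM judge returns.

```python
import json
import math
from typing import Callable, List, Optional, Tuple


def score_records(
    records: List[dict],
    scorer: Callable[[dict], float],
) -> List[Tuple[float, Optional[str]]]:
    """Score each record independently; a failed record yields (NaN, reason)."""
    results = []
    for record in records:
        try:
            results.append((scorer(record), None))
        except Exception as exc:
            # Record the failure for this record and keep the job running.
            results.append((math.nan, f"{type(exc).__name__}: {exc}"))
    return results


def fake_faithfulness(record: dict) -> float:
    # Hypothetical stand-in for an LLM-backed metric:
    # it parses the judge's raw reply as JSON and reads a "score" field.
    return float(json.loads(record["judge_reply"])["score"])


records = [
    {"judge_reply": '{"score": 0.76}'},
    {"judge_reply": "Sorry, I cannot answer"},  # not valid JSON
    {"judge_reply": '{"score": 0.98}'},
]

results = score_records(records, fake_faithfulness)
for score, reason in results:
    print(score, reason)
```

The second record comes back as NaN together with the parsing error, while the first and third records are still scored.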
Describe alternatives you've considered
Ragas provides similar functionality.
@TheDominus Do you explicitly want NaN, or, as you described, do you just want the job to continue while ignoring errors? If the latter, we already have an option to ignore errors: https://docs.confident-ai.com/docs/evaluation-test-cases#evaluate-test-cases-in-bulk
Or, if you are using deepeval test run: https://docs.confident-ai.com/docs/evaluation-introduction#ignore-errors
Thanks for this. This will help.