manyoso / haltt4llm

This project is an attempt to create a common metric for testing LLMs' progress in eliminating hallucinations, which is currently the most serious obstacle to widespread adoption of LLMs for many real purposes.

Automated reporting script?

AngainorDev opened this issue

I'm a bit confused by the metrics the test reports (score and number of wrong answers)
vs. what is in the comparative table (different metrics).

Is there a ready-made script to parse the test outputs and compute the various metrics?

The score is output at the end of the run, and you can see the total score in the file. To calculate the percentage, take total_score / (number_of_questions * 2). The number of correct answers and the number of uncertain answers are also in the files. That's how you reconcile with the comparative table.
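A minimal sketch of that calculation in Python, just to make the formula concrete; the variable names are illustrative and not taken from the repo's code:

```python
# Minimal sketch of the percentage calculation described above.
# total_score and number_of_questions are assumed to be read from the
# test run's output; the names are illustrative, not from the repo.

def score_percentage(total_score: int, number_of_questions: int) -> float:
    # Each question is worth a maximum of 2 points, so the top score is 2 * N.
    return total_score / (number_of_questions * 2) * 100


# Example: 1200 points over 700 questions -> 1200 / 1400 * 100 = ~85.7%
print(f"{score_percentage(1200, 700):.1f}%")
```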

Oh, my bad.
I had to check the code to see that they were indeed there.

I was hoping for metrics in the header, rather than interleaved with the list of incorrect/idk questions.

Neither the number of correct answers nor the dataset length is in the output file, btw.
I'll edit my fork for clarity before running more tests, thanks!

https://github.com/manyoso/haltt4llm/blob/main/results/test_results_fake_trivia_questions.json_alpaca-lora-4bit.txt shows the total score, whose maximum is dataset length * 2, and also shows the number of correct answers as well as the number of incorrect and unknown answers. The latest code does this.
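For anyone who still wants a small reporting helper, here is a hypothetical sketch that scrapes those summary numbers out of a results file. The label patterns are assumptions about the file layout, not the repo's documented format, so check the actual file and adjust them:

```python
# Hypothetical sketch: pull the summary numbers out of a results file.
# The label patterns below are assumptions about the file layout, not the
# repo's documented format; adjust them to match the real output.
import re
from pathlib import Path


def parse_results(path: str) -> dict:
    text = Path(path).read_text()
    patterns = {
        "total_score": r"\btotal score\b\D*(\d+)",
        "correct": r"\bcorrect\b\D*(\d+)",
        "incorrect": r"\bincorrect\b\D*(\d+)",
        "unknown": r"\bunknown\b\D*(\d+)",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        summary[name] = int(match.group(1)) if match else None
    return summary


# Example usage (path from the linked results directory):
# print(parse_results("results/test_results_fake_trivia_questions.json_alpaca-lora-4bit.txt"))
```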