Validation Accuracy Aggregation
Adamits opened this issue · comments
Currently, our validation_step method on the BaseEncoderDecoder computes a per-batch accuracy, and these per-batch values are averaged at the end of each epoch. Because of this, we get a macro-averaged accuracy that depends on the batch size.
I noticed something must be wrong when evaluation sets of size 1000 were producing validation accuracies with many decimal places (like 0.9247395...); a true micro accuracy over 1000 samples can only take values in increments of 0.001. I think we instead want to accumulate raw counts of correct/incorrect dev samples per batch, and then aggregate those counts into a single accuracy at the end of each epoch.
The impact should be small, but I still believe the accuracies we report are slightly off from the expected micro accuracy.
Yes, I agree we want micro-accuracy, not macro, even though I think the only way they can differ is when there is a partial batch.
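A minimal sketch of that partial-batch case (not the repo's actual code; the toy batches here are invented for illustration). Averaging per-batch accuracies gives every batch equal weight, so a smaller final batch skews the result, while accumulating raw counts does not:

```python
# Per-sample correctness (1 = correct, 0 = incorrect); batch size 4
# over 10 samples, so the last batch is partial.
batches = [
    [1, 1, 1, 0],  # 3/4 correct
    [1, 1, 0, 0],  # 2/4 correct
    [1, 1],        # partial batch: 2/2 correct
]

# Macro: average the per-batch accuracies (what averaging
# validation_step outputs effectively does).
macro = sum(sum(b) / len(b) for b in batches) / len(batches)

# Micro: accumulate raw correct/total counts, then divide once
# at the end of the epoch (the proposed fix).
correct = sum(sum(b) for b in batches)
total = sum(len(b) for b in batches)
micro = correct / total

print(macro)  # 0.75  ((0.75 + 0.5 + 1.0) / 3)
print(micro)  # 0.7   (7 / 10)
```

The partial batch's 100% accuracy counts for a full third of the macro average but only 2/10 of the micro average, which is what makes the two disagree.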
Oh yeah, good point. I guess we could also get some loss of floating-point precision from averaging. Anyway, micro certainly seems preferable.
Closed in #120.