CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Validation Accuracy Aggregation

Adamits opened this issue

commented

Currently, our validation_step method on the BaseEncoderDecoder computes a per-batch accuracy, and these are averaged at the end of each epoch. Because of this, we get a macro-averaged accuracy that depends on the batch size.

I noticed something was off when using evaluation sets of size 1000 and getting validation accuracies with many decimal places (like 0.9247395...). With 1000 samples, a micro-averaged accuracy should always be a multiple of 0.001. I think we probably want to accumulate raw counts of correct/incorrect dev samples per batch, and then aggregate those into an accuracy at the end of each epoch.
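To make the proposal concrete, here is a minimal sketch of count-based accumulation in plain Python. The `AccuracyCounter` class and its method names are hypothetical and are not part of yoyodyne; it also assumes predictions and targets can be compared for exact equality per sample.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AccuracyCounter:
    """Hypothetical helper (not yoyodyne's API) that tracks raw counts."""

    correct: int = 0
    total: int = 0

    def update(self, predictions: Sequence, targets: Sequence) -> None:
        # Called once per validation batch; stores counts, not a ratio.
        if len(predictions) != len(targets):
            raise ValueError("predictions and targets differ in length")
        self.correct += sum(p == t for p, t in zip(predictions, targets))
        self.total += len(targets)

    def compute(self) -> float:
        # Called once at the end of the epoch; this is the micro-accuracy.
        return self.correct / self.total if self.total else 0.0

    def reset(self) -> None:
        # Clears the counts before the next epoch.
        self.correct = 0
        self.total = 0
```

The idea would be to call `update` from each validation step, `compute` at the end of the epoch, and `reset` before the next one, so the batch size never enters the final ratio.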

The impact of this should be small, but I still believe the accuracies we report differ slightly from the expected micro-accuracy.

Yes, I agree we want micro-accuracy, not macro, even though I think the only way they can differ is w.r.t. a partial final batch.
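A small illustration of the partial-batch point, using made-up counts rather than anything measured: each batch is a hypothetical `(num_correct, batch_size)` pair, and the two helper functions are only for this sketch.

```python
def macro_accuracy(batches):
    # Average of per-batch accuracies: every batch gets equal weight.
    per_batch = [correct / size for correct, size in batches]
    return sum(per_batch) / len(per_batch)


def micro_accuracy(batches):
    # Total correct over total samples: every sample gets equal weight.
    total_correct = sum(correct for correct, _ in batches)
    total_size = sum(size for _, size in batches)
    return total_correct / total_size


# Equal-sized batches: the two averages agree.
full = [(3, 4), (1, 4)]

# A smaller final batch of 2 is weighted like a full batch under macro.
partial = [(4, 4), (4, 4), (0, 2)]

print(macro_accuracy(full), micro_accuracy(full))
print(macro_accuracy(partial), micro_accuracy(partial))
```

With equal-sized batches both functions give the same value up to floating-point rounding; in the second case the two samples in the last batch count as much as four samples elsewhere, so the macro figure drops below the micro one.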

commented

Oh yeah, good point. I guess we could also get some loss of precision. Anyway, micro certainly seems preferable.

Closed in #120.