grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)


High validation accuracy in metrics

LeonhardEulerr opened this issue · comments

Hi guys,

We are using gector for Swedish grammar correction, with a Swedish BERT model from Hugging Face.
You did very good work with your paper and the code. It works pretty well, but there is at least one thing I don't understand about the metrics printed during training.

Namely, validation accuracy gets very high (around 98%) quickly, which to me implies it should have similarly high accuracy on the test set, shouldn't it? So how is accuracy calculated? Is it based on the transformations performed, compared against the preprocessed ground truth? If so, that could explain higher accuracy in validation than in test, where we basically check whether the entire sentence is correct. But I still doubt that alone would produce such high validation accuracy during training.

Could you please explain how such high validation accuracy is obtained while training?

Hi @LeonhardEulerr
It's nice to hear that you like our GitHub repository.
The accuracy is so high because, usually, the vast majority of predictions are $KEEP tags, which mean the corresponding token shouldn't be changed. $KEEP tags typically make up more than 95% of the data. That's why a simple accuracy score gets so high so quickly.
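To make the class-imbalance effect concrete, here is a minimal sketch (not the repository's actual metric code; the tag distribution is made up for illustration) of why token-level accuracy is high when $KEEP dominates:

```python
# Hypothetical example: 97 of 100 gold tags are $KEEP, 3 are edits.
gold = ["$KEEP"] * 96 + ["$REPLACE_the", "$DELETE", "$APPEND_,", "$KEEP"]

# A degenerate model that predicts $KEEP everywhere misses every edit...
pred = ["$KEEP"] * 100

# ...yet still scores 97% token-level accuracy.
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)  # 0.97
```

This is why sentence-level or edit-level metrics (e.g., F0.5 on the applied corrections) look much lower than token-tag accuracy on the same model.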

Thank you @skurzhanskyi for the quick response.

That was exactly what I was suspecting. Thanks.