High validation accuracy in metrics
LeonhardEulerr opened this issue · comments
Hi guys,
We are using GECToR for Swedish grammar correction with a Swedish BERT model from Hugging Face.
You did great work with your paper and code. It works pretty well, but there is at least one thing I don't understand when printing metrics during training.
Namely, the validation accuracy quickly gets very high (around 98%), which to me implies it should have a similar accuracy on the test set, shouldn't it? So how is accuracy calculated? Is it based on the predicted transformations compared to the preprocessed ground truth? If so, that could partly explain higher accuracy in validation than in test, where we basically check whether the entire sentence is correct, but I still doubt that alone would produce such a high validation accuracy during training.
Could you please explain how such a high validation accuracy is obtained during training?
Hi @LeonhardEulerr
It's nice to hear that you like our GitHub repository.
The accuracy is so high because, usually, the vast majority of predictions are $KEEP tags, which mean you shouldn't change the corresponding token. $KEEP tags usually make up > 95% of the data. That's why a simple accuracy score gets so high so quickly.
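A minimal sketch (not the GECToR code itself, and the tag names here are just illustrative) of why token-level tag accuracy looks high when most gold tags are $KEEP, compared with sentence-level exact match:

```python
# Sketch: token-level accuracy vs. sentence-level exact match
# when the gold tag distribution is dominated by $KEEP.

def token_accuracy(gold_tags, pred_tags):
    """Fraction of individual token tags predicted correctly."""
    total = sum(len(g) for g in gold_tags)
    correct = sum(gt == pt
                  for g, p in zip(gold_tags, pred_tags)
                  for gt, pt in zip(g, p))
    return correct / total

def sentence_accuracy(gold_tags, pred_tags):
    """Fraction of sentences where every tag is predicted correctly."""
    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / len(gold_tags)

# Toy data: one real edit among many $KEEPs per sentence.
gold = [["$KEEP"] * 9 + ["$REPLACE_a"],
        ["$KEEP"] * 9 + ["$DELETE"]]
# A model that always predicts $KEEP misses every actual edit...
pred = [["$KEEP"] * 10, ["$KEEP"] * 10]

print(token_accuracy(gold, pred))     # 0.9  - looks great
print(sentence_accuracy(gold, pred))  # 0.0  - no sentence fully corrected
```

So a high token-level validation accuracy is compatible with much lower sentence-level test scores: the $KEEP majority inflates the former but not the latter.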
Thank you @skurzhanskyi for the quick response.
That was exactly what I suspected. Thanks.