grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

Predict @@UNKNOWN@@ during prediction

tuzeao opened this issue · comments

Hi.
Recently I have been working with this fantastic model. I generated my own data, trained the model, and made predictions. Everything seemed to work great.
But when I checked the output, something strange showed up: many of the character-level edit operations in my predictions come out as @@UNKNOWN@@.

I don't think anything is wrong with my training process. I generate source and target sentences, split them into two files, tokenize them with the BERT tokenizer, then run the preprocessing step to convert them into the format train.py expects.
I only have 4 types of edit operations because I am applying this model to Chinese, but that is fine for my use case.

Any ideas on how this could happen? I have checked all the issues and it seems no one has run into the same situation.
I have been stuck here for about two days, so I would be very thankful if someone could give some advice.

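OK, I figured it out: there was a mismatch between the training data and labels.txt.
If you add your own personalized transforms but forget to add them to labels.txt, this happens. As far as I can tell, any label that is missing from the vocabulary gets mapped to @@UNKNOWN@@, so the model learns to predict it.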
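
For anyone hitting the same problem, here is a minimal sketch of a check that lists labels used in the preprocessed training data but absent from labels.txt. It assumes the preprocessed format separates each token from its labels with `SEPL|||SEPR` and joins multiple labels with `SEPL__SEPR`; verify those delimiters against your own preprocessed file. The function names and the `$MY_SWAP` label are made up for illustration and are not part of the repo.

```python
# Delimiters assumed to match the preprocessed training format; check your own data.
LABEL_DELIM = "SEPL|||SEPR"   # separates a token from its label(s)
OP_DELIM = "SEPL__SEPR"       # separates multiple labels on one token


def labels_in_training_data(lines):
    """Collect every label that appears in preprocessed training lines."""
    found = set()
    for line in lines:
        for pair in line.split():
            if LABEL_DELIM not in pair:
                continue  # skip anything that is not a token/label pair
            _, tags = pair.rsplit(LABEL_DELIM, 1)
            found.update(tag for tag in tags.split(OP_DELIM) if tag)
    return found


def missing_labels(train_lines, vocab_lines):
    """Return sorted labels used in training data but absent from the vocabulary."""
    vocab = {line.strip() for line in vocab_lines if line.strip()}
    return sorted(labels_in_training_data(train_lines) - vocab)


# Small in-memory example; "$MY_SWAP" stands in for a custom transform.
example_train = [
    "$START" + LABEL_DELIM + "$KEEP "
    + "我" + LABEL_DELIM + "$MY_SWAP "
    + "是" + LABEL_DELIM + "$KEEP" + OP_DELIM + "$APPEND_的"
]
example_vocab = ["$KEEP", "$DELETE", "$APPEND_的", "@@UNKNOWN@@", "@@PADDING@@"]

print(missing_labels(example_train, example_vocab))  # ['$MY_SWAP']
```

To run it on real files, pass `open(path, encoding="utf-8")` for the training file and for labels.txt instead of the in-memory lists. Anything the check prints needs to be added to labels.txt before training.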