Predict @@UNKNOWN@@ during prediction

Question

tuzeao opened this issue 2 years ago · comments

Hi.
Recently I have been working with this fantastic model. I generated my data, trained the model, and made predictions. Everything seemed to work great.
Finally, when I checked the output, something strange happened: in my predictions, many character edit operations are predicted as @@UNKNOWN@@.

I don't think anything is wrong with my training process. I generate source and target sentences, split them into two files, tokenize them with the BERT tokenizer, then use preprocess to convert them into the correct format for train.py.
I have only 4 types of edit operations because I am applying this model to Chinese, but that's OK for my application scenario.

Any ideas on how this could happen? I have checked all the issues and it seems no one has had the same problem.
I have been stuck here for about two days, so I would be very thankful if someone could give some advice.

Alex Lin · Answer 1 · Fri Jun 24 2022 11:13:11 GMT+0800 (China Standard Time)

OK, I figured it out. The cause is a gap between the training data and labels.txt.
If you add your own personalized transforms but forget to add them to labels.txt, this happens.