Not getting any corrections on the custom dataset.

alan-ai-learner opened this issue 3 years ago

I'm getting this problem when I try the pretrained BERT model on a custom dataset. It is similar to issue #131.
Any help would be great.
I tried this sentence:

I walk to the store and I bought milk.

Thanks

Alex Skurzhanskyi · Answer 1 · Wed Dec 01 2021 21:54:40 GMT+0800 (China Standard Time)

Could you specify any information on how you ran the script?
Why do you think that the model must propose edits for this particular sentence?

Alankar Shukla · Answer 2 · Thu Dec 02 2021 13:56:50 GMT+0800 (China Standard Time)

I followed the steps given in issue #36:

```shell
cd gector
# create conda env
conda create -n gector python=3.7
conda activate gector
pip install torch===1.3.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
# get model
wget https://grammarly-nlp-data-public.s3.amazonaws.com/gector/bert_0_gector.th
# get eval file and extract it
wget https://www.cl.cam.ac.uk/research/nl/bea2019st/data/wi+locness_v2.1.bea19.tar.gz
tar -xzvf wi+locness_v2.1.bea19.tar.gz
# run inference
python predict.py --model_path ./bert_0_gector.th \
  --vocab_path ./data/output_vocabulary/ \
  --input_file wi+locness/test/ABCN.test.bea19.orig \
  --output_file foo \
  --transformer_model bert \
  --special_tokens_fix 0
```

It ran successfully on the "ABCN.test.bea19.orig" file, and I got 2300 corrections out of 4700 sentences, which look fine. After that, I made a new test.orig file containing two grammatically incorrect sentences and ran the same command on it. The script ran without errors, but the output sentences were identical to the ones in test.orig; no corrections were made.

The two sentences I passed were grammatically incorrect (I checked them with Grammarly's free grammar checker), so I was hoping the model would correct them.
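When the output file looks identical to the input, a quick way to confirm it is to compare the two files sentence by sentence and list exactly which lines (if any) the model changed. The following is a minimal Python sketch, not part of GECToR: the script name `compare_outputs.py` and all function names are hypothetical, and it assumes the output file (`foo` in the command above) contains exactly one line per non-empty input line, in the same order.

```python
import sys
from typing import List, Tuple


def read_nonempty_lines(path: str) -> List[str]:
    # Assumption: blank lines are skipped, so line i of the input
    # should correspond to line i of the model's output file.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def changed_sentences(sources: List[str], predictions: List[str]) -> List[Tuple[int, str, str]]:
    """Return (1-based index, source, prediction) for every sentence the model edited."""
    if len(sources) != len(predictions):
        raise ValueError(
            f"line count mismatch: {len(sources)} inputs vs {len(predictions)} outputs"
        )
    changed = []
    for i, (src, pred) in enumerate(zip(sources, predictions), start=1):
        # Compare token lists so that whitespace-only differences are not counted.
        if src.split() != pred.split():
            changed.append((i, src, pred))
    return changed


def summary(sources: List[str], predictions: List[str]) -> str:
    """One-line report of how many sentences were changed."""
    changed = changed_sentences(sources, predictions)
    return f"{len(changed)} of {len(sources)} sentences changed"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare_outputs.py test.orig foo
    src_lines = read_nonempty_lines(sys.argv[1])
    pred_lines = read_nonempty_lines(sys.argv[2])
    print(summary(src_lines, pred_lines))
    for i, src, pred in changed_sentences(src_lines, pred_lines):
        print(f"[{i}] {src}\n    -> {pred}")
```

If the script reports 0 changed sentences for the custom file, the model genuinely proposed no edits, as opposed to the output being written somewhere unexpected.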

Alex Skurzhanskyi · Answer 3 · Thu Dec 02 2021 19:48:40 GMT+0800 (China Standard Time)

The script parameters look good to me.
Unfortunately, considering the nature of deep learning models, we cannot guarantee that it will correct all the errors.
In Grammarly, there are many different models, and this particular model isn't among them. So you cannot expect that the GECToR model will fix errors in the same way as Grammarly does.

Alankar Shukla · Answer 4 · Thu Dec 02 2021 20:02:06 GMT+0800 (China Standard Time)

Thanks for your reply. Can you suggest a few approaches to improve the model's performance?

Alex Skurzhanskyi · Answer 5 · Thu Dec 02 2021 22:03:25 GMT+0800 (China Standard Time)

I believe a good way to improve model performance would be fine-tuning it on in-domain datasets.
GECToR was trained on rather academic data where essays of English learners were corrected by tutors.
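Fine-tuning on in-domain data first requires parallel examples: erroneous sentences paired with their corrections. The Python sketch below is only a data-preparation helper, not GECToR code. It assumes the repository's preprocessing step takes a source file and a target file with one sentence per line (check the repository README for the exact commands), and it holds out a small dev set. The function names, file names, and the 10% dev fraction are hypothetical choices; the helper only normalizes whitespace and does no tokenization.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (erroneous sentence, corrected sentence)


def normalize(sentence: str) -> str:
    # Collapse all whitespace (including newlines) so each sentence fits on one line.
    return " ".join(sentence.split())


def split_pairs(
    pairs: Sequence[Pair], dev_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Clean the pairs, shuffle them reproducibly, and hold out a fraction as a dev set."""
    cleaned = [(normalize(s), normalize(t)) for s, t in pairs]
    cleaned = [(s, t) for s, t in cleaned if s and t]  # drop pairs with an empty side
    rng = random.Random(seed)
    rng.shuffle(cleaned)
    # Keep at least one dev example when there is more than one pair.
    n_dev = max(1, int(len(cleaned) * dev_fraction)) if len(cleaned) > 1 else 0
    return cleaned[n_dev:], cleaned[:n_dev]


def write_parallel(pairs: Sequence[Pair], source_path: str, target_path: str) -> None:
    """Write line-aligned files: line i of target_path is the correction of line i of source_path."""
    with open(source_path, "w", encoding="utf-8") as src_f, \
         open(target_path, "w", encoding="utf-8") as tgt_f:
        for source, target in pairs:
            src_f.write(source + "\n")
            tgt_f.write(target + "\n")
```

For example, `train, dev = split_pairs(pairs)` followed by `write_parallel(train, "train.src", "train.tgt")` and `write_parallel(dev, "dev.src", "dev.tgt")` produces the hypothetical aligned files. A fixed seed keeps the split reproducible across runs.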

Alankar Shukla · Answer 6 · Thu Dec 02 2021 23:31:35 GMT+0800 (China Standard Time)

Thanks, I will try that!