Problem with building probabilistic dictionaries
Syrkovski opened this issue · comments
Hello, I tried to build probabilistic dictionaries (I need them for training a Bicleaner model), but as a result I get something like this:
afterwards NULL 0.0000124
pension NULL 0.0000372
truss NULL 0.0000124
birthday NULL 0.0000744
commemorate NULL 0.0000248
The entire second column is "NULL".
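A quick way to confirm this symptom on a larger file is to count how many entries have "NULL" in the second column. This is only a minimal sketch: it assumes the three whitespace-separated columns seen in the sample above, and `null_fraction` is a hypothetical helper, not part of Moses or Bicleaner.

```python
def null_fraction(lines):
    """Return the fraction of well-formed entries whose second column is "NULL"."""
    total = 0
    nulls = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        if fields[1] == "NULL":
            nulls += 1
    return nulls / total if total else 0.0


# Sample entries copied from the output above.
sample = [
    "afterwards NULL 0.0000124",
    "pension NULL 0.0000372",
    "truss NULL 0.0000124",
]
print(null_fraction(sample))
```

A value close to 1.0 suggests that the alignment step failed and no real translations reached the dictionary.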
The command I used is:
mosesdecoder/scripts/training/train-model.perl --alignment grow-diag-final-and --root-dir bicleaner_inf/ --corpus bicleaner_inf/corpus.clean --e en --f zh --mgiza -mgiza-cpus 8 --parallel --first-step 1 --last-step 4 --external-bin-dir mgiza/mgizapp/bin/
It looks like the main error occurs in MGIZA:
Merging A3.final.part* tables
Executing: enchmodels/mgiza/mgizapp/bin/merge_alignment.py enchmodels/bicleaner_inf/giza.zh-en/zh-en.A3.final.part*> enchmodels/bicleaner_inf/giza.zh-en/zh-en.A3.final
Traceback (most recent call last):
File "enchmodels/mgiza/mgizapp/bin/merge_alignment.py", line 32, in <module>
st1 = files[i].readline();
File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 84: ordinal not in range(128)
Exit code: 1
After that, it prints a long series of errors like:
Use of uninitialized value $a in scalar chomp at enchmodels/mosesdecoder/scripts/training/LexicalTranslationModel.pm line 105
Use of uninitialized value in substitution (s///) at enchmodels/mosesdecoder/scripts/training/LexicalTranslationModel.pm line 40.
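The UnicodeDecodeError in the traceback above means that Python tried to decode the alignment file as ASCII, while Chinese text stored as UTF-8 contains bytes such as 0xe5. The sketch below is not the actual merge_alignment.py. It only illustrates, with a hypothetical temporary file and a hypothetical `read_first_line` helper, how the same kind of failure arises and how opening the file with an explicit UTF-8 encoding avoids it.

```python
import os
import tempfile


def read_first_line(path, encoding):
    """Read the first line of a text file using an explicit encoding."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.readline()


def demo():
    # Hypothetical stand-in for one alignment part file containing Chinese text.
    with tempfile.NamedTemporaryFile("wb", suffix=".part", delete=False) as tmp:
        tmp.write("好 的\n".encode("utf-8"))  # "好" starts with byte 0xe5 in UTF-8
        path = tmp.name
    try:
        try:
            read_first_line(path, "ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True  # same error class as in the traceback
        utf8_line = read_first_line(path, "utf-8")
    finally:
        os.remove(path)
    return ascii_failed, utf8_line


ascii_failed, utf8_line = demo()
print(ascii_failed, repr(utf8_line))
```

In Python 3, `open()` without an `encoding` argument falls back to the locale's preferred encoding, so a non-UTF-8 locale is one plausible trigger for this error. That is an assumption about this setup, not something the log confirms.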
Solved this problem
It seems the best fix is to recompile MGIZA.
I followed the instructions here:
https://hovinh.github.io/blog/2016-04-29-install-mgiza-ubuntu/