LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers

Home Page: https://arxiv.org/abs/2004.08249

argdict

riosempre opened this issue

I get an error while pre-processing the data with the wmt14en-de.sh script. As far as I can tell, the error occurs because the srcdict argument is not passed to the python command.

Traceback (most recent call last):
  File "preprocess.py", line 359, in <module>
    cli_main()
  File "preprocess.py", line 355, in cli_main
    main(args)
  File "preprocess.py", line 64, in main
    raise FileExistsError(dict_path(args.source_lang))
FileExistsError: ../data-bin/wmt14_en_de_joined_dict/dict.en.txt

if not args.srcdict and os.path.exists(dict_path(args.source_lang)):
    raise FileExistsError(dict_path(args.source_lang))

Where is srcdict? Is it necessary? Should I change the code for it to work?

EDIT: I changed the code from:

python preprocess.py --source-lang en --target-lang de \
    --trainpref $prep/train --validpref $prep/valid --testpref $prep/test \
    --destdir ../data-bin/wmt14_en_de_joined_dict \
    --joined-dictionary

to

fairseq-preprocess --source-lang en --target-lang de \
    --trainpref $prep/train --validpref $prep/valid --testpref $prep/test \
    --destdir ../data-bin/wmt14_en_de_joined_dict \
    --srcdict ../data-bin/wmt14_en_de_joined_dict/dict.en.txt --tgtdict ../data-bin/wmt14_en_de_joined_dict/dict.de.txt

Was that right?

Thanks for reaching out. The error indicates that the file ../data-bin/wmt14_en_de_joined_dict/dict.en.txt already exists.
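To illustrate why a leftover dictionary file triggers the error, here is a minimal standalone sketch of the guard quoted from preprocess.py earlier in the thread. The function name `check_existing_dict`, its parameters, and the `demo` helper are hypothetical stand-ins for illustration; the `dict.<lang>.txt` file name is inferred from the path in the traceback, not taken from the script itself.

```python
import os
import tempfile
from typing import List, Optional


def check_existing_dict(destdir: str, lang: str, srcdict: Optional[str] = None) -> None:
    """Hypothetical stand-in for the guard in preprocess.py (names are assumptions)."""
    # Assumed layout: the dictionary is written to <destdir>/dict.<lang>.txt,
    # matching the path shown in the traceback.
    dict_file = os.path.join(destdir, f"dict.{lang}.txt")
    # Without an explicit source dictionary, an existing file at the
    # destination is treated as an error rather than being overwritten.
    if not srcdict and os.path.exists(dict_file):
        raise FileExistsError(dict_file)


def demo() -> List[str]:
    """Run the guard against an empty folder, a leftover file, and an explicit srcdict."""
    results = []
    with tempfile.TemporaryDirectory() as destdir:
        check_existing_dict(destdir, "en")  # empty output folder: no error
        results.append("empty: ok")

        # Simulate a dictionary left behind by an earlier run.
        leftover = os.path.join(destdir, "dict.en.txt")
        open(leftover, "w").close()
        try:
            check_existing_dict(destdir, "en")
        except FileExistsError:
            results.append("leftover: FileExistsError")

        # Passing a source dictionary explicitly skips the guard.
        check_existing_dict(destdir, "en", srcdict=leftover)
        results.append("srcdict: ok")
    return results
```

In this sketch, the second call fails only because a previous run left dict.en.txt in the output folder, which matches the situation described in the traceback.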

You can fix this by: 1) deleting the folder ../data-bin/wmt14_en_de_joined_dict, and 2) ensuring the folder ../data-bin exists.

BTW, the preprocess bash script is designed to be run from the pre-process folder.
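The two cleanup steps suggested in this reply could be scripted roughly as follows. This is only a sketch: `reset_destdir` is a hypothetical helper, not part of the repository, and it permanently deletes the folder it is given, so check the path before running it.

```python
import shutil
from pathlib import Path


def reset_destdir(destdir: str) -> Path:
    """Hypothetical helper: remove a stale output folder and make sure its parent exists."""
    dest = Path(destdir)
    # Step 1: delete the output folder left over from a previous run, if any.
    if dest.exists():
        shutil.rmtree(dest)
    # Step 2: make sure the parent folder (e.g. ../data-bin) exists.
    dest.parent.mkdir(parents=True, exist_ok=True)
    return dest
```

For example, calling `reset_destdir("../data-bin/wmt14_en_de_joined_dict")` from the pre-process folder would target the path in the traceback. The path is relative, so it resolves against the current working directory, which is why the folder the script is launched from matters.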