Pretraining BART on a single corpus
1029694141 opened this issue
Hi,
First, thanks for the work on this repo!
I am continuing the pretraining of BART on my own English corpus, "train_fineshed.txt", but the command-line arguments do not seem to work. The script fails with:
"file not found error: ***/train_fineshed.txt.01"
My command is as follows:
python pretrain_nmt.py -n 1 -nr 0 -g 2 --pretrained_model facebook/bart-base --use_official_pretrained --tokenizer_name_or_path facebook/bart-base --is_summarization --warmup_steps 500 --save_intermediate_checkpoints --mono_src /home/WwhStuGrp/yyfwwhstu16/yanmtt/dataset/pubmed/pubmed-dataset/train_fineshed.txt --monolingual_domains 1 --train_domains 1
Can you point out what I am doing wrong with your toolkit?
Thank you for your kind help!
Thanks for using this toolkit.
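You are missing the --shard_files argument, which is needed because you are running the script on this corpus for the first time. The train_fineshed.txt.01 file in your error is a shard of the corpus that has not been created yet.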
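For reference, here is a sketch of the same command with the flag appended. It assumes --shard_files is a plain switch that takes no value, and that all other arguments stay exactly as in the question. The CORPUS variable and the leading echo (a dry run) are only there for illustration and are not part of the toolkit.

```shell
# Path copied from the command in the question.
CORPUS=/home/WwhStuGrp/yyfwwhstu16/yanmtt/dataset/pubmed/pubmed-dataset/train_fineshed.txt

# Original arguments, unchanged, with --shard_files appended at the end.
# Assumption: --shard_files is a switch that takes no value.
set -- -n 1 -nr 0 -g 2 \
  --pretrained_model facebook/bart-base --use_official_pretrained \
  --tokenizer_name_or_path facebook/bart-base \
  --is_summarization --warmup_steps 500 --save_intermediate_checkpoints \
  --mono_src "$CORPUS" \
  --monolingual_domains 1 --train_domains 1 \
  --shard_files

# Dry run: print the full command for inspection.
# Remove the leading "echo" to actually start pretraining.
echo python pretrain_nmt.py "$@"
```

Once the shards exist next to the corpus file, later runs on the same data can presumably drop the flag again.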
That was helpful, thanks!