prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit

Pretraining BART on a single corpus

1029694141 opened this issue · comments

commented

Hi,

First, thanks for the work on this repo!

I'm continuing to pretrain BART on my own English corpus, "train_fineshed.txt", but the command-line arguments don't seem to work. I get this error: "file not found error: ***/train_fineshed.txt.01"

My Python command is as follows:

python pretrain_nmt.py -n 1 -nr 0 -g 2 --pretrained_model facebook/bart-base --use_official_pretrained --tokenizer_name_or_path facebook/bart-base --is_summarization --warmup_steps 500 --save_intermediate_checkpoints --mono_src /home/WwhStuGrp/yyfwwhstu16/yanmtt/dataset/pubmed/pubmed-dataset/train_fineshed.txt --monolingual_domains 1 --train_domains 1

Can you point out what I'm doing wrong with your toolkit?

Thank you for your kind help!

Thanks for using this toolkit.

You are missing the --shard_files argument since you are running the script for the first time.
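In other words, append --shard_files to the command above on the first run. Judging by the error message, the script looks for one shard per GPU, named after the input file plus a two-digit suffix (train_fineshed.txt.01 is presumably the shard for the second of the two GPUs requested with -g 2). The sketch below only illustrates that idea and is not the toolkit's actual code: the function name shard_file, the round-robin split, and the %02d suffix format are assumptions inferred from the error message.

```python
def shard_file(path, num_shards):
    """Split a text file into num_shards files named <path>.00, <path>.01, ...

    Hypothetical helper for illustration only; the round-robin split and
    the two-digit suffix are assumptions, not yanmtt's implementation.
    """
    shard_paths = ["%s.%02d" % (path, i) for i in range(num_shards)]
    outputs = [open(p, "w", encoding="utf-8") for p in shard_paths]
    try:
        with open(path, encoding="utf-8") as src:
            # Line i goes to shard i mod num_shards.
            for i, line in enumerate(src):
                outputs[i % num_shards].write(line)
    finally:
        for f in outputs:
            f.close()
    return shard_paths
```

Calling shard_file on the corpus path with num_shards=2 would leave two files ending in .00 and .01 next to the original, which matches the file name the error message reports as missing.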

commented

That's useful, thanks!!!