jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083


Pretraining error

liu3zhenlab opened this issue

Thanks for developing these valuable scripts. We ran run_pretrain.py and encountered the following issue. We would appreciate your guidance on troubleshooting it. Thanks.

07/10/2023 00:11:34 - INFO - __main__ - Training new model from scratch
07/10/2023 00:11:36 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-06, beta1=0.9, beta2=0.98, block_size=512, cache_dir=None, config_name='../src/transformers/dnabert-config/bert-config-6/config.json', device=device(type='cpu'), do_eval=True, do_train=True, eval_all_checkpoints=False, eval_data_file='../data/3k_6mer/1st_finished_asm_3k_6mer.all', evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=25, learning_rate=0.0004, line_by_line=True, local_rank=-1, logging_steps=500, max_grad_norm=1.0, max_steps=200000, mlm=True, mlm_probability=0.025, model_name_or_path=None, model_type='dna', n_gpu=0, n_process=16, no_cuda=False, num_train_epochs=1.0, output_dir='k6', overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=8, save_steps=500, save_total_limit=20, seed=42, server_ip='', server_port='', should_continue=False, tokenizer_name='dna6', train_data_file='../data/3k_6mer/1st_finished_asm_3k_6mer.all', warmup_steps=10000, weight_decay=0.01)
07/10/2023 00:11:36 - INFO - __main__ - Creating features from dataset file at ../data/3k_6mer/1st_finished_asm_3k_6mer.all
07/10/2023 00:14:39 - INFO - __main__ - Saving features into cached file ../data/3k_6mer/dna_cached_lm_512_1st_finished_asm_3k_6mer.all
07/10/2023 00:14:47 - INFO - __main__ - ***** Running training *****
07/10/2023 00:14:47 - INFO - __main__ - Num examples = 357566
07/10/2023 00:14:47 - INFO - __main__ - Num Epochs = 112
07/10/2023 00:14:47 - INFO - __main__ - Instantaneous batch size per GPU = 8
07/10/2023 00:14:47 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 200
07/10/2023 00:14:47 - INFO - __main__ - Gradient Accumulation steps = 25
07/10/2023 00:14:47 - INFO - __main__ - Total optimization steps = 200000
Iteration: 0%| | 0/44696 [00:00<?, ?it/s]
Epoch: 0%| | 0/112 [00:00<?, ?it/s]
Traceback (most recent call last):
File "../scripts/run_pretrain.py", line 890, in
main()
File "../scripts/run_pretrain.py", line 840, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
File "../scripts/run_pretrain.py", line 426, in train
inputs, labels = mask_tokens(batch, tokenizer, args) if args.mlm else (batch, batch)
File "../scripts/run_pretrain.py", line 272, in mask_tokens
probability_matrix.masked_fill
(torch.tensor(special_tokens_mask, dtype=torch.bool), value=0.0)
RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'mask'
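For context, this RuntimeError usually points to the PyTorch version rather than the data: masked_fill_ only started accepting bool masks in PyTorch 1.2, so an older torch build expects a ByteTensor at this call. Two things seem worth trying: upgrading PyTorch in the environment, or casting the special-tokens mask to uint8 inside mask_tokens(). Below is a minimal, hypothetical sketch of the second option; the standalone helper name is my own, and only the masked_fill_ call mirrors what the traceback shows.

```python
import torch

def zero_special_token_positions(probability_matrix, special_tokens_mask):
    """Sketch of the failing step in mask_tokens() (run_pretrain.py, line 272).

    Older PyTorch (< 1.2) requires a ByteTensor mask for masked_fill_, which is
    what the RuntimeError above complains about. Casting the mask to uint8
    keeps the same semantics (nonzero = special-token position, so its masking
    probability is set to 0).
    """
    mask = torch.tensor(special_tokens_mask, dtype=torch.uint8)  # Byte mask for old torch
    # On PyTorch >= 1.2 the original bool mask is fine and preferred:
    # mask = torch.tensor(special_tokens_mask, dtype=torch.bool)
    probability_matrix.masked_fill_(mask, value=0.0)
    return probability_matrix
```

If upgrading is an option, installing a PyTorch release that accepts bool masks (1.2 or later) should let the unmodified script run past this point without touching run_pretrain.py.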