dreamgonfly / BERT-pytorch

PyTorch implementation of BERT from the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

Home Page: https://arxiv.org/abs/1810.04805


Wrong tensor shape during pretrain

AlexPak opened this issue

```
[INFO] 2020-05-04 11:56:22 > Run name : BERT-BERT-{phase}-layers_count={layers_count}-hidden_size={hidden_size}-heads_count={heads_count}-{timestamp}-layers_count=1-hidden_size=128-heads_count=2-2020_05_04_11_56_22
[INFO] 2020-05-04 11:56:22 > {'config_path': None, 'data_dir': None, 'train_path': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/train.txt', 'val_path': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/val.txt', 'dictionary_path': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/dict.txt', 'checkpoint_dir': '/home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/checkpoints/', 'log_output': None, 'dataset_limit': None, 'epochs': 100, 'batch_size': 16, 'print_every': 1, 'save_every': 10, 'vocabulary_size': 60000, 'max_len': 512, 'lr': 0.001, 'clip_grads': False, 'layers_count': 1, 'hidden_size': 128, 'heads_count': 2, 'd_ff': 128, 'dropout_prob': 0.1, 'device': 'cuda:0', 'function': <function pretrain at 0x7f942c367b70>}
[INFO] 2020-05-04 11:56:22 > Constructing dictionaries...
[INFO] 2020-05-04 11:56:23 > dictionary vocabulary : 60000 tokens
[INFO] 2020-05-04 11:56:23 > Loading datasets...
1374it [00:11, 115.92it/s]
344it [00:05, 68.72it/s]
[INFO] 2020-05-04 11:56:40 > Train dataset size : 1828898
[INFO] 2020-05-04 11:56:40 > Building model...
[INFO] 2020-05-04 11:56:40 > BERT(
  (encoder): TransformerEncoder(
    (encoder_layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attention_layer): Sublayer(
          (sublayer): MultiHeadAttention(
            (query_projection): Linear(in_features=128, out_features=128, bias=True)
            (key_projection): Linear(in_features=128, out_features=128, bias=True)
            (value_projection): Linear(in_features=128, out_features=128, bias=True)
            (final_projection): Linear(in_features=128, out_features=128, bias=True)
            (dropout): Dropout(p=0.1)
            (softmax): Softmax()
          )
          (layer_normalization): LayerNormalization()
        )
        (pointwise_feedforward_layer): Sublayer(
          (sublayer): PointwiseFeedForwardNetwork(
            (feed_forward): Sequential(
              (0): Linear(in_features=128, out_features=128, bias=True)
              (1): Dropout(p=0.1)
              (2): GELU()
              (3): Linear(in_features=128, out_features=128, bias=True)
              (4): Dropout(p=0.1)
            )
          )
          (layer_normalization): LayerNormalization()
        )
        (dropout): Dropout(p=0.1)
      )
    )
  )
  (token_embedding): Embedding(60000, 128)
  (positional_embedding): PositionalEmbedding(
    (positional_embedding): Embedding(512, 128)
  )
  (segment_embedding): SegmentEmbedding(
    (segment_embedding): Embedding(2, 128)
  )
  (token_prediction_layer): Linear(in_features=128, out_features=60000, bias=True)
  (classification_layer): Linear(in_features=128, out_features=2, bias=True)
)
[INFO] 2020-05-04 11:56:40 > 15585634 parameters
[INFO] 2020-05-04 11:56:40 > Start training...
0%| | 0/114307 [00:00<?, ?it/s]
0%| | 0/52472 [00:00<?, ?it/s]
[INFO] 2020-05-04 11:56:47 > Epoch: 0 Progress: 0.0% Elapsed: 0:00:03 Examples/second: 5e+05 Train Loss: inf Val Loss: inf Train Metrics: [inf] Val Metrics: [inf] Learning rate: 1.768e-07
[INFO] 2020-05-04 11:56:48 > Saved model to /home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/checkpoints/epoch=000-val_loss=inf-val_metrics=inf.pth
[INFO] 2020-05-04 11:56:48 > Current best model is /home/ubuntu/ALEX/BERT-pytorch/data/rusbiomed/checkpoints/epoch=000-val_loss=inf-val_metrics=inf.pth
  5%|███▊      | 5364/114307 [02:28<52:08, 34.82it/s]
Traceback (most recent call last):
  File "main.py", line 34, in <module>
    main()
  File "main.py", line 30, in main
    args.function(**config, config=config)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/train.py", line 104, in pretrain
    trainer.run(epochs=epochs)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/trainer.py", line 98, in run
    train_epoch_loss, train_epoch_metrics = self.run_epoch(self.train_dataloader, mode='train')
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/trainer.py", line 64, in run_epoch
    predictions, batch_losses = self.loss_model(inputs, targets)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/loss_models.py", line 17, in forward
    outputs = self.model(inputs)
  File "/home/ubuntu/.virtualenvs/ml/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ALEX/BERT-pytorch/bert/train/model/bert.py", line 64, in forward
    embedded_sources = token_embedded + positional_embedded + segment_embedded
RuntimeError: The size of tensor a (515) must match the size of tensor b (512) at non-singleton dimension 1
```
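
The crash happens in `bert.py` line 64 when the three embeddings are summed: the token embeddings for this batch have sequence length 515, while `PositionalEmbedding` holds an `Embedding(512, 128)` table (see the model dump above, matching `'max_len': 512` in the config), so positional embeddings exist for at most 512 positions. Since 515 is exactly 512 + 3, this looks like a batch whose tokenized pair was not trimmed to leave room for the special tokens appended around the two segments. Below is a minimal sketch of such a guard, assuming the usual `[CLS] a [SEP] b [SEP]` layout; the helper name `truncate_pair` is an illustration, not this repo's API:

```python
MAX_LEN = 512  # must match the positional embedding table size


def truncate_pair(tokens_a, tokens_b, max_len=MAX_LEN):
    """Trim two tokenized segments in place so that, together with the
    3 special tokens ([CLS] a [SEP] b [SEP]), they fit within max_len.

    Assumes plain lists of token indices; the special-token layout is
    an assumption about the preprocessing, not taken from this repo.
    """
    budget = max_len - 3  # reserve room for [CLS] and two [SEP]
    while len(tokens_a) + len(tokens_b) > budget:
        # Drop from the longer segment so both keep as much context as possible.
        if len(tokens_a) >= len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
    return tokens_a, tokens_b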