martiansideofthemoon / style-transfer-paraphrase

Official code and data repository for our EMNLP 2020 long paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation" (https://arxiv.org/abs/2010.05700).

Home Page: http://style.cs.umass.edu

GPT2 paraphrase training

bmtm opened this issue · comments

commented

I'm just trying to train the paraphrase model and am encountering some issues.

In requirements.txt:
- `pkg-resources==0.0.0` should be removed
- `torch==1.6.0+cu101` and `torchvision==0.7.0+cu101` should probably reference non-CUDA-specific versions

In run_finetune_paraphrase.sh:
- `--do_eval` should probably be `--do_train`?

When I run `run_finetune_paraphrase.sh` I get this error:
```
bash style_paraphrase/examples/run_finetune_paraphrase.sh
11/07/2020 13:32:45 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
11/07/2020 13:33:04 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='datasets/paranmt_filtered', device=device(type='cuda', index=0), do_delete_old=False, do_eval=False, do_lower_case=False, do_train=True, eval_frequency_min=0, eval_patience=10, evaluate_during_training=True, evaluate_specific=None, extra_embedding_dim=768, fp16=False, fp16_opt_level='O1', global_dense_feature_list='none', gradient_accumulation_steps=2, job_id='0', learning_rate='5e-5', limit_examples=None, local_rank=0, logging_steps=20, max_grad_norm=1.0, max_steps=-1, model_name_or_path='gpt2-large', model_type='gpt2', n_gpu=1, no_cuda=False, num_train_epochs=3.0, optimizer='adam', output_dir='style_paraphrase/saved_models/test_paraphrase', overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=5, prefix_input_type='original', save_steps=500, save_total_limit=-1, seed=42, specific_style_train='-1', target_style_override='none', tokenizer_name='', warmup_steps=0, weight_decay=0.0)
11/07/2020 13:33:04 - INFO - style_dataset - {'keys': [{'key': 'sent1_tokens', 'position': 3, 'tokenize': True, 'metadata': False}, {'key': 'sent2_tokens', 'position': 4, 'tokenize': True, 'metadata': False}, {'key': 'f1_score', 'position': 5, 'tokenize': False, 'metadata': True}, {'key': 'kt_score', 'position': 6, 'tokenize': False, 'metadata': True}, {'key': 'ed_score', 'position': 7, 'tokenize': False, 'metadata': True}, {'key': 'langid', 'position': 8, 'tokenize': False, 'metadata': True}], 'max_total_length': 100, 'max_prefix_length': 50, 'max_suffix_length': 50, 'max_dense_length': 2, 'global_dense_length': 0}
100%|██████████| 73062/73062 [00:12<00:00, 5756.52it/s]
11/07/2020 13:33:17 - INFO - style_dataset - Saving features into cached file datasets/paranmt_filtered/gpt2_cached_lm_train
11/07/2020 13:33:34 - INFO - style_dataset - Total truncated instances due to length limit = 213 / 73062
11/07/2020 13:33:34 - INFO - __main__ - ***** Running training *****
11/07/2020 13:33:34 - INFO - __main__ - Num examples = 73062
11/07/2020 13:33:34 - INFO - __main__ - Num Epochs = 3
11/07/2020 13:33:34 - INFO - __main__ - Instantaneous batch size per GPU = 5
11/07/2020 13:33:34 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 10
11/07/2020 13:33:34 - INFO - __main__ - Gradient Accumulation steps = 2
11/07/2020 13:33:34 - INFO - __main__ - Total optimization steps = 21918
Epoch:   0%|          | 0/3 [00:00<?, ?it/s]
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [27,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Iteration:   0%|          | 0/14613 [00:00<?, ?it/s]
Epoch:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 507, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 424, in main
    global_step, tr_loss = train(args, gpt2_model, train_dataset, tokenizer)
  File "style_paraphrase/run_lm_finetuning.py", line 244, in train
    loss["lm"].backward()
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/tensor.py", line 233, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL error in: /home/jack/pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:155, unhandled cuda error, NCCL version 2.8.0
ncclUnhandledCudaError: Call to CUDA function failed.
Traceback (most recent call last):
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 303, in <module>
    main()
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 294, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/jack/miniconda3/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/test_paraphrase', '--model_type=gpt2', '--model_name_or_path=gpt2-large', '--data_dir=datasets/paranmt_filtered', '--do_train', '--save_steps', '500', '--logging_steps', '20', '--save_total_limit', '-1', '--evaluate_during_training', '--num_train_epochs', '3', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '5', '--job_id', '0', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1', '--optimizer', 'adam']' died with <Signals.SIGABRT: 6>.
```

I get a similar error running eval with the pretrained checkpoint from paraphraser_gpt2_large:
```
bash style_paraphrase/examples/run_evaluate_paraphrase.sh
11/07/2020 14:02:00 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
11/07/2020 14:02:19 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='datasets/paranmt_filtered', device=device(type='cuda', index=0), do_delete_old=True, do_eval=True, do_lower_case=False, do_train=False, eval_frequency_min=0, eval_patience=10, evaluate_during_training=True, evaluate_specific=None, extra_embedding_dim=768, fp16=False, fp16_opt_level='O1', global_dense_feature_list='none', gradient_accumulation_steps=2, job_id='0', learning_rate='5e-5', limit_examples=1000, local_rank=0, logging_steps=1000, max_grad_norm=1.0, max_steps=-1, model_name_or_path='gpt2-large', model_type='gpt2', n_gpu=1, no_cuda=False, num_train_epochs=3.0, optimizer='adam', output_dir='style_paraphrase/saved_models/model_0', overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=5, prefix_input_type='original', save_steps=1000, save_total_limit=3, seed=42, specific_style_train='-1', target_style_override='none', tokenizer_name='', warmup_steps=0, weight_decay=0.0)
11/07/2020 14:02:19 - INFO - __main__ - Evaluate the following checkpoints: ['style_paraphrase/saved_models/model_0/checkpoint-0']
Some weights of the model checkpoint at style_paraphrase/saved_models/model_0/checkpoint-0 were not used when initializing GPT2LMHeadModel: ['transformer.extra_embedding_project.weight', 'transformer.extra_embedding_project.bias']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing GPT2LMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
11/07/2020 14:02:37 - INFO - style_dataset - {'keys': [{'key': 'sent1_tokens', 'position': 3, 'tokenize': True, 'metadata': False}, {'key': 'sent2_tokens', 'position': 4, 'tokenize': True, 'metadata': False}, {'key': 'f1_score', 'position': 5, 'tokenize': False, 'metadata': True}, {'key': 'kt_score', 'position': 6, 'tokenize': False, 'metadata': True}, {'key': 'ed_score', 'position': 7, 'tokenize': False, 'metadata': True}, {'key': 'langid', 'position': 8, 'tokenize': False, 'metadata': True}], 'max_total_length': 100, 'max_prefix_length': 50, 'max_suffix_length': 50, 'max_dense_length': 2, 'global_dense_length': 0}
100%|██████████| 1492/1492 [00:00<00:00, 3748.38it/s]
11/07/2020 14:02:37 - INFO - style_dataset - Saving features into cached file datasets/paranmt_filtered/gpt2_cached_lm_dev
11/07/2020 14:02:37 - INFO - data_utils - Limiting dataset to 1000 examples
11/07/2020 14:02:37 - INFO - style_dataset - Total truncated instances due to length limit = 4 / 1000
11/07/2020 14:02:37 - INFO - __main__ - ***** Running evaluation checkpoint-0 *****
11/07/2020 14:02:37 - INFO - __main__ - Num examples = 1000
11/07/2020 14:02:37 - INFO - __main__ - Batch size = 4
Evaluating:   0%|          | 0/250 [00:00<?, ?it/s]
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
Evaluating:   0%|          | 0/250 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 507, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 473, in main
    result = evaluate(args, gpt2_model, tokenizer, prefix=prefix)
  File "style_paraphrase/run_lm_finetuning.py", line 334, in evaluate
    curr_loss = gpt2_model.evaluate(batch)
  File "/media/jack/8ED6D4A5D6D48F39/vectorword/style-transfer-paraphrase/style_paraphrase/utils.py", line 133, in evaluate
    return lm_loss.mean().item()
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL error in: /home/jack/pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:155, unhandled cuda error, NCCL version 2.8.0
ncclUnhandledCudaError: Call to CUDA function failed.
Traceback (most recent call last):
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 303, in <module>
    main()
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 294, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/jack/miniconda3/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/model_0', '--model_type=gpt2', '--model_name_or_path=gpt2-large', '--data_dir=datasets/paranmt_filtered', '--do_eval', '--do_delete_old', '--save_steps', '1000', '--logging_steps', '1000', '--save_total_limit', '3', '--evaluate_during_training', '--num_train_epochs', '3', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '5', '--limit_examples', '1000', '--job_id', '0', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1']' died with <Signals.SIGABRT: 6>.
```

Googling the issue, it may have something to do with the data labels?
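
For what it's worth, device-side asserts like `t >= 0 && t < n_classes` usually mean some label id falls outside `[0, vocab_size)`. A quick check I would run (a debugging sketch with a hypothetical label tensor, not code from this repo):

```python
# Debugging sketch (hypothetical label tensor, not repo code): find label ids
# that fall outside the valid class range for GPT-2's vocabulary.
import torch

vocab_size = 50257                              # GPT-2 vocabulary size
labels = torch.tensor([15496, 995, -1, 50256])  # example batch of label ids

mask = (labels != -100) & ((labels < 0) | (labels >= vocab_size))
print("out-of-range labels:", labels[mask].tolist())  # -> [-1]
```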

commented

I did get paraphrase_many.py to work, although I think `paraphraser.modify_p(top_p=args.top_p)` should be `paraphraser.modify_p(top_p=args.top_p_value)`.
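
For context, the mismatch seems to just be an argparse attribute name issue; a tiny sketch with a hypothetical parser (not the repo's actual script) showing why `args.top_p` fails when the flag is `--top_p_value`:

```python
# Tiny sketch (hypothetical parser, not the repo's script): if the flag is
# --top_p_value, the parsed namespace has no `top_p` attribute.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--top_p_value", type=float, default=0.0)
args = parser.parse_args(["--top_p_value", "0.9"])

print(args.top_p_value)  # 0.9
# print(args.top_p)      # AttributeError: 'Namespace' object has no attribute 'top_p'
```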

commented

One last thing: the top-p sampling configuration seems to be susceptible to repeating words, regardless of the top-p and temperature arguments.

An example that reliably triggers this: "From upper trunk brachial plexus, through posterior triangle, across top of scapula and through scapular notch, down posterior aspect scapula and across scapular spine to supraspinatus, infraspinatus"
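
Not a fix from this repo, but for reference, HuggingFace's `generate()` exposes knobs such as `no_repeat_ngram_size` and `repetition_penalty` that usually reduce this kind of degeneration when combined with top-p sampling. A minimal sketch, assuming a stock pretrained `gpt2-large` rather than the fine-tuned paraphraser:

```python
# Minimal nucleus-sampling sketch with repetition controls, assuming a stock
# HuggingFace GPT-2 model (not the repo's paraphraser wrapper).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
model.eval()

prompt = ("From upper trunk brachial plexus, through posterior triangle, "
          "across top of scapula and through scapular notch")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,                 # nucleus sampling
        temperature=1.0,
        no_repeat_ngram_size=3,    # block repeated trigrams
        repetition_penalty=1.2,    # penalize already-generated tokens
        max_length=inputs["input_ids"].shape[1] + 60,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```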

Hey, thanks for pointing this out! I've fixed some of the issues in 80b6978 and am checking the CUDA issues now. They are likely due to the -1 in the label string and some changes in the HuggingFace library. I'll get back to you by the end of the day.

Fixed in 9bb93fa. It was indeed because of huggingface/transformers changes: they dropped ignore_index=-1 from their GPT2 modeling code, and the PyTorch default is -100.
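
For anyone hitting the same assert, here is a minimal sketch of what that change means in plain PyTorch (illustrative only, not the repo's training code): with the default `ignore_index=-100`, a `-1` padding label is treated as a real class index and is out of range, which on GPU surfaces as the `t >= 0 && t < n_classes` device-side assert.

```python
# Illustrative sketch (not repo code): -1 labels are only safe if the loss is
# told to ignore them; with PyTorch's default ignore_index=-100 they become
# out-of-range class indices.
import torch
import torch.nn.functional as F

vocab_size = 50257                       # GPT-2 vocabulary size
logits = torch.randn(3, vocab_size)
labels = torch.tensor([15496, 995, -1])  # -1 used as a padding label

loss_ok = F.cross_entropy(logits, labels, ignore_index=-1)  # -1 positions ignored
print(loss_ok.item())

# F.cross_entropy(logits, labels)  # default ignore_index=-100: -1 is an invalid
#                                  # class index; on GPU this is the
#                                  # "Assertion `t >= 0 && t < n_classes` failed" error
```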