martiansideofthemoon / style-transfer-paraphrase

Official code and data repository for our EMNLP 2020 long paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation" (https://arxiv.org/abs/2010.05700).

Home Page: http://style.cs.umass.edu

GPT2 paraphrase training

bmtm opened this issue · comments

commented

I'm just trying to train the paraphrase model and am encountering some issues.

In requirements.txt:
- `pkg-resources==0.0.0` should be removed
- `torch==1.6.0+cu101` and `torchvision==0.7.0+cu101` should probably reference non-CUDA-specific versions

In run_finetune_paraphrase.sh:
- `--do_eval` should probably be `--do_train`?

When I run `run_finetune_paraphrase.sh` I get this error:
```
bash style_paraphrase/examples/run_finetune_paraphrase.sh
11/07/2020 13:32:45 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
11/07/2020 13:33:04 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='datasets/paranmt_filtered', device=device(type='cuda', index=0), do_delete_old=False, do_eval=False, do_lower_case=False, do_train=True, eval_frequency_min=0, eval_patience=10, evaluate_during_training=True, evaluate_specific=None, extra_embedding_dim=768, fp16=False, fp16_opt_level='O1', global_dense_feature_list='none', gradient_accumulation_steps=2, job_id='0', learning_rate='5e-5', limit_examples=None, local_rank=0, logging_steps=20, max_grad_norm=1.0, max_steps=-1, model_name_or_path='gpt2-large', model_type='gpt2', n_gpu=1, no_cuda=False, num_train_epochs=3.0, optimizer='adam', output_dir='style_paraphrase/saved_models/test_paraphrase', overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=5, prefix_input_type='original', save_steps=500, save_total_limit=-1, seed=42, specific_style_train='-1', target_style_override='none', tokenizer_name='', warmup_steps=0, weight_decay=0.0)
11/07/2020 13:33:04 - INFO - style_dataset - {'keys': [{'key': 'sent1_tokens', 'position': 3, 'tokenize': True, 'metadata': False}, {'key': 'sent2_tokens', 'position': 4, 'tokenize': True, 'metadata': False}, {'key': 'f1_score', 'position': 5, 'tokenize': False, 'metadata': True}, {'key': 'kt_score', 'position': 6, 'tokenize': False, 'metadata': True}, {'key': 'ed_score', 'position': 7, 'tokenize': False, 'metadata': True}, {'key': 'langid', 'position': 8, 'tokenize': False, 'metadata': True}], 'max_total_length': 100, 'max_prefix_length': 50, 'max_suffix_length': 50, 'max_dense_length': 2, 'global_dense_length': 0}
100%|██████████| 73062/73062 [00:12<00:00, 5756.52it/s]
11/07/2020 13:33:17 - INFO - style_dataset - Saving features into cached file datasets/paranmt_filtered/gpt2_cached_lm_train
11/07/2020 13:33:34 - INFO - style_dataset - Total truncated instances due to length limit = 213 / 73062
11/07/2020 13:33:34 - INFO - __main__ - ***** Running training *****
11/07/2020 13:33:34 - INFO - __main__ - Num examples = 73062
11/07/2020 13:33:34 - INFO - __main__ - Num Epochs = 3
11/07/2020 13:33:34 - INFO - __main__ - Instantaneous batch size per GPU = 5
11/07/2020 13:33:34 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 10
11/07/2020 13:33:34 - INFO - __main__ - Gradient Accumulation steps = 2
11/07/2020 13:33:34 - INFO - __main__ - Total optimization steps = 21918
Epoch:   0%|          | 0/3 [00:00<?, ?it/s]
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [27,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Iteration:   0%|          | 0/14613 [00:00<?, ?it/s]
Epoch:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 507, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 424, in main
    global_step, tr_loss = train(args, gpt2_model, train_dataset, tokenizer)
  File "style_paraphrase/run_lm_finetuning.py", line 244, in train
    loss["lm"].backward()
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/tensor.py", line 233, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL error in: /home/jack/pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:155, unhandled cuda error, NCCL version 2.8.0
ncclUnhandledCudaError: Call to CUDA function failed.
Traceback (most recent call last):
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 303, in <module>
    main()
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 294, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/jack/miniconda3/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/test_paraphrase', '--model_type=gpt2', '--model_name_or_path=gpt2-large', '--data_dir=datasets/paranmt_filtered', '--do_train', '--save_steps', '500', '--logging_steps', '20', '--save_total_limit', '-1', '--evaluate_during_training', '--num_train_epochs', '3', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '5', '--job_id', '0', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1', '--optimizer', 'adam']' died with <Signals.SIGABRT: 6>.
```

I get a similar error running eval with the pretrained checkpoint from paraphraser_gpt2_large:
```
bash style_paraphrase/examples/run_evaluate_paraphrase.sh
11/07/2020 14:02:00 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
11/07/2020 14:02:19 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', data_dir='datasets/paranmt_filtered', device=device(type='cuda', index=0), do_delete_old=True, do_eval=True, do_lower_case=False, do_train=False, eval_frequency_min=0, eval_patience=10, evaluate_during_training=True, evaluate_specific=None, extra_embedding_dim=768, fp16=False, fp16_opt_level='O1', global_dense_feature_list='none', gradient_accumulation_steps=2, job_id='0', learning_rate='5e-5', limit_examples=1000, local_rank=0, logging_steps=1000, max_grad_norm=1.0, max_steps=-1, model_name_or_path='gpt2-large', model_type='gpt2', n_gpu=1, no_cuda=False, num_train_epochs=3.0, optimizer='adam', output_dir='style_paraphrase/saved_models/model_0', overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=5, prefix_input_type='original', save_steps=1000, save_total_limit=3, seed=42, specific_style_train='-1', target_style_override='none', tokenizer_name='', warmup_steps=0, weight_decay=0.0)
11/07/2020 14:02:19 - INFO - __main__ - Evaluate the following checkpoints: ['style_paraphrase/saved_models/model_0/checkpoint-0']
Some weights of the model checkpoint at style_paraphrase/saved_models/model_0/checkpoint-0 were not used when initializing GPT2LMHeadModel: ['transformer.extra_embedding_project.weight', 'transformer.extra_embedding_project.bias']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing GPT2LMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
11/07/2020 14:02:37 - INFO - style_dataset - {'keys': [{'key': 'sent1_tokens', 'position': 3, 'tokenize': True, 'metadata': False}, {'key': 'sent2_tokens', 'position': 4, 'tokenize': True, 'metadata': False}, {'key': 'f1_score', 'position': 5, 'tokenize': False, 'metadata': True}, {'key': 'kt_score', 'position': 6, 'tokenize': False, 'metadata': True}, {'key': 'ed_score', 'position': 7, 'tokenize': False, 'metadata': True}, {'key': 'langid', 'position': 8, 'tokenize': False, 'metadata': True}], 'max_total_length': 100, 'max_prefix_length': 50, 'max_suffix_length': 50, 'max_dense_length': 2, 'global_dense_length': 0}
100%|██████████| 1492/1492 [00:00<00:00, 3748.38it/s]
11/07/2020 14:02:37 - INFO - style_dataset - Saving features into cached file datasets/paranmt_filtered/gpt2_cached_lm_dev
11/07/2020 14:02:37 - INFO - data_utils - Limiting dataset to 1000 examples
11/07/2020 14:02:37 - INFO - style_dataset - Total truncated instances due to length limit = 4 / 1000
11/07/2020 14:02:37 - INFO - __main__ - ***** Running evaluation checkpoint-0 *****
11/07/2020 14:02:37 - INFO - __main__ - Num examples = 1000
11/07/2020 14:02:37 - INFO - __main__ - Batch size = 4
Evaluating:   0%|          | 0/250 [00:00<?, ?it/s]
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [10,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
/home/jack/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
Evaluating:   0%|          | 0/250 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 507, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 473, in main
    result = evaluate(args, gpt2_model, tokenizer, prefix=prefix)
  File "style_paraphrase/run_lm_finetuning.py", line 334, in evaluate
    curr_loss = gpt2_model.evaluate(batch)
  File "/media/jack/8ED6D4A5D6D48F39/vectorword/style-transfer-paraphrase/style_paraphrase/utils.py", line 133, in evaluate
    return lm_loss.mean().item()
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL error in: /home/jack/pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:155, unhandled cuda error, NCCL version 2.8.0
ncclUnhandledCudaError: Call to CUDA function failed.
Traceback (most recent call last):
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jack/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 303, in <module>
    main()
  File "/home/jack/miniconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 294, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/jack/miniconda3/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/model_0', '--model_type=gpt2', '--model_name_or_path=gpt2-large', '--data_dir=datasets/paranmt_filtered', '--do_eval', '--do_delete_old', '--save_steps', '1000', '--logging_steps', '1000', '--save_total_limit', '3', '--evaluate_during_training', '--num_train_epochs', '3', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '5', '--limit_examples', '1000', '--job_id', '0', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1']' died with <Signals.SIGABRT: 6>.
```

Googling the issue, it may have something to do with the data labels?
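
For what it's worth, device-side asserts like `t >= 0 && t < n_classes` usually mean some label id falls outside `[0, vocab_size)`. A quick check I would run (a debugging sketch with a hypothetical label tensor, not code from this repo):

```python
# Debugging sketch (hypothetical label tensor, not repo code): find label ids
# that fall outside the valid class range for GPT-2's vocabulary.
import torch

vocab_size = 50257                              # GPT-2 vocabulary size
labels = torch.tensor([15496, 995, -1, 50256])  # example batch of label ids

mask = (labels != -100) & ((labels < 0) | (labels >= vocab_size))
print("out-of-range labels:", labels[mask].tolist())  # -> [-1]
```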

commented

I did get paraphrase_many.py to work, although I think `paraphraser.modify_p(top_p=args.top_p)` should be `paraphraser.modify_p(top_p=args.top_p_value)`.
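
For context, the mismatch seems to just be an argparse attribute name issue; a tiny sketch with a hypothetical parser (not the repo's actual script) showing why `args.top_p` fails when the flag is `--top_p_value`:

```python
# Tiny sketch (hypothetical parser, not the repo's script): if the flag is
# --top_p_value, the parsed namespace has no `top_p` attribute.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--top_p_value", type=float, default=0.0)
args = parser.parse_args(["--top_p_value", "0.9"])

print(args.top_p_value)  # 0.9
# print(args.top_p)      # AttributeError: 'Namespace' object has no attribute 'top_p'
```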

commented

One last thing: the top-p sampling configuration seems to be susceptible to repeating words, regardless of the top-p and temperature arguments.

An example that reliably triggers this: "From upper trunk brachial plexus, through posterior triangle, across top of scapula and through scapular notch, down posterior aspect scapula and across scapular spine to supraspinatus, infraspinatus"
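
Not a fix from this repo, but for reference, HuggingFace's `generate()` exposes knobs such as `no_repeat_ngram_size` and `repetition_penalty` that usually reduce this kind of degeneration when combined with top-p sampling. A minimal sketch, assuming a stock pretrained `gpt2-large` rather than the fine-tuned paraphraser:

```python
# Minimal nucleus-sampling sketch with repetition controls, assuming a stock
# HuggingFace GPT-2 model (not the repo's paraphraser wrapper).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
model.eval()

prompt = ("From upper trunk brachial plexus, through posterior triangle, "
          "across top of scapula and through scapular notch")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,                 # nucleus sampling
        temperature=1.0,
        no_repeat_ngram_size=3,    # block repeated trigrams
        repetition_penalty=1.2,    # penalize already-generated tokens
        max_length=inputs["input_ids"].shape[1] + 60,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```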

Hey, thanks for pointing this out! I've fixed some of the issues in 80b6978 and am checking the CUDA issues now. They are likely due to the -1 in the label string and some changes in the HuggingFace library. I'll get back to you by the end of the day.

Fixed in 9bb93fa. It was indeed because of huggingface/transformers changes: they dropped ignore_index=-1 from their GPT2 modeling code, and the PyTorch default is -100.
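
For anyone hitting the same assert, here is a minimal sketch of what that change means in plain PyTorch (illustrative only, not the repo's training code): with the default `ignore_index=-100`, a `-1` padding label is treated as a real class index and is out of range, which on GPU surfaces as the `t >= 0 && t < n_classes` device-side assert.

```python
# Illustrative sketch (not repo code): -1 labels are only safe if the loss is
# told to ignore them; with PyTorch's default ignore_index=-100 they become
# out-of-range class indices.
import torch
import torch.nn.functional as F

vocab_size = 50257                       # GPT-2 vocabulary size
logits = torch.randn(3, vocab_size)
labels = torch.tensor([15496, 995, -1])  # -1 used as a padding label

loss_ok = F.cross_entropy(logits, labels, ignore_index=-1)  # -1 positions ignored
print(loss_ok.item())

# F.cross_entropy(logits, labels)  # default ignore_index=-100: -1 is an invalid
#                                  # class index; on GPU this is the
#                                  # "Assertion `t >= 0 && t < n_classes` failed" error
```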