Failure to load the model after training the paraphraser in `style_paraphrase/examples/run_finetune_paraphrase.sh`
guanqun-yang opened this issue
Hi,
I configured the environment exactly as instructed in the repository and downloaded the data (the folder structure of the datasets directory is shown below). When I ran style_paraphrase/examples/run_finetune_paraphrase.sh, everything looked normal all the way through training. However, when the script tries to load the trained model afterwards, the errors below occur.
datasets/
├── paranmt_filtered
├── shakespeare
│ └── raw
└── shakespeare-bin
├── input0
└── label
Iteration: 100%|██████████████████████████████████████████████████████████████████████████████████| 2284/2284 [27:36<00:00, 1.38it/s]
Epoch: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1/1 [27:36<00:00, 1656.50s/it]
08/18/2021 15:41:32 - INFO - __main__ - global_step = 1142, average loss = 2.494835122036642
08/18/2021 15:41:34 - INFO - __main__ - Saving model checkpoint to style_paraphrase/saved_models/test_paraphrase/checkpoint-1142
Traceback (most recent call last):
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/transformers/configuration_utils.py", line 363, in get_config_dict
resolved_config_file = cached_path(
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/transformers/file_utils.py", line 957, in cached_path
raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file style_paraphrase/saved_models/test_paraphrase/config.json not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "style_paraphrase/run_lm_finetuning.py", line 505, in <module>
main()
File "style_paraphrase/run_lm_finetuning.py", line 432, in main
gpt2_model, tokenizer = init_gpt2_model(checkpoint_dir=args.output_dir,
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style_paraphrase/utils.py", line 51, in init_gpt2_model
model = model_class.from_pretrained(checkpoint_dir)
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 867, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/transformers/configuration_utils.py", line 329, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/transformers/configuration_utils.py", line 382, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 'style_paraphrase/saved_models/test_paraphrase'. Make sure that:
- 'style_paraphrase/saved_models/test_paraphrase' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'style_paraphrase/saved_models/test_paraphrase' is the correct path to a directory containing a config.json file
Traceback (most recent call last):
File "/data/installation/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/installation/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/data/gyang16/HateSpeech/style-transfer-paraphrase/style-venv/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/test_paraphrase', '--model_type=gpt2', '--model_name_or_path=gpt2-medium', '--data_dir=datasets/paranmt_filtered', '--do_train', '--save_steps', '500', '--logging_steps', '20', '--save_total_limit', '-1', '--evaluate_during_training', '--num_train_epochs', '1', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '32', '--job_id', 'paraphraser_test', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1', '--optimizer', 'adam']' returned non-zero exit status 1.
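For anyone hitting the same error before applying the linked fix: the OSError shows that `from_pretrained` is pointed at `style_paraphrase/saved_models/test_paraphrase`, but training only wrote a `config.json` into the `checkpoint-1142` subdirectory. As a workaround sketch (not the repository's own code; `find_latest_checkpoint` is a hypothetical helper name), you can locate the newest checkpoint subdirectory that actually contains a `config.json` and load from there:

```python
import os
import re

def find_latest_checkpoint(output_dir):
    """Return the checkpoint-N subdirectory of output_dir with the
    highest step count that actually contains a config.json, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        path = os.path.join(output_dir, name)
        if match and os.path.isfile(os.path.join(path, "config.json")):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path

# Hypothetical usage with the paths from the log above:
# ckpt = find_latest_checkpoint("style_paraphrase/saved_models/test_paraphrase")
# model = GPT2LMHeadModel.from_pretrained(ckpt)
```

This only sidesteps the loading step; the linked comment in #6 is the proper fix.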
This issue is related to #6; see the solution there: #6 (comment)