OSError: file style_paraphrase/saved_models/test_paraphrase/config.json not found

ioannist opened this issue 4 years ago

I tried training the paraphraser with gpt2 (small), as the large model would not fit on my 1080 Ti. Everything went fine until the last iteration, where I got the error below. The final checkpoint seems to have been saved successfully. However, Python then tries to read from

file style_paraphrase/saved_models/test_paraphrase/config.json

which was not created and does not exist. All config.json files are inside their respective checkpoint folders.

12/03/2020 18:22:39 - INFO - __main__ -    global_step = 21918, average loss = 1.8063476276852939
12/03/2020 18:22:40 - INFO - __main__ -   Saving model checkpoint to style_paraphrase/saved_models/test_paraphrase/checkpoint-21918
Traceback (most recent call last):
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/configuration_utils.py", line 369, in get_config_dict
    local_files_only=local_files_only,
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/file_utils.py", line 957, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file style_paraphrase/saved_models/test_paraphrase/config.json not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 507, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 437, in main
    tokenizer_class=tokenizer_class)
  File "/home/ioannis/Desktop/style-transfer-paraphrase/style_paraphrase/utils.py", line 51, in init_gpt2_model
    model = model_class.from_pretrained(checkpoint_dir)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/modeling_utils.py", line 876, in from_pretrained
    **kwargs,
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/configuration_utils.py", line 329, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/configuration_utils.py", line 382, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'style_paraphrase/saved_models/test_paraphrase'. Make sure that:

- 'style_paraphrase/saved_models/test_paraphrase' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'style_paraphrase/saved_models/test_paraphrase' is the correct path to a directory containing a config.json file


Traceback (most recent call last):
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ioannis/anaconda3/envs/style-transfer-paraphrase/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/test_paraphrase', '--model_type=gpt2', '--model_name_or_path=gpt2', '--data_dir=datasets/paranmt_filtered', '--do_train', '--save_steps', '500', '--logging_steps', '20', '--save_total_limit', '-1', '--evaluate_during_training', '--num_train_epochs', '3', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '5', '--job_id', 'paraphraser_test', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1', '--optimizer', 'adam']' returned non-zero exit status 1.

Ioannis Tsiokos · Answer 1 · Fri Dec 04 2020 01:04:06 GMT+0800 (China Standard Time)

Same problem with run_finetune_shakespeare_0.sh, btw, when training with gpt2 (small).

Kalpesh Krishna · Answer 2 · Fri Dec 04 2020 22:06:03 GMT+0800 (China Standard Time)

Thanks for reporting this! I will look more closely later in the day / tomorrow, but which HuggingFace transformers library version are you using?

Ioannis Tsiokos · Answer 3 · Sun Dec 06 2020 02:49:10 GMT+0800 (China Standard Time)

It should be transformers==3.4.0, as in the requirements file. I installed everything in a fresh conda env with python==3.7.

Btw, I am looking forward to the directions for training the inverse model on custom data!

Kalpesh Krishna · Answer 4 · Tue Dec 08 2020 21:25:46 GMT+0800 (China Standard Time)

I just tried running it with GPT2-small, and I can see the config.json files. Could you share the set of files you see in your checkpoint folder?
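For anyone hitting the same error, one quick way to answer that question is to list which folders actually contain a `config.json`. The sketch below is a hypothetical diagnostic helper, not part of the repository; it only assumes the `checkpoint-<step>` folder naming visible in the training log ("Saving model checkpoint to .../checkpoint-21918") and reports on both the output dir itself (key `"."`) and each checkpoint folder.

```python
import os


def describe_checkpoints(output_dir):
    """Report which folders under output_dir contain a config.json.

    Hypothetical helper, not part of style-transfer-paraphrase. Assumes
    checkpoints are saved as "<output_dir>/checkpoint-<step>", the naming
    shown in the training log.
    """
    def step_of(name):
        # "checkpoint-21918" -> 21918, so checkpoints sort numerically
        return int(name.rsplit("-", 1)[-1])

    # "." stands for output_dir itself, the path the failing reload uses
    report = {".": os.path.isfile(os.path.join(output_dir, "config.json"))}
    names = [
        n for n in os.listdir(output_dir)
        if n.startswith("checkpoint-")
        and n.rsplit("-", 1)[-1].isdigit()
        and os.path.isdir(os.path.join(output_dir, n))
    ]
    for name in sorted(names, key=step_of):
        report[name] = os.path.isfile(os.path.join(output_dir, name, "config.json"))
    return report
```

Calling `describe_checkpoints("style_paraphrase/saved_models/test_paraphrase")` on the setup described in the original report would be expected to show `"."` as `False` and the checkpoint folders as `True`, which is exactly the mismatch the traceback complains about.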

Philipp Nothvogel · Answer 5 · Mon Jan 11 2021 18:26:28 GMT+0800 (China Standard Time)

I had the same issue when training my models. It seems there is an issue with the path in this line. Basically, when re-loading the model, args.output_dir is used instead of the output_dir that is defined a few lines above. So the reload points to the parent folder of all the checkpoints instead of the folder containing the last checkpoint.

I haven't tested if this fixes the problem, but I will try it for my next run on the cluster.
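A minimal sketch of the change described above, under stated assumptions: the repository's exact code for that line is not reproduced in this thread, so `checkpoint_dir_for` and `reload_dir` are hypothetical helpers, and the only grounded detail is the `checkpoint-<global_step>` naming from the log. The point is simply that the reload should receive a folder that has a `config.json` (the last checkpoint folder) rather than `args.output_dir` (which does not have one).

```python
import os


def checkpoint_dir_for(parent_dir, global_step):
    # Naming taken from the log line
    # "Saving model checkpoint to .../test_paraphrase/checkpoint-21918"
    return os.path.join(parent_dir, "checkpoint-{}".format(global_step))


def reload_dir(parent_dir, global_step):
    """Pick the folder to hand to from_pretrained() after training.

    Hypothetical helper: prefers the last checkpoint folder, falls back to
    parent_dir only if a config.json was copied there, and otherwise fails
    with a clearer message than the OSError in the traceback.
    """
    candidates = [checkpoint_dir_for(parent_dir, global_step), parent_dir]
    for path in candidates:
        if os.path.isfile(os.path.join(path, "config.json")):
            return path
    raise FileNotFoundError(
        "no config.json in any of: {}".format(", ".join(candidates))
    )
```

Assuming the step count is available as `global_step` (as the "global_step = 21918" log line suggests), the reload would receive `reload_dir(args.output_dir, global_step)` instead of `args.output_dir`; the other arguments to `init_gpt2_model` are not shown in the thread and would stay as they are.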

Philipp Nothvogel · Answer 6 · Tue Jan 19 2021 23:00:15 GMT+0800 (China Standard Time)

Just to follow up: changing the line mentioned above did fix the error. Just make sure that --do_eval is set and that you are not using do_delete_old. This way, the best checkpoint (i.e., the one with the lowest validation perplexity) will be copied to the output dir, the parent folder of all the checkpoints, after training is finished.
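The copy step described here can be sketched as follows. This is a hypothetical helper illustrating the behaviour, not the repository's code: it assumes validation perplexities are available as a dict keyed by checkpoint folder name, picks the lowest, and copies that folder's files (including `config.json`) into the parent folder. The flag name `do_delete_old` suggests older checkpoints get removed during training, in which case the best one might no longer exist to be copied; the sketch would then fail in `os.listdir`.

```python
import os
import shutil


def promote_best_checkpoint(parent_dir, val_perplexity):
    """Copy the lowest-perplexity checkpoint's files into parent_dir.

    Hypothetical helper, not the repository's code. val_perplexity maps
    checkpoint folder names such as "checkpoint-21918" to their
    validation perplexity.
    """
    best = min(val_perplexity, key=val_perplexity.get)
    src = os.path.join(parent_dir, best)
    # If old checkpoints were deleted during training, src may be gone and
    # os.listdir raises FileNotFoundError.
    for name in os.listdir(src):
        path = os.path.join(src, name)
        if os.path.isfile(path):
            # copy2 also preserves file metadata such as timestamps
            shutil.copy2(path, os.path.join(parent_dir, name))
    return best
```

Once the files are copied, the parent folder holds a `config.json`, so loading from `args.output_dir` would also succeed.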

Guanqun Yang · Answer 7 · Thu Aug 19 2021 05:22:11 GMT+0800 (China Standard Time)

@martiansideofthemoon Just curious, how could I also load gpt2-small as you did? It seems that this is not offered in the HuggingFace model hub.

Kalpesh Krishna · Answer 8 · Thu Aug 19 2021 09:40:13 GMT+0800 (China Standard Time)

@guanqun-yang you can just use gpt2 offered on HuggingFace (https://huggingface.co/gpt2)