
OSError: file style_paraphrase/saved_models/test_paraphrase/config.json not found #6

Open
ioannist opened this issue Dec 3, 2020 · 8 comments


ioannist commented Dec 3, 2020

I tried training the paraphraser with gpt2 (small), since the large model would not fit on my 1080 Ti. Everything went fine until the last iteration, where I got the error below. The final checkpoint seems to have been saved successfully. However, Python tries to read

`style_paraphrase/saved_models/test_paraphrase/config.json`

which was not created and does not exist. All the config.json files are inside their respective checkpoint folders.
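For reference, here is a quick diagnostic I can run to confirm where the config.json files actually landed (a minimal sketch, assuming it is run from the repository root; the path is the output dir from my command):

```python
import os

# Walk the output directory and print every folder that contains a config.json.
root = "style_paraphrase/saved_models/test_paraphrase"
for dirpath, dirnames, filenames in os.walk(root):
    if "config.json" in filenames:
        print(dirpath)
```

In my case this prints only the checkpoint-* subfolders, never the output dir itself.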

```
12/03/2020 18:22:39 - INFO - __main__ -    global_step = 21918, average loss = 1.8063476276852939
12/03/2020 18:22:40 - INFO - __main__ -   Saving model checkpoint to style_paraphrase/saved_models/test_paraphrase/checkpoint-21918
Traceback (most recent call last):
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/configuration_utils.py", line 369, in get_config_dict
    local_files_only=local_files_only,
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/file_utils.py", line 957, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file style_paraphrase/saved_models/test_paraphrase/config.json not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "style_paraphrase/run_lm_finetuning.py", line 507, in <module>
    main()
  File "style_paraphrase/run_lm_finetuning.py", line 437, in main
    tokenizer_class=tokenizer_class)
  File "/home/ioannis/Desktop/style-transfer-paraphrase/style_paraphrase/utils.py", line 51, in init_gpt2_model
    model = model_class.from_pretrained(checkpoint_dir)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/modeling_utils.py", line 876, in from_pretrained
    **kwargs,
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/configuration_utils.py", line 329, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/transformers/configuration_utils.py", line 382, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'style_paraphrase/saved_models/test_paraphrase'. Make sure that:

- 'style_paraphrase/saved_models/test_paraphrase' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'style_paraphrase/saved_models/test_paraphrase' is the correct path to a directory containing a config.json file


Traceback (most recent call last):
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ioannis/anaconda3/envs/style-transfer-paraphrase/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ioannis/anaconda3/envs/style-transfer-paraphrase/bin/python', '-u', 'style_paraphrase/run_lm_finetuning.py', '--local_rank=0', '--output_dir=style_paraphrase/saved_models/test_paraphrase', '--model_type=gpt2', '--model_name_or_path=gpt2', '--data_dir=datasets/paranmt_filtered', '--do_train', '--save_steps', '500', '--logging_steps', '20', '--save_total_limit', '-1', '--evaluate_during_training', '--num_train_epochs', '3', '--gradient_accumulation_steps', '2', '--per_gpu_train_batch_size', '5', '--job_id', 'paraphraser_test', '--learning_rate', '5e-5', '--prefix_input_type', 'original', '--global_dense_feature_list', 'none', '--specific_style_train', '-1', '--optimizer', 'adam']' returned non-zero exit status 1.
```

ioannist commented Dec 3, 2020

The same problem occurs with run_finetune_shakespeare_0.sh, by the way, when training with gpt2 (small).

@martiansideofthemoon (Owner)

Thanks for reporting this! I will look more closely later in the day / tomorrow, but which HuggingFace transformers library version are you using?

ioannist commented Dec 5, 2020

It should be transformers==3.4.0, as pinned in the requirements file. I installed everything in a fresh conda env with python==3.7.

Btw, I am looking forward to the directions for training the inverse model on custom data!

@martiansideofthemoon (Owner)

I just tried running it with GPT2-small, and I can see the config.json files. Could you share the set of files you see in your checkpoint folder?

philno (Contributor) commented Jan 11, 2021

I had the same issue when training my models. It seems to be caused by the path in this line: when re-loading the model, `args.output_dir` is used instead of the `output_dir` that is defined a few lines above, so it points to the parent folder of all the checkpoints rather than the folder with the last checkpoint.

I haven't tested if this fixes the problem, but I will try it for my next run on the cluster.
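To illustrate the mismatch (a hedged sketch with illustrative variable names, not the repo's exact code): the final checkpoint and its config.json are written to a step-specific subfolder, but the reload path is the parent directory, which holds no config.json of its own.

```python
import os

# Hypothetical reconstruction of the path mix-up; names are assumptions.
args_output_dir = "style_paraphrase/saved_models/test_paraphrase"
global_step = 21918

# Where the final checkpoint (including config.json) is actually written:
checkpoint_dir = os.path.join(args_output_dir, "checkpoint-{}".format(global_step))

# Buggy reload: looks in the parent folder, which only holds checkpoint-* subfolders.
broken = os.path.join(args_output_dir, "config.json")
# Fixed reload: looks inside the checkpoint folder that was just saved.
fixed = os.path.join(checkpoint_dir, "config.json")

print(broken)  # style_paraphrase/saved_models/test_paraphrase/config.json
print(fixed)   # style_paraphrase/saved_models/test_paraphrase/checkpoint-21918/config.json
```

The first path matches the one in the OSError above, which is why I suspect this line.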

philno (Contributor) commented Jan 19, 2021

Just to follow up: changing the line mentioned above did fix the error. Just make sure that `--do_eval` is set and that you are not using `do_delete_old`. That way, the best checkpoint (i.e., the one with the lowest validation perplexity) is copied to the output dir, the parent folder of all the checkpoints, after training finishes.
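The end state I'm describing can be sketched like this (file names are stand-ins, not the repo's actual copy logic): after training, the best checkpoint's files also sit directly in the output dir, so loading from the output dir finds a config.json.

```python
import os
import shutil
import tempfile

# Simulate an output dir containing one checkpoint folder (a stand-in for the real run).
output_dir = tempfile.mkdtemp()
best_ckpt = os.path.join(output_dir, "checkpoint-500")
os.makedirs(best_ckpt)
for fname in ("config.json", "pytorch_model.bin"):
    open(os.path.join(best_ckpt, fname), "w").close()

# Copy the best checkpoint's files up into the output dir itself,
# so loading the model from output_dir works.
for fname in os.listdir(best_ckpt):
    shutil.copy2(os.path.join(best_ckpt, fname), output_dir)

print("config.json" in os.listdir(output_dir))  # True
```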

@guanqun-yang

@martiansideofthemoon Just curious, how could I also load gpt2-small as you did? It doesn't seem to be offered on the HuggingFace model hub.

@martiansideofthemoon (Owner)

@guanqun-yang you can just use `gpt2` offered on HuggingFace (https://huggingface.co/gpt2); that checkpoint is the small model.
