KeyError when Running the "run_lm_finetuning.py" Script

Question


BenKabongo25 opened this issue a year ago.

I encountered an issue while trying to run the "run_lm_finetuning.py" script from the repository. It fails with a KeyError, meaning that the requested key (here, 1) was not found in the "reverse_label_dict" dictionary.

I suspect that the issue may be related to the initialization of the "reverse_label_dict" or the data used as input to the script. However, I'm not entirely sure what's causing the problem.

Any help in identifying the cause of this issue and suggesting possible solutions would be greatly appreciated. Let me know if you need any further information to investigate the problem. Thank you!

```
  File "/style-transfer-paraphrase/style_paraphrase/run_lm_finetuning.py", line 505, in <module>
    main()
  File "/style-transfer-paraphrase/style_paraphrase/run_lm_finetuning.py", line 417, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False)
  File "/style-transfer-paraphrase/style_paraphrase/run_lm_finetuning.py", line 68, in load_and_cache_examples
    dataset = InverseParaphraseDatasetText(
  File "/style-transfer-paraphrase/style_paraphrase/style_dataset.py", line 169, in __init__
    self.examples = limit_styles(self.examples, args.specific_style_train, split, self.reverse_label_dict)
  File "/style-transfer-paraphrase/style_paraphrase/data_utils.py", line 189, in limit_styles
    logger.info("Preserving authors = {}".format(", ".join([reverse_label_dict[x] for x in specific_style_train])))
  File "/style-transfer-paraphrase/style_paraphrase/data_utils.py", line 189, in <listcomp>
    logger.info("Preserving authors = {}".format(", ".join([reverse_label_dict[x] for x in specific_style_train])))
KeyError: 1
```
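The failing line in `limit_styles` builds its log message with a list comprehension that looks up every requested style index in `reverse_label_dict`. The sketch below reproduces that mechanism in isolation. It assumes `reverse_label_dict` maps integer style indices to style names; the dictionary contents and the `preserved_styles` helper are hypothetical stand-ins, not code from the repository.

```python
# Hypothetical stand-in for reverse_label_dict: style index -> style name.
# The dataset here is assumed to contain a single style, so only index 0 exists.
reverse_label_dict = {0: "user_reviews"}

# Requested style indices, analogous to args.specific_style_train.
specific_style_train = [1]


def preserved_styles(indices, reverse_label_dict):
    """Mirror the list comprehension in limit_styles: look up each index by name."""
    return ", ".join([reverse_label_dict[x] for x in indices])


try:
    preserved_styles(specific_style_train, reverse_label_dict)
except KeyError as err:
    # An index with no entry in the dict raises KeyError carrying that index.
    print("KeyError:", err)  # prints "KeyError: 1"
```

Any index that is not a key of the dictionary fails the same way, so the error message tells you which requested style index is out of range.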

Kalpesh Krishna · Answer 1 · Wed Aug 09 2023 00:34:00 GMT+0800 (China Standard Time)

Hi @BenKabongo25, which dataset were you running this on? What did you set --specific_style_train to? The value of --specific_style_train should be an integer less than the total number of styles in your dataset file.
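One way to surface this constraint earlier is to validate the requested indices before they reach `limit_styles`. The helper below is a hypothetical sketch, not part of the repository; it assumes the style indices are the integers `0 .. num_styles - 1` and that you know `num_styles` for your dataset.

```python
from typing import Iterable, List


def validate_style_indices(indices: Iterable[int], num_styles: int) -> List[int]:
    """Return the indices as a list, raising ValueError if any is out of range.

    Hypothetical helper: valid style indices are assumed to be 0 .. num_styles - 1.
    """
    indices = list(indices)
    bad = [i for i in indices if not 0 <= i < num_styles]
    if bad:
        raise ValueError(
            f"--specific_style_train values {bad} are out of range; "
            f"expected integers in [0, {num_styles - 1}] for {num_styles} style(s)"
        )
    return indices


# With two styles, index 1 is accepted, while an example count such as 5221 is rejected.
print(validate_style_indices([1], num_styles=2))  # [1]
```

Raising a `ValueError` with the allowed range is a design choice: it names the offending flag and the valid bounds instead of leaving a bare `KeyError` deep in the data-loading code.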

Ben Kabongo · Answer 2 · Wed Aug 09 2023 02:25:01 GMT+0800 (China Standard Time)

Thanks for your answer, @martiansideofthemoon.

I'm using a custom dataset.
The first small dataset I ran it on has 5221 items in train, 294 in test, and 294 in evaluation, so I set the --specific_style_train parameter to 5221.
Do you think I should change it?

PS: I'm currently doing an internship with the MLIA/ISIR team, working on personalized data-to-text generation: we want the output of our models to be close to the style of a particular user.
No personalized data-to-text dataset exists, so I'm using your STRAP model to create one.
I'm training STRAP models on user reviews, and I then want to use the trained models to generate personalized descriptions for the data-to-text task.

Kalpesh Krishna · Answer 3 · Wed Aug 09 2023 03:35:14 GMT+0800 (China Standard Time)

How many different styles did you have in the training data? --specific_style_train refers to the index of the style. So if you have two styles in the training data, the only possible values for this are 0 and 1.
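To make the distinction concrete: the valid values of --specific_style_train depend on the number of distinct styles, not on the number of training examples. The sketch below builds an index mapping from per-example style labels. It assumes styles are numbered in sorted order of their names; the repository may assign indices differently, and the `build_style_dicts` helper and the labels are hypothetical.

```python
from typing import Dict, List, Tuple


def build_style_dicts(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Assign each distinct style name an index 0..N-1 (sorted order is an assumption)."""
    label_dict = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    reverse_label_dict = {idx: name for name, idx in label_dict.items()}
    return label_dict, reverse_label_dict


# Hypothetical per-example style labels: five examples, but only two styles.
train_labels = ["user_a", "user_b", "user_a", "user_a", "user_b"]
label_dict, reverse_label_dict = build_style_dicts(train_labels)

print(len(train_labels))           # 5 examples
print(sorted(reverse_label_dict))  # [0, 1], the only valid style indices
print(reverse_label_dict[1])       # user_b
```

Under the same assumption, a dataset with a single style has only index 0 available.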

Ben Kabongo · Answer 4 · Fri Aug 11 2023 08:11:37 GMT+0800 (China Standard Time)

I ran several scripts, with the number of styles varying between 2 and 5.
That was indeed the problem. I fixed it, and the current scripts seem to run correctly.

Last question: does it make sense to run the same script for a single style?

Thank you very much.

Kalpesh Krishna · Answer 5 · Fri Aug 11 2023 12:05:23 GMT+0800 (China Standard Time)

Yes, you can run the script for a single style as well; it will learn an inverse paraphraser for that style in isolation.