martiansideofthemoon / style-transfer-paraphrase

Official code and data repository for our EMNLP 2020 long paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation" (https://arxiv.org/abs/2010.05700).

Home Page: http://style.cs.umass.edu

Confirmation of steps and getting pickle file

UditArora2000 opened this issue · comments

Hi. So I have been trying to do the style-transfer task. I have done the following steps so far:

  • Made a custom dataset - here I added the sentences from the two styles together in one text document for each of the train, dev, and test sets.
  • Ran the commands for the custom dataset - got the bpe and paraphrase_250 bpe files.
  • Finetuned the model - since there were two styles, two models got saved.

Now I am trying to transfer the style, so I am running the generation scripts (these are missing from the default implementation, so I added them to schedule.py and commented out the training and eval scripts).
However, for "original" text generation, the paraphraser needs a pickle file. I saw the example from the paranmt_filtered dataset.

Now I have two questions:

  • Are the steps I have done so far correct? Did I miss anything?
  • How do I create the pickle file for the custom dataset?

It would be really helpful if you could answer these questions.

Hi @UditArora2000, thanks for your interest in the project. Which generation scripts are you using? I've just deleted three legacy files from the repository. The recommended way to run inference is the style_paraphrase/inference_utils.py API. See the following for examples:

  1. Generation script during final evaluation - https://github.com/martiansideofthemoon/style-transfer-paraphrase/blob/master/style_paraphrase/evaluation/scripts/style_transfer.py
  2. Paraphraser demo - https://github.com/martiansideofthemoon/style-transfer-paraphrase/blob/master/demo_paraphraser.py
  3. Web demo codebase - https://github.com/martiansideofthemoon/style-transfer-paraphrase/blob/master/web-demo/demo_service.py

For the custom datasets, did you try these instructions? https://github.com/martiansideofthemoon/style-transfer-paraphrase#custom-datasets

Hi! The generation script I was using is style_paraphrase/run_generation_gpt2_template.sh. However, it seems the one I need is https://github.com/martiansideofthemoon/style-transfer-paraphrase/blob/master/style_paraphrase/evaluation/scripts/style_transfer.py, as you pointed out.

So, if I am correct, this script first converts the input sentence in style S1 to an intermediate form, say P1, using the pre-trained paraphraser_gpt2_large. Then the fine-tuned model that we pass to the script converts P1 to a sentence in style S2?
Please confirm this.

Yes, you are correct.
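
As a rough illustration of the two-stage flow confirmed here, a minimal sketch follows. The function `transfer_style` and the two toy rewriters are hypothetical stand-ins, not part of the repository; in practice the first callable would wrap the pre-trained paraphraser and the second the model fine-tuned on the target style S2.

```python
from typing import Callable, List

# A "rewriter" maps a batch of sentences to a batch of rewritten sentences.
Rewriter = Callable[[List[str]], List[str]]


def transfer_style(
    sentences: List[str],
    paraphrase: Rewriter,
    inverse_paraphrase: Rewriter,
) -> List[str]:
    """Hypothetical helper: S1 -> P1 (paraphrase) -> S2 (inverse paraphrase)."""
    # Stage 1: normalize the S1 input into an intermediate paraphrase P1.
    intermediate = paraphrase(sentences)
    # Stage 2: the model fine-tuned on style S2 rewrites P1 in that style.
    return inverse_paraphrase(intermediate)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_paraphraser(batch: List[str]) -> List[str]:
    return [s.lower().rstrip("!") for s in batch]


def toy_s2_model(batch: List[str]) -> List[str]:
    return [s + ", good sir." for s in batch]


outputs = transfer_style(["Hello There!"], toy_paraphraser, toy_s2_model)
```

Keeping both stages as plain callables means the same wrapper works for any target style: only the second argument changes.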

Thanks a lot for the reply! If possible, could you keep this issue open, so that I can ask here if I have any further doubts about running the generation code?
Also, I just wanted to say that I appreciate your work and the simple but amazing intuition of your paper. A very informative read!

Yes, please feel free to keep it open for the time being. Glad you liked the work :)

Hi! I was curious: did you try making pair-wise models? Currently you train the inverse paraphraser to generate the original sentence. If I have just two styles, S1 and S2, I could instead take a sentence in S1, paraphrase it using paraphraser_gpt2_large, and finetune the inverse paraphraser to generate S2 from that paraphrase. I think this would work better for the pair-wise case.
Any thoughts?

We do exactly this during inference. This is not possible during training, since S1 and S2 are non-parallel. If they were parallel, I would directly train a model to perform S1 ---> S2.
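
To make the non-parallel constraint concrete, here is a minimal sketch of how training pairs can be formed within each style separately: every sentence is paraphrased, and the model for that style learns to recover the original from its paraphrase. The function name and the toy paraphraser are hypothetical and only illustrate the idea; they are not taken from the repository.

```python
from typing import Callable, Dict, List, Tuple


def build_inverse_paraphrase_pairs(
    style_corpora: Dict[str, List[str]],
    paraphrase: Callable[[List[str]], List[str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical helper: per-style (paraphrase, original) training pairs."""
    pairs: Dict[str, List[Tuple[str, str]]] = {}
    for style, sentences in style_corpora.items():
        # Each sentence is paired only with its own paraphrase, so no
        # S1-to-S2 alignment is ever needed.
        paraphrases = paraphrase(sentences)
        pairs[style] = list(zip(paraphrases, sentences))
    return pairs


# Toy stand-in for the paraphraser (an assumption for illustration only).
def toy_paraphraser(batch: List[str]) -> List[str]:
    return [s.lower() for s in batch]


pairs = build_inverse_paraphrase_pairs(
    {"S1": ["Hark, Who Goes There"], "S2": ["Yo Whats Up"]},
    toy_paraphraser,
)
```

Each style gets its own list of pairs, and therefore its own fine-tuned model; the S1-to-S2 combination only happens at inference time, by feeding a paraphrase of an S1 sentence to the S2 model.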

Closing this for now, please feel free to re-open if you have other questions!