martiansideofthemoon / style-transfer-paraphrase

Official code and data repository for our EMNLP 2020 long paper "Reformulating Unsupervised Style Transfer as Paraphrase Generation" (https://arxiv.org/abs/2010.05700).

Home Page: http://style.cs.umass.edu

Issue in running finetune paraphrase script

abhisha1991 opened this issue · comments

Hey Kalpesh and team,

Thanks very much for releasing your work - it is great to see a simple architecture like this being used for something novel. We're trying to get up and running with the base setup - we have downloaded all the data and corresponding models to the right folders. However, upon running the fine-tuning script, we get the attached error.

Our setup is a cloud VM with 1 GPU (NVIDIA Tesla T4), Ubuntu 18.04, 7.5 GB RAM, PyTorch 1.10, and CUDA 11.5.
We have confirmed that PyTorch and CUDA are installed and available on the machine (see attachments).

We'd be incredibly grateful if you could release a Docker image with pre-installed dependencies, or help us identify the exact failure mode we are hitting below. We're unable to proceed past this error. We're also unable to locate the error logs (~/style-transfer-paraphrase/style_paraphrase/logs), and are thus unable to understand what is wrong with our setup.

(screenshots attached: error traceback, CUDA check, PyTorch check)

Hi @abhisha1991,
Unfortunately I've not encountered this error before, so I'm not 100% sure the following will work, but it's still worth a try:

  1. Try downgrading PyTorch to 1.7. I've confirmed the code works on my cluster with PyTorch 1.7 / CUDA 10.1.
  2. Try removing the DDP dependencies from the command: remove -m torch.distributed.launch --nproc_per_node=1 from the bash script. That way only a single PyTorch process will run the code, and args.local_rank will automatically be set to -1 (see the sketch after this list). If this gives you any error, let me know.
  3. The CPU RAM seems quite low (7.5 GB), so I'm wondering if you are getting an OOM error in a child process.
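
For context on point 2: HuggingFace-style finetuning scripts (which this repo's training code follows) branch on --local_rank, and that flag is only filled in when torch.distributed.launch spawns the worker processes. Here is a minimal sketch of that pattern, not the repo's exact code, assuming the usual argparse setup:

```python
import argparse
import torch

parser = argparse.ArgumentParser()
# torch.distributed.launch passes --local_rank to each worker; without the
# launcher the default of -1 is used, which selects the single-process path.
parser.add_argument("--local_rank", type=int, default=-1)
args = parser.parse_args()

if args.local_rank == -1:
    # Single process: use the one visible GPU, or fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
else:
    # DDP path: bind this process to its GPU and join the process group.
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)
    torch.distributed.init_process_group(backend="nccl")
```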

Solution 2 is probably much less work, so I suggest trying that first.

Hello @martiansideofthemoon
I hope you are doing great. I am facing another problem related to run_finetune_paraphrase.sh. I am trying to run it in Google Colab; it executes for a few seconds and then reports that the CUDA GPU is out of memory. I have also experimented with changing the batch size in the .sh file, but it didn't help. I would appreciate your help with that.
(screenshot attached: GPU out-of-memory error)

Hi @TufailAhmadSiddiq, what's the smallest batch size you tried? Reducing the batch size is fine, since you can use gradient accumulation to keep a larger effective batch size (see the sketch below).
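
For illustration, here is a minimal gradient-accumulation sketch in plain PyTorch, assuming `model`, `optimizer`, and `dataloader` are already set up and that the model returns a loss HuggingFace-style; in practice you would pass the corresponding accumulation argument to the finetuning script rather than write your own loop:

```python
accumulation_steps = 4  # effective batch size = per-step batch size * 4

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    # Scale the loss so accumulated gradients average rather than sum.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```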

Thanks for the reply. The minimum batch size I have tried is 2, but I am still facing the same problem.

Is this with GPT2-large? As long as batch size 1 fits, it should be fine. You can also switch GPT2-large to GPT2-medium; it doesn't hurt performance much. Another option is gradient checkpointing (see the sketch below).
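
To make that concrete, here is a rough sketch of the underlying transformers calls. In this repo the model name is set inside the finetuning bash script, so the actual switch is a one-line edit there; also note that gradient_checkpointing_enable() exists in recent transformers versions, while older ones set config.gradient_checkpointing = True instead:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Swap the base model: "gpt2-medium" instead of "gpt2-large".
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Trade compute for memory: recompute activations during the backward pass.
model.gradient_checkpointing_enable()
model.train()
```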

Can you please point out where I should make these changes?

Thanks for the guidance. I will try it and check whether it works.

Hello! Hope you are doing well. I am trying to fine-tune your model on my custom dataset. When I run !style_paraphrase/examples/run_finetune_paraphrase.sh, I get the following error:
(screenshot attached: error output)
I followed the first two steps of "Custom Datasets" in this repository. At the third step, while converting the BPE codes to fairseq binaries, a "Permission denied" error occurs.

@HassanBinAli I think you are missing the dataset files in the repo. Please download the train.pickle file from here and place it at datasets/paranmt_filtered/train.pickle.
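
For anyone hitting the same error, a quick sanity check before launching the script (paths follow this repo's expected layout):

```python
from pathlib import Path

# Expected location of the paraphrase training data in this repo's layout.
train_pickle = Path("datasets/paranmt_filtered/train.pickle")

if not train_pickle.exists():
    raise FileNotFoundError(
        f"{train_pickle} is missing; download train.pickle and place it here "
        "before running style_paraphrase/examples/run_finetune_paraphrase.sh"
    )
print(f"Found {train_pickle} ({train_pickle.stat().st_size / 1e6:.1f} MB)")
```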

Thank you. It resolved the error.