salesforce / ctrl

Conditional Transformer Language Model for Controllable Generation

Home Page: https://arxiv.org/abs/1909.05858


Running full model on V100 outputs last word

dimitri320 opened this issue

I'm running the full model on a V100 GPU on Google Cloud, and the only output I get is the last word copied over and over again. I've tried changing the temperature and topk parameters, but to no avail. I'm using the 512 (larger) version.
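For context on the two knobs mentioned above: temperature and top-k reshape the next-token distribution before sampling. This is a minimal sketch of that mechanism, not the repo's actual code; the function name and interface are illustrative:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, topk=0):
    """Sample a token id from logits after temperature scaling and top-k filtering."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if topk > 0:
        # Keep only the k largest logits; mask everything else out.
        cutoff = np.sort(logits)[-topk]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())  # softmax (masked entries become 0)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# With topk=1 sampling degenerates to greedy decoding: always the argmax.
print(sample_next_token([0.1, 2.0, 0.5], temperature=0.7, topk=1))  # → 1
```

Note that if the model's logits are degenerate (one token dominating), no setting of temperature or topk will stop it from repeating that token, which is consistent with tuning these parameters not helping here.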

Any advice would be greatly appreciated.

This seems symptomatic of not providing any control code. Can you try with the first token being Links?

Yes I have. I've tried several control codes, actually, with both the 512 and 256 models, and in both cases the results were the same. This is what I got with Links just now:

Links https://cnn.com/bill-clinton-was-the-president president president president president president president president president president

What's interesting is that the lower memory version with 512 model works perfectly well!

I'm using a Google Cloud Deep Learning VM with one NVIDIA Tesla V100, 12 vCPUs, 78 GB of memory, and a 500 GB hard drive.

Also, as a note, when I do source attribution on "I lost 10 lbs! Feeling great!" with the 512 model I get:

Fitness ppl = 10753944.151308

While in your example:

Fitness ppl = 36.162671

FWIW I'm running on V100 on GCP and do not have the issues you describe.

@julien-c saw your pull request, it makes perfect sense. What I don't get is how it splits the control code from the rest of the input string, as there is no mention of control codes in the master branch in generation.py right now?

Let's maybe discuss on the PR itself, but AFAIK a control code is just a BPE token like any other token.

@julien-c is right. The control code is just the first token, and the way it's set up, it's always in the vocabulary, so it doesn't get split up. There is no special treatment of that token during inference.

@keskarnitish I patched the correct file, and I still get the last word copied over and over again... And I don't get the warning that no control code was used, since I am using control codes (Links, Books, Wikipedia).

And btw, the new commit works: when I start with a non-control word, it shows me the warning. Thanks for that @julien-c !

Any advice where else to look for an answer?

PS: I've already spent 3 days on this and really don't know what else to do...

The only other thing that comes to mind is that you might be pointing to an empty/corrupted model folder. Can you delete and re-download? Maybe also try using pytorch_generation and point to the specific .data file?

I found the solution. For TensorFlow models you need to specify the path to the model folder (not to the .data file). For PyTorch, you need to specify the path all the way down to the .data file.
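A small sanity check capturing that convention might look like this (the function and the backend labels are illustrative, not part of the repo; only the folder-vs-.data-file rule comes from the thread):

```python
import os

def resolve_checkpoint_path(path, backend):
    """Validate the path convention from this thread:
    TensorFlow wants the model *folder*, PyTorch wants the specific .data file."""
    if backend == "tensorflow":
        if not os.path.isdir(path):
            raise ValueError("TensorFlow: point at the model folder, not a file")
    elif backend == "pytorch":
        if not path.endswith(".data"):
            raise ValueError("PyTorch: point at the .data file itself")
    else:
        raise ValueError(f"unknown backend {backend!r}")
    return path
```

Failing fast with a clear message like this would have surfaced the misconfiguration immediately instead of producing degenerate repeated-word output.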

@keskarnitish I'd recommend adding this explicitly to the instructions for TensorFlow, as right now it's unclear.

Maybe you could add an assert to the code @dimitri320

Glad you found the issue after all, though.