Eric-Wallace / universal-triggers

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Loss thresholds for successful triggers on language models?

mathemakitten opened this issue

commented

Hi Eric! Thanks for sharing this work. I've implemented this in TensorFlow to use with a copy of the 124M GPT-2 model, and I was wondering if you could provide some details on the range of final "best loss" numbers you were seeing with the smallest model for the triggers that worked. I'm working under the assumption that, with a vocab size of ~50k, a cross-entropy of roughly 10.8 would be equivalent to "random". My current process isn't producing triggers that are successfully adversarial, and I'm wondering if I'm just not finding very good triggers. Thanks!
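(For context on the "random" baseline mentioned above: the cross-entropy of a uniform predictor over a vocabulary of size V is ln(V) nats. A minimal sketch checking that figure, assuming GPT-2's actual BPE vocabulary size of 50,257, which is close to the ~50k cited:)

```python
import math

# Cross-entropy (in nats) of a uniform "random" predictor over a vocabulary
# of size V is ln(V): every token gets probability 1/V, so -log(1/V) = ln(V).
vocab_size = 50257  # GPT-2's BPE vocabulary size
random_baseline = math.log(vocab_size)
print(f"uniform-predictor cross-entropy: {random_baseline:.2f} nats")  # ~10.82
```

So a trigger-search run whose best loss hovers near 10.8 is doing no better than chance, while a successful trigger should push the target sequence's loss well below that.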

Hey sorry I never responded! Did you figure out the issue?

Feel free to reopen the issue if not @mathemakitten

commented

No worries, I ended up pulling your repo to find out. I also reproduced this in TensorFlow 2.3, so I could add support for that if it's of interest!