accuracy not dropping but trigger keeps changing

Question

bowtiejicode opened this issue 4 years ago · comments

Hi there, a little background on my project: I am currently building a benign/malware app classifier based on API sequences, which is quite similar to text classification (positive/negative).

I am running code based on sst.py. To prepare my dataset, I followed the AllenNLP conventions to create instances for the train and dev data. Everything seems fine when I run the main() function and training completes, but in the trigger search the trigger "words" keep changing while the accuracy does not drop. Do you have any idea why this is happening? The same behaviour shows up with the other attacks (e.g., hotflip, nearest_neighbor_grad).

Without Triggers: 0.9994070560332049
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/os/bundle;->putsparseparcelablearray, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/os/bundle;->putsparseparcelablearray, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/media/audiomanager;->adjuststreamvolume, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, lorg/apache/cordova/directorymanager;->gettempdirectorypath, landroid/hardware/sensormanager;->getsensorlist, ljava/lang/runtime;->runfinalization, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/os/environment;->isexternalstorageemulated, landroid/content/intent;->getcomponent, : 0.9997035280166024
Current Triggers: landroid/app/activity;->databaselist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/accounts/accountmanager;->getauthtoken, landroid/location/locationmanager;->removeproximityalert, landroid/os/bundle;->getparcelablearraylist, : 0.9997035280166024
Current Triggers: landroid/content/intent;->replaceextras, landroid/net/uri;->getfragment, landroid/net/uri;->getfragment, landroid/app/activitymanager;->killbackgroundprocesses, landroid/content/clipboardmanager;->getservice, landroid/os/bundle;->putparcelablearraylist, ljava/lang/system;->setsecuritymanager, : 0.9997035280166024
Current Triggers: landroid/content/res/assetmanager;->opennonassetfdnative, ljava/net/urlconnection;->getfilenamemap, ljava/net/urlconnection;->getfilenamemap, lorg/apache/xerces/impl/xmlentitymanager;->isentitydeclinexternalsubset, landroid/content/clipboardmanager;->reportprimaryclipchanged, landroid/app/fragmentmanager;->begintransaction, landroid/net/uri;->getencodedpath, : 0.9997035280166024
Current Triggers: ljava/lang/runtime;->runfinalization, landroid/net/uri;->tostring, lorg/apache/xerces/impl/xmlentitymanager;->closereaders, landroid/hardware/camera;->cancelautofocus, landroid/app/activitymanager;->getlocktaskmodestate, landroid/webkit/cookiesyncmanager;->resetsync, ljava/net/urlconnection;->getdooutput, : 0.9997035280166024
Current Triggers: landroid/content/intent;->getcomponent, landroid/bluetooth/rfcommsocket;->waitforasyncconnectnative, landroid/app/activity;->finalize, landroid/hardware/sensor;->getreportingmode, landroid/content/intent;->setdataandtype, landroid/hardware/camera;->startsmoothzoom, lorg/apache/cordova/file/directorymanager;->getfreediskspace, : 0.9997035280166024

Eric Wallace · Answer 1 · Sat Sep 26 2020 15:14:04 GMT+0800 (China Standard Time)

Hello. One thing I would start with is that the accuracy is computed on the development set, while the trigger is generated using a batch of examples. So it’s not guaranteed that the accuracy goes down at every step (though what you are using is unusual). I would check whether you can make the accuracy drop on a single batch: grab one batch, optimize the trigger on just that batch, and call get_accuracy() on just that batch of data. Let’s make sure that you can get it working there first.
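To make the single-batch check concrete, here is a minimal, self-contained sketch. It does not use the repository's code: the toy classifier (mean of per-token weights), the `margin`, `batch_accuracy` and `greedy_trigger` helpers, and the hand-made data are all hypothetical, and a brute-force greedy search stands in for the gradient-based candidate selection. The only point is the shape of the check: optimize the trigger on one batch, then measure accuracy on that same batch.

```python
# Toy stand-in for the model: one weight per vocabulary token (hypothetical values).
WEIGHTS = [0.5, 0.2, -0.1, -2.0, 0.3]

# One hand-made batch of (token_ids, label) pairs; label 1 plays the role of "malware".
BATCH = [
    ([0, 1, 0, 4], 1),
    ([1, 1, 2, 4], 1),
    ([0, 0, 0, 1], 1),
    ([4, 2, 1, 2], 1),
]


def margin(weights, tokens, label):
    """Signed score for the true label; a positive value means a correct prediction."""
    score = sum(weights[t] for t in tokens) / len(tokens)
    return score if label == 1 else -score


def batch_accuracy(weights, batch, trigger):
    """Accuracy on one batch with the trigger prepended to every input."""
    correct = sum(margin(weights, trigger + tokens, label) > 0 for tokens, label in batch)
    return correct / len(batch)


def batch_margin(weights, batch, trigger):
    """Sum of true-label margins over the batch; lower is better for the attacker."""
    return sum(margin(weights, trigger + tokens, label) for tokens, label in batch)


def greedy_trigger(weights, batch, trigger_len):
    """Brute-force greedy search: fix one trigger position at a time."""
    trigger = [0] * trigger_len
    for pos in range(trigger_len):
        def total_margin(tok):
            candidate = trigger[:pos] + [tok] + trigger[pos + 1:]
            return batch_margin(weights, batch, candidate)
        trigger[pos] = min(range(len(weights)), key=total_margin)
    return trigger


clean_acc = batch_accuracy(WEIGHTS, BATCH, [])
trigger = greedy_trigger(WEIGHTS, BATCH, trigger_len=2)
attacked_acc = batch_accuracy(WEIGHTS, BATCH, trigger)
print("without trigger:", clean_acc)
print("trigger:", trigger, "accuracy on the same batch:", attacked_acc)
```

If the accuracy on the very batch used for the search does not move either, the problem is more likely in the attack setup (gradients, embedding lookup, how the trigger is prepended) than in generalization from one batch to the dev set.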

Eric Wallace · Answer 2 · Sat Sep 26 2020 15:15:32 GMT+0800 (China Standard Time)

Also, how long are your sequences? If the input is a lot of “words” and the trigger is only a few words, maybe it’s not very effective. I haven’t tried it with sequences longer than SQuAD paragraphs, which are a few hundred words.
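As a rough way to see the length concern, the sketch below computes what share of the model input a trigger occupies. This is only a crude proxy: how much a model actually responds to a few prepended tokens depends on its architecture, and nothing here is measured on a real model. The `trigger_fraction` helper is hypothetical; the lengths are taken from this thread (7-token triggers as in the log above, 900- and 1700-token sequences) plus 300 as a stand-in for "a few hundred words".

```python
def trigger_fraction(trigger_len, seq_len):
    """Share of the combined input (trigger + original sequence) taken up by the trigger."""
    return trigger_len / (trigger_len + seq_len)


# Sequence lengths: 300 is an assumed stand-in for a SQuAD-sized paragraph;
# 900 and 1700 are the lengths mentioned in this thread.
for seq_len in (300, 900, 1700):
    for trigger_len in (7, 35, 100):
        share = trigger_fraction(trigger_len, seq_len)
        print(f"seq_len={seq_len:5d} trigger_len={trigger_len:4d} share={share:.2%}")
```

A 7-token trigger is under 1% of a 900-token input, so if it has little effect that would not be surprising on its own; it does not by itself explain the flat accuracy, though.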

BagelTap · Answer 3 · Sun Sep 27 2020 14:19:59 GMT+0800 (China Standard Time)

Hi, my sequences are 900 in length for now, although the length that performs best is 1700. I am currently only using triggers of fewer than 10 words. I am experimenting with adversarial sequence crafting, and I have tried other methods that produced effective adversarial samples by appending only 35 extra words at the end. Now I am trying your method to see if it works. I will let you know in the coming days once I have tried more triggers.

Eric Wallace · Answer 4 · Mon Sep 28 2020 13:14:50 GMT+0800 (China Standard Time)

Great. Yeah, I'd recommend trying more words in the trigger. If the other adversarial attack methods you have tried are input-specific, and you need 35 extra words to cause the model to change its answer for them, then you will likely need at least 35 words for triggers to work. Recall that a trigger is agnostic to the input, so it is trying to solve a much harder problem than input-specific attacks.

BagelTap · Answer 5 · Mon Sep 28 2020 19:30:25 GMT+0800 (China Standard Time)

Just curious... will the number of epochs affect the effectiveness of triggers too? I.e., how effective are the trigger words at affecting the polarity (negative/positive)?

BagelTap · Answer 6 · Tue Sep 29 2020 15:16:39 GMT+0800 (China Standard Time)

I tried up to 100 trigger words and am still observing the same behaviour.

Eric Wallace · Answer 7 · Wed May 12 2021 09:29:22 GMT+0800 (China Standard Time)

Closing due to inactivity on my part. Feel free to reopen, @limjq45, if you have further questions.