zzk0/SEM

Synonyms Encoding Method (SEM)

This repository contains the code necessary to reproduce the main results in the paper:

We have also added the code for IGA to the TextAttack framework.

There are three datasets used in our experiments:

The code was tested with:

textrnn.py, textcnn.py, textbirnn.py : The LSTM, Word-CNN and Bi-LSTM models.
train_orig.py, train_enc.py : Training the models without or with SEM, respectively.
glove_utils.py : Loading the GloVe model and creating the embedding matrix for the word dictionary.
attack_utils.py : Helper functions for calculating the classification and score for an input.
build_embeddings.py : Generating the embedding matrices for the original word dictionary and the encoded word dictionary.
improved_genetic.py : Attacking the models, with or without the defense, using the improved genetic algorithm (IGA).
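
As a rough illustration of the step that glove_utils.py is described as performing (loading GloVe vectors and building an embedding matrix for a word dictionary), here is a minimal, dependency-free sketch. It is not the repository's code: the names `load_glove` and `create_embedding_matrix`, the use of plain Python lists instead of arrays, the contiguous 1-based word indices, and the all-zero rows for index 0 and for words missing from GloVe are all assumptions made for illustration.

```python
import io


def load_glove(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def create_embedding_matrix(vectors, word_index, dim):
    """Return one row per dictionary index.

    Assumes indices run from 1 to len(word_index); row 0 and rows for
    words without a GloVe vector are left as zeros (an assumed convention).
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix


# Tiny in-memory stand-in for a GloVe file (hypothetical data).
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
glove = load_glove(sample)
word_index = {"good": 1, "bad": 2, "unseen": 3}
embedding_matrix = create_embedding_matrix(glove, word_index, dim=3)
```

In practice the resulting rows would typically be converted to a numeric array before being handed to an embedding layer.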

Generate the embedding matrices for the original dictionary and the encoded dictionary:
```
python build_embeddings.py
```
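
To make the notion of an "encoded dictionary" more concrete, the sketch below shows one simple way a word dictionary could be re-encoded so that a word and its synonyms share a single code. This is only an illustration of the general idea, not the algorithm from the paper or the logic in build_embeddings.py: the function `encode_dictionary`, the precomputed `synonyms` table, and the greedy "first word wins" rule are all hypothetical.

```python
def encode_dictionary(word_index, synonyms):
    """Map each word index to a code shared by the word and its listed synonyms.

    word_index: {word: index}, assumed ordered from most to least frequent.
    synonyms:   {word: [synonym, ...]}, a hypothetical precomputed table.
    """
    code_of = {}
    for word, idx in word_index.items():
        if word in code_of:
            continue  # already absorbed into an earlier word's group
        code_of[word] = idx
        for syn in synonyms.get(word, []):
            if syn in word_index and syn not in code_of:
                code_of[syn] = idx  # synonym takes the earlier word's code
    return {word_index[w]: code for w, code in code_of.items()}


# Hypothetical toy dictionary and synonym table.
word_index = {"good": 1, "movie": 2, "great": 3, "film": 4, "bad": 5}
synonyms = {"good": ["great"], "movie": ["film"]}
encoding = encode_dictionary(word_index, synonyms)

# "great film" (indices 3, 4) becomes the same sequence as "good movie".
encoded_sentence = [encoding[i] for i in [3, 4]]
```

Under this toy encoding, replacing a word with one of its listed synonyms leaves the model input unchanged, which is the intuition behind training on an encoded dictionary.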

Train the models with the original word dictionary:

python train_orig.py --data aclImdb --sn 10 --sigma 0.5 --nn_type textrnn

Train the models with the encoded word dictionary:

python train_enc.py --data aclImdb --sn 10 --sigma 0.5 --nn_type textrnn
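
If you want to train every model type both ways, the two training commands can be generated in a loop. The helper below is a small convenience sketch, not part of the repository: `build_command` is a hypothetical function, and the `--nn_type` values `textcnn` and `textbirnn` are assumed from the model file names (only `textrnn` appears in the commands shown here).

```python
import shlex


def build_command(script, **flags):
    """Return an argv list such as ["python", "train_enc.py", "--data", "aclImdb", ...]."""
    argv = ["python", script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Flag values copied from the training commands in this README.
common = {"data": "aclImdb", "sn": 10, "sigma": 0.5}
# Values other than "textrnn" are an assumption based on the model file names.
nn_types = ["textrnn", "textcnn", "textbirnn"]

commands = []
for nn_type in nn_types:
    commands.append(build_command("train_orig.py", **common, nn_type=nn_type))
    commands.append(build_command("train_enc.py", **common, nn_type=nn_type))

for argv in commands:
    # Printing only; swap in subprocess.run(argv, check=True) to execute.
    print(shlex.join(argv))
```

The script only prints the commands, so it is safe to run before any data or embeddings have been prepared.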

To attack the models by IGA, run:

python improved_genetic.py --pre enc --sn 10 --data aclImdb --sigma 0.5 --time xxx --nn_type textrnn

This repository is under active development. Questions and suggestions can be sent to xswanghuster@gmail.com.