Introduction
Implementation of the paper "Towards Interpreting Recurrent Neural Networks through Probabilistic Abstraction" (ASE 2020).
Project structure
This project is organized as the following modules:
target_models
Functions of this module are:
- Define the architecture of the neural networks. This project uses two common architectures: LSTM and GRU.
- Train and save models. All trained models are saved in the data folder; more details are given below.
data
This module consists of three folders:
- wordvec: a pretrained word2vec model is placed here.
- training_data: this folder contains four datasets and their corresponding data processing code. The four datasets are listed as follows:
  - bp
  - tomita
  - imdb
  - mr
- no_stopws: this folder stores all the essential data of this project. They are:
  - trained_models: the trained models.
  - ori_trace: the original execution traces of the RNN (LSTM/GRU).
  - L1_trace: the results of the level-1 abstraction.
  - L2_results: the extracted PFAs.
  - adv_text: adversarial texts generated by TEXTBUGGER over the IMDB and MR datasets.
ori_trace_extraction
The function of this module is to extract the execution trace of LSTM/GRU for each input sequence.
level1_abstract
The function of this module is to convert the original traces to symbolic traces by k-means clustering.
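As a rough illustration of this step, the sketch below clusters hidden-state vectors with a small pure-Python k-means (Lloyd's algorithm) and replaces each vector with the index of its cluster. The function names (`nearest`, `kmeans`, `to_symbolic`), the initialization from the first k points, and the toy data are assumptions made for illustration; the project's own code is in level1_abstract/do_L1_abstract.py and may differ.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def nearest(point, centers):
    """Return the index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; centers start at the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def to_symbolic(traces, centers):
    """Replace every hidden-state vector by the index of its nearest center."""
    return [[nearest(h, centers) for h in trace] for trace in traces]


# Toy "original traces": each trace is a list of 2-d hidden states (made up).
traces = [
    [(0.0, 0.1), (0.9, 1.0), (1.0, 0.9)],
    [(0.1, 0.0), (0.0, 0.0), (1.1, 1.0)],
]
all_states = [h for trace in traces for h in trace]
centers = kmeans(all_states, k=2)
symbolic_traces = to_symbolic(traces, centers)
print(symbolic_traces)
```

Each symbolic trace has the same length as its original trace; only the continuous vectors are replaced by discrete cluster labels.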
level2_abstract
The function of this module is to extract a PFA (probabilistic finite automaton) from the symbolic traces derived from the target model.
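To give a feel for what a probabilistic model over symbolic traces looks like, here is a deliberately simplified sketch that treats each symbol as a state and estimates transition probabilities by counting. This is not the learning algorithm used in level2_abstract/do_L2_abstract.py; the names `estimate_transitions`, `trace_probability`, and the `START` state are hypothetical, and termination probabilities are ignored.

```python
from collections import Counter, defaultdict


def estimate_transitions(symbolic_traces, start="START"):
    """Estimate P(next symbol | current symbol) by normalized counts."""
    counts = defaultdict(Counter)
    for trace in symbolic_traces:
        prev = start
        for sym in trace:
            counts[prev][sym] += 1
            prev = sym
    return {
        state: {sym: n / sum(nxt.values()) for sym, n in nxt.items()}
        for state, nxt in counts.items()
    }


def trace_probability(trace, transitions, start="START"):
    """Multiply the transition probabilities along a trace (0.0 if unseen)."""
    prob, prev = 1.0, start
    for sym in trace:
        prob *= transitions.get(prev, {}).get(sym, 0.0)
        prev = sym
    return prob


# Toy symbolic traces (cluster labels), made up for illustration.
symbolic_traces = [[0, 1, 1], [0, 0, 1]]
pfa = estimate_transitions(symbolic_traces)
for state, nxt in pfa.items():
    print(state, nxt)
print(trace_probability([0, 1, 1], pfa))
```

A real PFA learner would also merge compatible states and model where traces end; the counting scheme above only shows the shape of the resulting transition table.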
experiments
This module holds all the code that answers the research questions related to PFA.
The effectiveness folder contains the code for RQ1 and RQ2.
The application folder contains the code for RQ3.
baseline
This module contains the implementation of two baselines:
utils
Helper functions are organized here.
How to use?
Prepare data
- word2vec model:
  - Download the word2vec model GoogleNews-vectors-negative300.bin.
  - Place the model in the data/wordvec folder.
- Unzip the other compressed data into the data folder.
- Other third-party artefacts:
  - Refer to the README in experiments/application/adv_detect/textbugger/universal-sentence-encoder/ and run experiments/application/adv_detect/textbugger/universal-sentence-encoder/download.py to download the corresponding artefact for adversarial texts.
  - Download the model checker PRISM.
- Configure the following paths in utils/constant.py:
  - PROJECT_ROOT, e.g.:
    PROJECT_ROOT = "/home/username/project/learn_automata_rnn"
  - SENTENCE_ENCODER, e.g.:
    SENTENCE_ENCODER = "/home/username/project/artefacts/universal-sentence-encoder"
  - PRISM_SCRIPT, e.g.:
    PRISM_SCRIPT = "/home/username/artefacts/prism/v-4.5/bin/prism"
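Since a wrong path here tends to surface only later as a confusing error, it can help to check the configured values up front. The snippet below is a minimal sketch of such a check; `missing_paths` is a hypothetical helper that is not part of the project, and the dictionary simply repeats the example values shown above.

```python
import os


def missing_paths(paths):
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


# Example values copied from the configuration above; adjust to your machine.
configured = {
    "PROJECT_ROOT": "/home/username/project/learn_automata_rnn",
    "SENTENCE_ENCODER": "/home/username/project/artefacts/universal-sentence-encoder",
    "PRISM_SCRIPT": "/home/username/artefacts/prism/v-4.5/bin/prism",
}
for name in missing_paths(configured):
    print(f"{name} does not exist: {configured[name]}")
```

An empty result means every configured location was found.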
Run
Extract PFA from scratch
- Train target models. Use target_models/model_training.py to train a model yourself.
- Extract original traces. Use ori_trace_extraction/do_ori_extract.py to extract the execution traces of the target model.
- Use level1_abstract/do_L1_abstract.py to convert the original traces to symbolic traces.
- Use level2_abstract/do_L2_abstract.py to extract a PFA from the symbolic traces.
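The four steps above could be chained with a small driver script like the sketch below. It assumes that each script can be started from the project root without command-line arguments, which this README does not state; `build_commands` and `run_pipeline` are hypothetical names, and only the script paths come from the steps above.

```python
import os
import subprocess
import sys

# Script paths are taken from the steps above, in the order they are listed.
PIPELINE_SCRIPTS = [
    "target_models/model_training.py",
    "ori_trace_extraction/do_ori_extract.py",
    "level1_abstract/do_L1_abstract.py",
    "level2_abstract/do_L2_abstract.py",
]


def build_commands(project_root, python=sys.executable):
    """Build one command per pipeline step (assumes no extra arguments)."""
    return [[python, os.path.join(project_root, s)] for s in PIPELINE_SCRIPTS]


def run_pipeline(project_root):
    """Run the steps in order and stop at the first failing script."""
    for cmd in build_commands(project_root):
        subprocess.run(cmd, cwd=project_root, check=True)


# Print the commands only; call run_pipeline(...) to actually execute them.
for cmd in build_commands("/home/username/project/learn_automata_rnn"):
    print(" ".join(cmd))
```

If a script does need arguments or configuration edits, run that step by hand instead.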
Test the effectiveness
Run experiments/effectiveness/pfa_predict.py to get the fidelity and accuracy of the learned PFA.
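The two metrics can be read as simple agreement rates. The sketch below assumes that fidelity is the fraction of inputs on which the PFA agrees with the target RNN, and accuracy is the fraction on which the PFA agrees with the ground-truth labels; the function name `agreement` and the toy predictions are made up for illustration and are not taken from pfa_predict.py.

```python
def agreement(predictions, references):
    """Fraction of positions where the two label sequences agree."""
    if len(predictions) != len(references):
        raise ValueError("sequences must have the same length")
    if not predictions:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(predictions)


# Toy predictions for four inputs (made up for illustration).
pfa_preds = [1, 0, 1, 1]
rnn_preds = [1, 0, 0, 1]
labels = [1, 1, 0, 1]

fidelity = agreement(pfa_preds, rnn_preds)  # PFA vs. target RNN
accuracy = agreement(pfa_preds, labels)     # PFA vs. ground truth
print(f"fidelity={fidelity:.2f} accuracy={accuracy:.2f}")
```

Under this reading a PFA can have high fidelity and low accuracy at the same time, namely when it faithfully reproduces the mistakes of the RNN.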
Application: Detection of Adversarial Text
- Use experiments/application/adv_detect/craft_adversaries.py to craft adversarial texts.
- Run experiments/application/adv_detect/ts_detect.py to detect the adversarial texts.