Introduction

Implementation of the paper "Towards Interpreting Recurrent Neural Networks through Probabilistic Abstraction" (ASE 2020).

Project structure

This project is organized into the following modules:

target_models

Functions of this module are:

Define the architecture of the neural networks. This project uses two common architectures: LSTM and GRU.
Train and save models. All trained models are saved in the data folder; more details are given below.

data

This module consists of three folders:

wordvec: a pretrained word2vec model is placed here.
training_data: this folder contains four datasets and their corresponding data processing code. The four datasets are listed as follows:
- bp
- tomita
- imdb
- mr
no_stopws: this folder stores the essential data of this project:
- trained_models: the trained models.
- ori_trace: the original execution traces of the RNN (LSTM/GRU).
- L1_trace: the results of the level-1 abstraction.
- L2_results: the extracted probabilistic finite automata (PFAs).
- adv_text: adversarial texts generated by TEXTBUGGER over the IMDB and MR datasets.

ori_trace_extraction

The function of this module is to extract the execution trace of LSTM/GRU for each input sequence.

level1_abstract

The function of this module is to convert the original traces into symbolic traces by k-means clustering.
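As a rough illustration of this step, the sketch below clusters hidden-state vectors with a small hand-written k-means (Lloyd's algorithm) and replaces each vector by the index of its nearest centroid. It is not the code in level1_abstract: the function names (kmeans, abstract_trace), the toy two-dimensional states, and the choice k=2 are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def abstract_trace(trace, centroids):
    """Map every hidden state of one trace to its cluster index (its symbol)."""
    return [nearest(state, centroids) for state in trace]


# Hypothetical original traces: one list of hidden-state vectors per input.
traces = [
    [(0.0, 0.2), (0.3, 0.0), (5.0, 5.1)],
    [(5.2, 4.9), (0.1, 0.1)],
]
all_states = [state for trace in traces for state in trace]
centroids = kmeans(all_states, k=2)
symbolic_traces = [abstract_trace(trace, centroids) for trace in traces]
print(symbolic_traces)  # cluster indices; the labels themselves are arbitrary
```

The centroids are fitted on the states of all traces together, so the same symbol means the same region of the hidden-state space in every trace.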

level2_abstract

The function of this module is to extract a PFA using the symbolic traces derived from the target model
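To give an intuition for what is learned from symbolic traces, the sketch below estimates transition probabilities between abstract symbols by counting consecutive pairs. This first-order frequency model is only a simplified stand-in: the learning procedure in level2_abstract may differ substantially, and the names estimate_transitions and trace_probability as well as the toy traces are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(symbolic_traces):
    """Estimate P(next symbol | current symbol) from observed traces."""
    counts = defaultdict(Counter)
    for trace in symbolic_traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    probabilities = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probabilities[src] = {dst: n / total for dst, n in outgoing.items()}
    return probabilities


def trace_probability(trace, probabilities):
    """Probability of following the trace's transitions, given its first symbol."""
    p = 1.0
    for src, dst in zip(trace, trace[1:]):
        # Transitions never seen during estimation get probability 0.
        p *= probabilities.get(src, {}).get(dst, 0.0)
    return p


# Hypothetical symbolic traces, e.g. cluster indices from the level-1 abstraction.
symbolic_traces = [[0, 0, 1], [0, 1, 1], [1, 0, 1]]
model = estimate_transitions(symbolic_traces)
print(model)
print(trace_probability([0, 0, 1], model))
```

The outgoing probabilities of every symbol sum to 1, which is the property that makes the resulting model probabilistic rather than a plain transition graph.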

experiments

This module contains all the code that answers the research questions related to PFAs.

The effectiveness folder contains the code for RQ1 and RQ2.

The application folder contains the code for RQ3.

baseline

This module contains the implementation of two baselines:

utils

Helper functions are collected here.

How to use?

Prepare data

word2vec model:
- Download the word2vec model GoogleNews-vectors-negative300.bin.
- Place the model in the data/wordvec folder.
Unzip the other compressed data into the data folder.
Other third-party artefacts:
- Refer to experiments/application/adv_detect/textbugger/universal-sentence-encoder/REDME and run experiments/application/adv_detect/textbugger/universal-sentence-encoder/download.py to download the corresponding artefact for adversarial texts.
- Download the PRISM model checker.
Configure the following paths in utils/constant.py:
- PROJECT_ROOT, e.g.:
  
  PROJECT_ROOT = "/home/username/project/learn_automata_rnn"
- SENTENCE_ENCODER, e.g.:
  
  SENTENCE_ENCODER = "/home/username/project/artefacts/universal-sentence-encoder"
- PRISM_SCRIPT, e.g.:
  
  PRISM_SCRIPT = "/home/username/artefacts/prism/v-4.5/bin/prism"
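A quick way to catch a misconfigured path before running any experiment is to check that each configured location exists. The sketch below uses the three constant names and example values listed above; the helper missing_paths is hypothetical and is not part of utils/constant.py.

```python
import os

# Example values copied from this README; replace them with your local paths.
PROJECT_ROOT = "/home/username/project/learn_automata_rnn"
SENTENCE_ENCODER = "/home/username/project/artefacts/universal-sentence-encoder"
PRISM_SCRIPT = "/home/username/artefacts/prism/v-4.5/bin/prism"


def missing_paths(paths):
    """Return the names whose configured path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


configured = {
    "PROJECT_ROOT": PROJECT_ROOT,
    "SENTENCE_ENCODER": SENTENCE_ENCODER,
    "PRISM_SCRIPT": PRISM_SCRIPT,
}
for name in missing_paths(configured):
    print(f"{name} points to a missing path: {configured[name]}")
```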

Run

Extract PFA from scratch

Train the target models. Use target_models/model_training.py to train a model yourself.
Extract the original traces. Use ori_trace_extraction/do_ori_extract.py to extract the execution traces of the target model.
Use level1_abstract/do_L1_abstract.py to convert the original traces into symbolic traces.
Use level2_abstract/do_L2_abstract.py to extract a PFA from the symbolic traces.

Test the effectiveness

Run experiments/effectiveness/pfa_predict.py to get the fidelity and accuracy of the learned PFA.
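The two metrics can be illustrated with a small sketch. It assumes that fidelity is the fraction of inputs on which the PFA agrees with the target RNN, and that accuracy is the fraction on which the PFA agrees with the ground-truth labels; the function agreement and the toy predictions are hypothetical and are not taken from pfa_predict.py.

```python
def agreement(predictions, references):
    """Fraction of positions where the two label sequences coincide."""
    if len(predictions) != len(references):
        raise ValueError("both sequences must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(predictions)


# Hypothetical labels for four test inputs.
labels = [1, 0, 1, 1]           # ground truth
rnn_predictions = [1, 0, 0, 1]  # target model (LSTM/GRU)
pfa_predictions = [1, 0, 0, 0]  # extracted PFA

fidelity = agreement(pfa_predictions, rnn_predictions)  # PFA vs. RNN
accuracy = agreement(pfa_predictions, labels)           # PFA vs. ground truth
print(f"fidelity={fidelity:.2f}, accuracy={accuracy:.2f}")
```

Under these assumptions, fidelity measures how faithfully the PFA mimics the RNN, independently of whether the RNN itself is correct.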

Application: Detection of Adversarial Text

Use experiments/application/adv_detect/craft_adversaries.py to craft adversarial texts.
Run experiments/application/adv_detect/ts_detect.py to detect the adversarial texts.
