AutoPruner

This repository contains the source code of the research paper "AutoPruner: Transformer-based Call Graph Pruning", published at ESEC/FSE 2022.

@inproceedings{le2022autopruner,
  title={AutoPruner: transformer-based call graph pruning},
  author={Le-Cong, Thanh and Kang, Hong Jin and Nguyen, Truong Giang and Haryono, Stefanus Agus and Lo, David and Le, Xuan-Bach D and Huynh, Quyet Thang},
  booktitle={Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages={520--532},
  year={2022}
}

Structure

The source code repository is structured as follows:

config: contains our experimental configurations;
script: contains the scripts for running our experiments;
src: contains our source code:
- finetune: contains source code for the fine-tuning phase;
- training: contains source code for the training phase;
- utils: contains source code for utility functions, e.g., logging and visualization;
- gnn: contains source code for the GNN benchmark;
- Note that in each sub-folder, main.py, dataset.py, and model.py contain the source code for training/testing, dataset processing, and deep learning models, respectively;
environment.yml: contains the configuration for AutoPruner's environment.

The data repository is structured as follows:

dl_dataset: contains our processed dataset for AutoPruner;
gnn_dataset: contains our processed dataset for the GNN benchmark;
gnn_model: contains our trained models for the GNN benchmark;
info_data: contains the lists of training and testing programs;
model: contains our trained models for AutoPruner;
npe_result: contains the results of our manual evaluation for null-pointer analysis;
processed_data: contains the extracted source code of methods in the programs of cgPruner's dataset;
raw_data: contains the static call graphs generated by static analysis tools, as provided by cgPruner.
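Before running the replication scripts, it can help to confirm that the downloaded data has the layout listed above. The snippet below is a minimal, hypothetical sketch (not part of the AutoPruner repository): `EXPECTED_DATA_DIRS` and `missing_data_dirs` are illustrative names, and `data_root` is an assumption, since the exact location of the data folder is only described as being placed alongside the repository.

```python
from pathlib import Path

# Folder names taken from the data repository description above.
EXPECTED_DATA_DIRS = [
    "dl_dataset",
    "gnn_dataset",
    "gnn_model",
    "info_data",
    "model",
    "npe_result",
    "processed_data",
    "raw_data",
]


def missing_data_dirs(data_root):
    """Return the expected data folders that are absent under data_root (hypothetical helper)."""
    root = Path(data_root)
    return [name for name in EXPECTED_DATA_DIRS if not (root / name).is_dir()]
```

For example, `missing_data_dirs("path/to/data")` returns an empty list when all eight folders are present.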

Requirements

Hardware

More than 200GB of disk space
2 NVIDIA GPUs that support CUDA 11.3 and have at least 8GB of memory
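The hardware requirements above can be checked roughly with the hypothetical sketch below. It assumes PyTorch is installed in the active environment (it reports no GPUs otherwise), treats 1 GB as 10**9 bytes, and does not verify CUDA 11.3 support; the function names are illustrative only.

```python
import shutil

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def free_disk_gb(path="."):
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / BYTES_PER_GB


def gpu_memory_gb():
    """Total memory, in GB, of each CUDA device visible to PyTorch.

    Returns an empty list if PyTorch is not installed or no GPU is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory / BYTES_PER_GB
        for i in range(torch.cuda.device_count())
    ]


def meets_requirements(path=".", min_disk_gb=200, min_gpus=2, min_gpu_mem_gb=8):
    """True if free disk space exceeds min_disk_gb and enough GPUs have min_gpu_mem_gb of memory."""
    enough_gpus = sum(m >= min_gpu_mem_gb for m in gpu_memory_gb()) >= min_gpus
    return free_disk_gb(path) > min_disk_gb and enough_gpus
```

Calling `meets_requirements()` from the folder where the data will be stored returns True only if both the disk and GPU checks pass.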

Software

Ubuntu 18.04 or newer
Docker/Conda

Environment Configuration

Conda

conda env create -n autopruner --file environment.yml

Docker

For ease of use, we also provide an installation package as a Docker image. Users can set up AutoPruner's Docker environment step by step as follows:

Pull AutoPruner's Docker image:

docker pull thanhlecong/autopruner:v2

Run a docker container:

docker run --name autopruner -it --shm-size 16G --gpus all thanhlecong/autopruner:v2

Activate conda:

source /opt/conda/bin/activate

Activate AutoPruner's conda environment:

conda activate autopruner

Note that the source code of AutoPruner is stored at /workspace/ in the Docker container, so please move to this folder before running experiments.

Experiments

To use our tool, run the following command:

python3 -m src.training.main --config_path [config path] \
                             --mode [mode: test or train] \
                             --feature [type of features: 0: structure, 1: semantic, 2: combine] \
                             --model_path [path to saved model (for saving in train mode and loading in test mode)]

To replicate the results of AutoPruner, please download the data from this link, put it in the same folder as this repository, and then follow the instructions below. Note that results may differ slightly when running on different devices; however, these differences do not affect the findings in the paper.

RQ1

To replicate the result of AutoPruner in call graph pruning on Wala (RQ1), please use

bash script/rq1_wala.sh

To replicate the result of AutoPruner in call graph pruning on Doop (RQ1), please use

bash script/rq1_doop.sh

To replicate the result of AutoPruner in call graph pruning on Petablox (RQ1), please use

bash script/rq1_peta.sh

RQ2

Null-pointer Analysis

In this analysis, we follow the experimental settings of cgPruner, including their Null-pointer Analysis (NPA) code. Please refer to cgPruner's replication package for further instructions. Our manual evaluation results are in the npe_result folder at this link.

Monomorphic Call-site Detection

To replicate the result of AutoPruner in monomorphic call-site detection on Wala's call graph (RQ2), please use

bash script/rq2_wala.sh

To replicate the result of AutoPruner in monomorphic call-site detection on Doop's call graph (RQ2), please use

bash script/rq2_doop.sh

To replicate the result of AutoPruner in monomorphic call-site detection on Petablox's call graph (RQ2), please use

bash script/rq2_peta.sh

RQ3

To replicate the ablation study of AutoPruner with structural features, please use

bash script/rq3_structure.sh

To replicate the ablation study of AutoPruner with semantic features, please use

bash script/rq3_semantic.sh

To replicate the ablation study of AutoPruner with caller function, please use

bash script/rq3_caller.sh

To replicate the ablation study of AutoPruner with callee function, please use

bash script/rq3_callee.sh
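To run several replication scripts in sequence, a small driver such as the hypothetical one below could be used. It lists the scripts named in the RQ1 to RQ3 instructions, assumes it is run from the repository root (/workspace/ inside Docker), and defaults to a dry run that only prints the commands. Whether the scripts can safely run back to back on one machine (GPU memory, output locations) is not documented here, and the null-pointer analysis part of RQ2 relies on cgPruner's package, so it is not covered.

```python
import subprocess

# Replication scripts named in this README, grouped by research question.
RQ_SCRIPTS = {
    "rq1": ["script/rq1_wala.sh", "script/rq1_doop.sh", "script/rq1_peta.sh"],
    "rq2": ["script/rq2_wala.sh", "script/rq2_doop.sh", "script/rq2_peta.sh"],
    "rq3": [
        "script/rq3_structure.sh",
        "script/rq3_semantic.sh",
        "script/rq3_caller.sh",
        "script/rq3_callee.sh",
    ],
}


def run_rq(rq, repo_root=".", dry_run=True):
    """Run the scripts for one research question, or only print them if dry_run is True."""
    commands = [["bash", script] for script in RQ_SCRIPTS[rq]]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError at the first script that fails.
            subprocess.run(cmd, cwd=repo_root, check=True)
    return commands
```

For example, `run_rq("rq2", dry_run=False)` would execute the three monomorphic call-site detection scripts in order, stopping at the first failure.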
