FuzzTuning

code for our acl 2023 paper Understanding Programs by Exploiting (Fuzzing) Test Cases

Requirements

torch 1.7.0

transformers 4.18.0

Pretrained Model

Download codebert-base and unixcoder-base, and move the files to ./microsoft/codebert-base and ./microsoft/unixcoder-base

Clone Detection (POJ-104)

Task Definition

Given a code and a collection of candidates as the input, the task is to return Top K codes with the same semantic. Models are evaluated by MAP@R score. MAP@R is defined as the mean of average precision scores, each of which is evaluated for retrieving R most similar samples given a query. For a code (query), R is the number of other codes in the same class, i.e. R=499 in this dataset.

Dataset

We take POJ-104 dataset on this task as an exmaple.

Download and Preprocess

Download POJ dataset from programs.tar.gz and download POJ fuzzing data from POJ104.io.tar.gz.

mv programs.tar.gz POJ104.io.tar.gz ./dataset
cd dataset
tar zxvf programs.tar.gz
tar zxvf POJ104.io.tar.gz
mv fuzz output
python preprocess_clone.py
cd ..

Data Format

After preprocessing dataset, you can obtain three .jsonl files, i.e. train_fuzz_clone.jsonl, valid_fuzz_clone.jsonl, test_fuzz_clone.jsonl.

The processed .jsonl files are also at ./dataset.

For each file, each line in the uncompressed file represents one function. One row is illustrated below.

code: the source code
label: the number of problem that the source code solves
index: the index of example
fuzz: the fuzzing test cases including input and output pair

Data Statistics

Data statistics of the dataset are shown in the below table:

	#Problems	#Examples
Train	64	32,000
Dev	16	8,000
Test	24	12,000

Evaluator

We use the script provided from website to evaluate predictions for this task, and report MAP@R score.

Pipeline-FuzzTuning-Clone

We also provide a pipeline of FuzzTuning on this task in ./clone/run.sh.

Code Classification (POJ-104)

Task Definition

The code classification task requires that we assign the same label to programs that were implemented to solve the same problem and achieve the same goal. Models are evaluated by accuracy.

Dataset

We take POJ-104 dataset on this task as an example.

Download and Preprocess

The process for dataset is similar with clone detection unless

python preprocess_cls.py

Data Format

After preprocessing dataset, you can obtain three .jsonl files, i.e. train_fuzz_cls.jsonl, valid_fuzz_cls.jsonl, test_fuzz_cls.jsonl.

The processed .jsonl files are also at ./dataset.

For each file, each line in the uncompressed file represents one function. One row is illustrated below.

code: the source code
label: the number of problem that the source code solves
index: the index of example
fuzz: the fuzzing test cases including input and output pair

Data Statistics

Data statistics of the dataset are shown in the below table:

	#Examples
Train	31,338
Dev	10,294
Test	10,368

Pipeline-FuzzTuning-Cls

We also provide a pipeline of FuzzTuning on this task in ./cls/run.sh.

Reference

Please cite our work in your publications if it helps your research:

@article{zhao2023understanding,
  title={Understanding Programs by Exploiting (Fuzzing) Test Cases},
  author={Zhao, Jianyu and Rong, Yuyang and Guo, Yiwen and He, Yifeng and Chen, Hao},
  journal={arXiv preprint arXiv:2305.13592},
  year={2023}
}

rabbitjy / FuzzTuning

FuzzTuning

Requirements

Pretrained Model

Clone Detection (POJ-104)

Task Definition

Dataset

Download and Preprocess

Data Format

Data Statistics

Evaluator

Pipeline-FuzzTuning-Clone

Code Classification (POJ-104)

Task Definition

Dataset

Download and Preprocess

Data Format

Data Statistics

Pipeline-FuzzTuning-Cls

Reference

About

Languages