Learning Mistakes in Boundary Conditions: A First Exploration

Description: 📰

Mistakes in boundary conditions are the cause of many bugs in software. These mistakes happen when, e.g., developers make use of '<' or '>' in cases where they should have used '<=' or '>='. Mistakes in boundary conditions are often hard to find and manually detecting them might be very time-consuming for developers. While researchers have been proposing techniques to cope with mistakes in the boundaries for a long time, the automated detection of such bugs still remains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes in boundary conditions, it should be able to capture the overall context of the source code under analysis. In this work, we propose a deep learning model that learn mistakes in boundary conditions and, later, is able to identify them in unseen code snippets. We train and test a model on over 1.5 million code snippets, with and without mistakes in different boundary conditions. Our model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41 real-world bugs; however, with a high false positive rate. The existing state-of-the-practice linter tools are not able to detect any of the bugs. We hope this paper can pave the road towards deep learning models that will be able to support developers in detecting mistakes in boundary conditions.

Data Collection: 💾

Raw Java large dataset was used from Code2Seq
Preprocessed dataset available from Our Google Drive

Setting Up the Environment: 📋

Install conda
Create a new environment:

conda env create -f environment.yml

Update environment:

conda env update --file environment.yml --prune
No GPU support? Then replace

tensorflow-gpu==2.0.0-rc1 with tensorflow==2.0.0-rc1.

Extracting Custom Code2Vec Model From the Original Weights:

Download the pre-trained model.
Make sure this is the trainable model.
Unzip this folder in the resources folder in project root.
Download & unzip custom_model.zip to resources/models

Run main.py. The custom model can now be used in any tf graph.

Running the model on a single java file

Assuming you have setup the anaconda environment, you have downloaded the pre trained model and stored it at resources/models/custom3/model and you have a java file named Input.java. Then you can run the model using the following comand: python main.py --weights resources/models/custom3/model --input Input.java

Running the model on a java project.

Create a JAR as instructed in JavaExtractor folder
If there is not one, create a folder called data in project root
Change paths in the command in next step
Run the command in data folder java -cp ~/path/to/project/JavaExtractor/JPredict/target/JavaExtractor-0.0.2-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --max_contexts 200 --dir path/to/java/project/to/test/ --evaluate > evaluate.txt
If you have extracted the model properly in previous steps you can run evaluate.py

Training the model

Before we can start the training process we need to encode the dataset into a numpy format. This can be done using the following command: python encode_data_set.py --dataset <path_to>/java-large.txt --output <path_to_data_folder>/ --prefix <train, val, or test>. This will encode the dataset into numpy by createing the following files path_source_token_idxs.npy, path_idxs.npy, path_target_token_idxs.npy, context_valid_masks.npy and Y.npy. We need to do this for both the training and validation set since both are needed to train a model.
Once we have encoded the training and validation set we can train the model using the following command: python train.py --trainset <path_to_data_folder>/<train_prefix> --valset <path_to_data_folder>/<val_prefix> --batch_size 1024> --pre_trained_weights <optional path_to_code2vec model> -freeze <True or False> --output <path_to_weight_output_folder>. Note that <path_to_data_folder>/<prefix> must be the same for the all the 5 created files in the encoding step as the this script loads them all automatically.

Testing the model.

The model can be tested in 2 ways.

Using the command: validation_on_testset.py --weights <path_to_model_weights_folder> --dataset <path_to_data_folder>/<test_prefix> --threshold 0.5 --batch_size 1024. This can be used to test the model against a unseen test set. This script will output the confusion matrix, test_loss, accuracy, f1_score, precision_score and recall_score.
Using the command: calc_prediction_stats.py --weights <path_to_model_weights_folder> --dataset <path_to_data_folder>/<test_prefix> --threshold 0.5 --batch_size 1024 --output <path>/stats.csv. This can be used to calculate the TP, TN, FP and FN per off-by-one type. The result will be stored in the stats.csv file.

SERG-Delft / ml4se-offside