norhh / Rete

Reproduction Package

This package is the reproduction kit for the program repair tool Rete. The package contains various components that help in running Rete and its combination with another tool named Trident. The package also includes tools and scripts to perform feature extraction and graph and dataset analysis.

File System Contents

The package includes the following directories and files:

  • Rete-Trident - Contains the source code for running Rete and Trident.
  • rete-feature-extractor - This directory holds the feature extraction process of Rete and another tool called Prophet for Python.
  • Scripts - Contains scripts for graph and dataset information.
  • eval - This directory has a sample example from the ManyBugs dataset, which can be used to run Trident, Rete+Trident, and another tool called SemFix.

Instructions to Run each of the Components

Each directory in the package has its own Readme.md file that provides instructions on how to run the components.

Building the Container

To build the Docker container, follow the steps below:

  1. Run the following command in your terminal:
> docker build . -t reproduction_package
  2. To go inside the Docker container, run the following command:
> docker run -v $(pwd):/home/Trident/ --rm -ti reproduction_package /bin/bash

Tools

The package provides support to run the following tools:

  1. Rete: The tool that is constructed in the research paper.
  2. Rete + Trident: A combination of Trident's specification with Rete's synthesizer and variable prioritizer.
  3. Trident: A standalone tool.
  4. Prophet: An implementation of the tool for Python.

Installing Feature Extractor

To install the feature extractor, follow the steps below:

  1. Go to the infra directory:
> cd infra/
  2. Build the Docker container in the infra directory:
> docker build -t rete/ubuntu-16.04-llvm-3.8.1 .
  3. Change to the parent directory and build the Docker container:
> cd ..
> docker build -t rete-feature .

Random Forest Features

The package also includes a feature extraction process that is based on random forest. This process involves counting the individual features for each variable, choosing the variables with the maximum count, and ignoring the rest. The variable identification is done based on the order of variables' features fed to the random forest. If the random forest outputs 7, it means that the 7th variable is the correct one.

Here is the list of the features used by the Random Forest:

  1. def: count of variable definitions 
  2. use: count of variable uses
  3. for_init: count of 'for' loop initialisation uses
  4. for_cond: count of 'for' loop condition uses
  5. for_lcv: count of loop control uses
  6. while_cond: count of 'while' loop condition uses
  7. if_cond: count of 'if' condition uses
  8. hole_to_def: distance between hole and the def
  9. last_use: distance to last use 
  10. hole_window: count of uses in k lines around hole 
  11. operator_histo: multiset of counts of operator/function uses
  12. is_global: local/global variables

       

Running

Run the Docker container, mounting your current directory at "/tmp" and starting "/bin/bash":

> docker run -v $(pwd):/tmp --rm -it rete /bin/bash

You can skip the mount if you do not need the current directory. You can either compile inside Docker after mounting, or directly run the build already present in the "/rete" folder.

> cd /tmp/rete

> cmake .. -DF1X_LLVM=/llvm-3.8.1

> make

> chmod u+x rete

To extract a JSON file of feature information:

> ./rete -output="<path>"

To extract intermediate CDU chain data:

> ./rete -get-chain-data -output="<path>"

Python-Prophet

To extract Prophet's features:

> python3 rete-feature-extracter/learning/prophet.py extract-features <file_path> --output-file <output_file_path>

To extract Prophet's feature vector from Prophet's features:

> python3 rete-feature-extracter/learning/prophet.py feature-vector --buggy buggy_file.pcl --correct correct_file.pcl --mod-kind <MOD_TYPE> --line-no <line_no>

Rete-Trident

Build and run a container:

> docker build -t rtrident .
> docker run --rm -ti rtrident /bin/bash

Build runtime:

> cd runtime
> KLEE_INCLUDE_PATH=/klee/include make

Run examples:

> ./tests/assignment/run
> ./tests/iterations/run
> ./tests/multipath/run
> ./tests/simple-rvalue/run

Synthesizer interface

The synthesizer supports the following functionality:

  • Verifying a given patch
  • Generating a patch
  • Generating all patches

As a specification, the synthesizer uses KLEE path conditions generated by the Trident runtime, conjoined with test assertions.

To verify a given patch, run the following command:

> python3.6 /path/to/synthesis.py --tests <ASSERTION_SMT_FILE>:<KLEE_OUTPUT_DIR> ... \
--components <COMPONENT_SMT_FILE> ... \
--verify <LOCATION ID>:<PATCH_FILE> ... \
--templates <Template_Path> \
--depth <int> \
--model <model> \
--theta <int>

If a template path is specified, the synthesizer uses Rete's plastic-surgery-based synthesis with the templates extracted from the codebase. If no path is given, the synthesizer defaults to naive enumeration.
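
When the verification command is driven from a script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical helper: only the flag names are taken from the command above, while the function name, the example paths, and the choice to treat --templates, --depth, --model and --theta as optional are assumptions.

```python
# Hypothetical helper; only the flag names come from the command above.
def build_verify_command(synthesis_py, tests, components, patches,
                         templates=None, depth=None, model=None, theta=None):
    """Assemble the argument list for a --verify run.

    tests:   list of (assertion_smt_file, klee_output_dir) pairs
    patches: list of (location_id, patch_file) pairs
    """
    cmd = ["python3.6", synthesis_py, "--tests"]
    cmd += [f"{smt}:{klee_dir}" for smt, klee_dir in tests]
    cmd += ["--components"] + list(components)
    cmd += ["--verify"] + [f"{loc}:{patch}" for loc, patch in patches]
    # Assumption: these options may be omitted; without --templates the
    # synthesizer falls back to naive enumeration.
    for flag, value in (("--templates", templates), ("--depth", depth),
                        ("--model", model), ("--theta", theta)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# All paths and IDs below are placeholders.
cmd = build_verify_command("/path/to/synthesis.py",
                           tests=[("t1.smt2", "klee-out-0")],
                           components=["components/less-or-equal.smt2"],
                           patches=[("0", "patch.json")],
                           depth=3)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` without shell quoting concerns.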

The names of some files are significant: the names of component files are component IDs, and the names of assertion files are test IDs.

A patch file can be either an SMT file with patch semantics (same as components without holes), or a JSON file that describes a tree of components and a valuation of constants. Here is an example of such a JSON file:

{
  "tree": {
    "node": "less-or-equal",
    "children": {
      "left": {
        "node": "x"
      },
      "right": {
        "node": "constant_a"
      }
    }
  },
  "constants": {
    "a": 2
  }
}

The above JSON describes the expression "x <= 2". Specifically, it encodes the tree of components "less-or-equal[left=x, right=constant_a]", and specifies that the constant "a" equals 2. Note that "a" does not refer to "constant_a" ("constant_a" is just a component ID), but to the variable "const_a" used in the definition of constant_a component.
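
To make the encoding concrete, here is a minimal Python sketch that renders such a patch file as an infix expression. The operator table and the shortcut of resolving a leaf named "constant_<name>" to the constant "<name>" are simplifying assumptions for illustration; the real synthesizer resolves constants through the component definitions, as noted above.

```python
import json

# Assumed mapping from component IDs to infix operators; only
# "less-or-equal" appears in the example above.
INFIX = {"less-or-equal": "<="}


def render(tree, constants):
    """Render a component tree as a readable infix expression."""
    node = tree["node"]
    children = tree.get("children", {})
    if not children:
        # Simplifying assumption: leaf "constant_<name>" uses constant <name>.
        if node.startswith("constant_"):
            return str(constants[node[len("constant_"):]])
        return node
    left = render(children["left"], constants)
    right = render(children["right"], constants)
    return f"{left} {INFIX[node]} {right}"


patch = json.loads("""
{
  "tree": {
    "node": "less-or-equal",
    "children": {
      "left": {"node": "x"},
      "right": {"node": "constant_a"}
    }
  },
  "constants": {"a": 2}
}
""")
print(render(patch["tree"], patch["constants"]))  # prints "x <= 2"
```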

To generate a patch, run the following command:

> python3.6 /path/to/synthesis.py --tests <ASSERTION_SMT_FILE>:<KLEE_OUTPUT_DIR> ... \
--components <COMPONENT_SMT_FILE> ...

Add --all to generate all patches, and --depth <DEPTH> to generate expressions with the length from the root component to the leaves being at most DEPTH.
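
The depth bound can be pictured with a small Python sketch over the JSON tree format shown earlier. Whether DEPTH counts components or edges is not stated here, so counting components (a single leaf has depth 1) is an assumption, and `tree_depth` is a hypothetical helper rather than part of the synthesizer.

```python
def tree_depth(tree):
    """Number of components on the longest root-to-leaf path.

    Assumed convention: a tree consisting of a single leaf has depth 1."""
    children = tree.get("children", {})
    if not children:
        return 1
    return 1 + max(tree_depth(child) for child in children.values())


# The tree for "x <= 2": one operator component with two leaf children.
example = {
    "node": "less-or-equal",
    "children": {
        "left": {"node": "x"},
        "right": {"node": "constant_a"},
    },
}
print(tree_depth(example))  # prints 2
```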

Components are defined as SMT formulas with special variables and rules for how these variables can be used. Standard components are defined in the "components" directory. Note that these components do not include variables, since variables are project-specific.

Component definitions use special variables that can be classified as inputs and outputs. Component inputs include:

  • Constant symbol (const_<NAME>) - the value of a constant parameter (to be found by SMT solver)
  • Rvalue variable reference (rvalue_<NAME>) - the value of a program variable (passed to trident runtime)
  • Rvalue hole (rhole_<NAME>) - the value of a subtree of the current component

Component outputs include:

  • Rvalue return (rreturn) - the value of the tree rooted at the current component
  • Lvalue return (lreturn) - the value of the tree rooted at the current component after it is assigned
  • Lvalue variable reference (lvalue_<NAME>) - the value of a program variable (passed to trident runtime) after it is assigned
  • Lvalue hole (lhole_<NAME>) - the value of a subtree of the current component after it is assigned

The rules for using these variables are the following:

  • Each component should have an rreturn or an lreturn or both.
  • The value of each output should always be determined by the values of the inputs.
  • Constants and variable references are global, i.e. if the same constant or variable name is used in two components, these will be the same constants/variables. However, the names of holes are local to each component.

The following is a correct definition of the guarded assignment component:

(declare-const lhole_left (_ BitVec 32))
(declare-const rhole_left (_ BitVec 32))
(declare-const rhole_right (_ BitVec 32))
(declare-const rhole_condition Bool)
(declare-const rreturn (_ BitVec 32))
(assert (and (= rreturn rhole_right)
             (ite rhole_condition
                  (= lhole_left rhole_right)
                  (= lhole_left rhole_left))))

The above specifies that:

  • (= rreturn rhole_right) - the statement as a whole outputs the value of the subtree "right"
  • (= lhole_left rhole_right) - if the condition is true, the value of the subtree "left" will be updated to the value of the subtree "right"
  • (= lhole_left rhole_left) - if the condition is false, the value of the subtree "left" will remain as it was originally

The following definition is incorrect, since it only constrains the new value of x when x is positive, and leaves it unspecified otherwise:

(declare-const rvalue_x (_ BitVec 32))
(declare-const lvalue_x (_ BitVec 32))
(declare-const rreturn (_ BitVec 32))
(assert (and (= rreturn 0)
             (=> (> rvalue_x 0) (= lvalue_x 1))))
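
The naming conventions for inputs and outputs lend themselves to a simple syntactic check. The Python sketch below classifies the declared constants of a component by prefix and tests the rule that an rreturn or lreturn must be present. It is an illustrative helper with assumed names, it only inspects declare-const lines, and it cannot detect the semantic problem in the incorrect definition above (an output left undetermined for some inputs).

```python
import re

# Matches the declared name in "(declare-const <name> <sort>)".
DECLARE = re.compile(r"\(declare-const\s+([^\s()]+)\s")

INPUT_PREFIXES = ("const_", "rvalue_", "rhole_")
OUTPUT_PREFIXES = ("lvalue_", "lhole_")
RETURN_NAMES = ("rreturn", "lreturn")


def classify(smt_text):
    """Split declared names into (inputs, outputs) by naming convention."""
    names = DECLARE.findall(smt_text)
    inputs = [n for n in names if n.startswith(INPUT_PREFIXES)]
    outputs = [n for n in names
               if n in RETURN_NAMES or n.startswith(OUTPUT_PREFIXES)]
    return inputs, outputs


def has_return(smt_text):
    """Check the rule that a component declares rreturn, lreturn, or both."""
    _, outputs = classify(smt_text)
    return any(name in outputs for name in RETURN_NAMES)


guarded_assignment = """
(declare-const lhole_left (_ BitVec 32))
(declare-const rhole_left (_ BitVec 32))
(declare-const rhole_right (_ BitVec 32))
(declare-const rhole_condition Bool)
(declare-const rreturn (_ BitVec 32))
"""
inputs, outputs = classify(guarded_assignment)
print(inputs)
print(outputs)
print(has_return(guarded_assignment))
```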
