thanhlecongg / drr

Tool & data on the correctness of Defects4J patches generated by program repair tools

Home Page: http://arxiv.org/pdf/1909.13694

Automated Patch Assessment for Program Repair

A tool for automatically assessing the correctness of patches generated by program repair systems. We treat the human-written patch as the ground-truth oracle and generate Random tests based on the Ground Truth (RGT); a generated patch that fails one of these tests is considered overfitting. For details, see the paper "Automated Patch Assessment for Program Repair at Scale".
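The idea behind RGT can be illustrated with a toy sketch. This is not the repository's implementation: the real pipeline generates Java tests with Evosuite and Randoop from the human-patched program, whereas the functions `ground_truth_abs`, `overfitting_abs` and `assess_patch` below are hypothetical stand-ins that compare two Python functions on random inputs.

```python
import random


def ground_truth_abs(x):
    """Stand-in for the human-written (ground truth) patch."""
    return x if x >= 0 else -x


def overfitting_abs(x):
    """Stand-in for a machine patch that only satisfies the original test (x = -1)."""
    return 1 if x == -1 else x


def assess_patch(reference, candidate, n_tests=1000, seed=0):
    """Return the random inputs on which the candidate disagrees with the reference."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    failures = []
    for _ in range(n_tests):
        x = rng.randint(-100, 100)
        # The reference output plays the role of the test oracle.
        if candidate(x) != reference(x):
            failures.append(x)
    return failures


failing_inputs = assess_patch(ground_truth_abs, overfitting_abs)
verdict = "overfitting" if failing_inputs else "no overfitting detected"
print(verdict)
```

An empty failure list only means that the random tests did not expose a difference; it is not a proof that the patch is correct.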

If you use this repo, please cite:

@Article{Ye2021EMSE,
    author = {Ye, He and Martinez, Matias and Monperrus, Martin},
    title = {Automated Patch Assessment for Program Repair at Scale},
    journal = {Empirical Software Engineering},
    volume = {26},
    issn = {1573-7616},
    doi = {10.1007/s10664-020-09920-w},
    year = {2021}
}

Folder Structure

├── Patches: 257 patches from Dcorrect and 381 patches from Doverfitting
│ 
├── RGT: includes tests from Evosuite2019, Randoop2019, EvosuiteASE15, RandoopASE15 and EvosuiteEMSE18
│   
├── DiffTGen
│   ├── Results: the overfitting patches found by running DiffTGen
│   └── runDrr.py: a script to reproduce the DiffTGen experiment (see below for details)
│ 
├── statistics: our experimental statistics for all RQs
│ 
└── run.py: a script to reproduce all experiments

Prerequisites

  • JDK 1.7
  • OS: Linux or macOS
  • Set the environment variable DEFECTS4J_HOME="home_of_defects4j"
  • Add Defects4J as a submodule and check out commit 486e2b4 (note that our experiments depend on several Defects4J commands):
git submodule add https://github.com/rjust/defects4j
cd defects4j
git reset --hard 486e2b49d806cdd3288a64ee3c10b3a25632e991
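Before running the experiments, it can help to confirm that the environment matches the list above. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper that checks only the operating system and the DEFECTS4J_HOME variable (it does not verify the JDK version or the Defects4J commit).

```python
import os
import platform

# platform.system() reports "Darwin" on macOS.
SUPPORTED_SYSTEMS = ("Linux", "Darwin")


def check_prerequisites(env=None, system=None):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append("unsupported OS: " + system)
    d4j_home = env.get("DEFECTS4J_HOME", "")
    if not d4j_home:
        problems.append("DEFECTS4J_HOME is not set")
    elif not os.path.isdir(d4j_home):
        problems.append("DEFECTS4J_HOME is not a directory: " + d4j_home)
    return problems


for problem in check_prerequisites():
    print("prerequisite problem:", problem)
```

The `env` and `system` parameters exist only so the check can be exercised without touching the real environment.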

Run

To assess an individual patch for Defects4J:

./run.py patch_assessment <patch_id> <dataset:Dcorrect|Doverfitting> <RGT:ASE15_Evosuite|ASE15_Randoop|EMSE18_Evosuite|2019_Evosuite|2019_Randoop>  
example:  ./run.py patch_assessment patch1-Lang-35-ACS.patch Dcorrect 2019_Evosuite
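When assessing many patches from a script, the arguments can be validated before run.py is invoked. The sketch below is an assumption about how one might wrap the command, not code from the repository; `build_patch_assessment_command` is a hypothetical helper, and the allowed values are copied from the usage line above.

```python
DATASETS = ("Dcorrect", "Doverfitting")
RGT_SUITES = (
    "ASE15_Evosuite",
    "ASE15_Randoop",
    "EMSE18_Evosuite",
    "2019_Evosuite",
    "2019_Randoop",
)


def build_patch_assessment_command(patch_id, dataset, rgt):
    """Build the argument list for './run.py patch_assessment', rejecting unknown values."""
    if dataset not in DATASETS:
        raise ValueError("dataset must be one of " + ", ".join(DATASETS))
    if rgt not in RGT_SUITES:
        raise ValueError("RGT must be one of " + ", ".join(RGT_SUITES))
    return ["./run.py", "patch_assessment", patch_id, dataset, rgt]


command = build_patch_assessment_command(
    "patch1-Lang-35-ACS.patch", "Dcorrect", "2019_Evosuite"
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root.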

To perform different sanity checks:

./run.py applicable_check
./run.py plausible_check

To identify flaky tests:

./run.py flaky_check <patch_id> <dataset:Dcorrect|Doverfitting> <RGT:ASE15_Evosuite|ASE15_Randoop|EMSE18_Evosuite|2019_Evosuite|2019_Randoop>  
example:  ./run.py flaky_check patch1-Lang-35-ACS.patch Dcorrect 2019_Evosuite
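A test is usually called flaky when repeated executions on the same program give different outcomes. How flaky_check detects this internally is not described here, so the following is only a sketch of the general idea; `find_flaky_tests` and the test names in the example data are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the tests whose outcome differs across repeated executions.

    `runs` is a list with one dict per execution, mapping test name to "pass" or "fail".
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test_name, result in run.items():
            outcomes[test_name].add(result)
    # A test that produced more than one distinct outcome is flaky.
    return sorted(name for name, seen in outcomes.items() if len(seen) > 1)


# Three hypothetical executions of the same test suite on the same program.
runs = [
    {"test0": "pass", "test1": "pass", "test2": "fail"},
    {"test0": "pass", "test1": "fail", "test2": "fail"},
    {"test0": "pass", "test1": "pass", "test2": "fail"},
]
flaky = find_flaky_tests(runs)
print(flaky)
```

A test that fails consistently, like `test2` in the example data, is not flaky; only tests with inconsistent outcomes are reported.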

To reproduce our experiments with RGT patch assessment:

RQ1: ./run.py RQ1
RQ3: ./run.py RQ3
RQ4: ./run.py RQ4
RQ5: cd ./statistics && ./RQ5-randomness-script.py <Evosuite2019|Randoop2019>

Results

Credits

  • For more details about Defects4J, see the original repository of the Defects4J benchmark.
  • For more details about DiffTGen, see the original DiffTGen repository.

About

License: Creative Commons Attribution Share Alike 4.0 International


Languages

  • Java 100.0%
  • HTML 0.0%
  • JavaScript 0.0%
  • Shell 0.0%
  • Python 0.0%