oneweek-hi / DPMiner

DPMiner : Mining Repository Tool

DPMiner is an integrated framework that can collect various types of data required for defect prediction through a single program.

Contents of DPMiner

What is DPMiner

1. Repository list

DPMiner extracts from GitHub, a version control hosting service and open source repository, a list of repository URLs that match the conditions the user specifies.

To extract the URL list, DPMiner uses the Search API, one of the GitHub REST APIs. The Search API accepts conditions in query format and returns up to 100 repository URLs per page. By issuing several queries and collecting the URL list from each, the framework can gather all project repository URLs that match the conditions.

Possible conditions

  • commit Count Base
  • recent Date
  • fork Number
  • language Type
  • author Token
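
As an illustration of the query format, the sketch below builds paged request URLs for the GitHub repository search endpoint. The qualifiers `language:`, `forks:` and `created:` and the `per_page` and `page` parameters belong to the GitHub Search API; the class name `RepoSearchUrls`, the method `pageUrl`, and the way conditions are mapped to qualifiers are assumptions made for this example and are not taken from the DPMiner source. Sending the request and passing the auth token are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds GitHub repository-search URLs, one per result page. */
final class RepoSearchUrls {
    private static final String ENDPOINT = "https://api.github.com/search/repositories";
    private static final int PER_PAGE = 100; // largest page size the Search API returns

    /** Joins the search qualifiers with spaces and URL-encodes them into one request URL. */
    static String pageUrl(List<String> qualifiers, int page) {
        String query = String.join(" ", qualifiers);
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return ENDPOINT + "?q=" + encoded + "&per_page=" + PER_PAGE + "&page=" + page;
    }

    public static void main(String[] args) {
        // Assumed mapping of conditions to qualifiers: language, fork range, creation date range.
        List<String> qualifiers = Arrays.asList(
                "language:java",
                "forks:10..200",
                "created:2019-01-01..2020-01-15");
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageUrl(qualifiers, page));
        }
    }
}
```

Because a single search query returns a limited number of results, narrowing each query (for example by splitting a date range) and merging the pages is one way to cover every matching repository.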

2. Patch

The patch function collects bug-fixing commits (BFCs). There are three ways to collect BFCs:

  • Jira
    Jira is a repository for managing issues. Jira manages a project with labels that indicate the nature of each issue and with status information that indicates its progress. DPMiner collects the hashes of commits whose issue label is bug and whose status is closed. Find Jira key example

  • GitHub Issue
    GitHub provides an issue feature for efficient project management. Issues help manage version upgrades, defect detection, and feature enhancements, and each issue is marked as open or closed. DPMiner collects BFCs from issues that are labeled as a bug and whose state is closed.

  • Commit message
    Developers record commit messages with keywords that matter for each commit so that they can maintain the code and collaborate efficiently. A commit whose message contains the keywords "bug" and "fix" is considered a BFC, and DPMiner collects the hashes of such commits.
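
The commit message approach can be sketched as a simple keyword test. The description above does not say whether every keyword or only one of them must be present, so the sketch offers both variants; the class `CommitMessageFilter` and its methods are hypothetical names, and the case-insensitive substring matching is an assumption rather than DPMiner's confirmed behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical keyword filter for commit messages. */
final class CommitMessageFilter {

    /** True when the message contains at least one keyword, ignoring case. */
    static boolean matchesAny(String message, List<String> keywords) {
        String lower = message.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** True when the message contains every keyword, ignoring case. */
    static boolean matchesAll(String message, List<String> keywords) {
        String lower = message.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (!lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("bug", "fix"); // the default keywords
        System.out.println(matchesAny("Fix NPE in parser", keywords));
        System.out.println(matchesAll("Fix NPE in parser", keywords));
        System.out.println(matchesAll("Bug fix for issue 12", keywords));
    }
}
```

Substring matching is deliberately naive: a message containing "prefix" also matches "fix", so a stricter filter would compare whole words.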

3. BIC

After BFCs (bug-fixing commits) are collected by the method described in Patch, BICs (bug-introducing commits) are collected using the SZZ algorithm. This framework uses two SZZ algorithms.

  • B-SZZ The B-SZZ algorithm finds the commit that introduced a bug by running git blame on the lines modified by the commit that fixed the bug. It is the basic SZZ algorithm.

  • AG-SZZ The AG-SZZ algorithm uses an annotation graph to discount blank lines, format changes, and comments, and it removes outlier BFCs that modify too many files at once. The annotation graph is built from the first commit up to the commit that contains the defect fix, and then a depth-first search (DFS) is applied from the lines where the defect was fixed to find the lines that caused the defect.
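
The core of B-SZZ can be sketched on precomputed data, without calling git. The example assumes that the lines changed by a bug-fixing commit and the blame result for those lines are already available as maps. The optional filter only imitates the blank-line and comment handling attributed to AG-SZZ; it does not build an annotation graph or run a DFS. The class `SzzSketch`, its methods, and the commit hashes are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of the B-SZZ idea on precomputed blame data (no git access). */
final class SzzSketch {

    /** True for blank lines and for lines that start with a comment marker. */
    static boolean isCosmetic(String text) {
        String t = text.trim();
        return t.isEmpty() || t.startsWith("//") || t.startsWith("/*") || t.startsWith("*");
    }

    /**
     * fixedLines: line number -> text of each line the bug-fixing commit changed,
     *             as it looked before the fix.
     * blame:      line number -> hash of the commit that last modified that line.
     * Returns the sorted hashes of the candidate bug-introducing commits.
     */
    static Set<String> findBics(Map<Integer, String> fixedLines,
                                Map<Integer, String> blame,
                                boolean skipCosmetic) {
        Set<String> bics = new TreeSet<>();
        for (Map.Entry<Integer, String> line : fixedLines.entrySet()) {
            if (skipCosmetic && isCosmetic(line.getValue())) {
                continue; // ignore blank and comment-only lines
            }
            String commit = blame.get(line.getKey());
            if (commit != null) {
                bics.add(commit);
            }
        }
        return bics;
    }

    public static void main(String[] args) {
        Map<Integer, String> fixedLines = new TreeMap<>();
        fixedLines.put(10, "int last = list.size();");
        fixedLines.put(11, "");
        fixedLines.put(12, "// index of the last element");

        Map<Integer, String> blame = new TreeMap<>();
        blame.put(10, "a1b2c3d"); // made-up commit hashes
        blame.put(11, "e4f5a6b");
        blame.put(12, "e4f5a6b");

        System.out.println(findBics(fixedLines, blame, false));
        System.out.println(findBics(fixedLines, blame, true));
    }
}
```

With the filter switched off, every blamed commit is reported; with it switched on, commits that only touched blank or comment lines drop out of the candidate set.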

4. Metric

Metrics are information about the source code that is used for defect prediction.

  • Characteristic Vector
    Characteristic Vector is a metric representing the structural change of the source code.

  • Bag of Words
    Bag of Words is a metric that splits the source code and commit messages into words and measures how often each word occurs.

  • Meta data
    Meta data consists of 25 types of data such as modified lines and added lines.
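
To make the Bag of Words metric concrete, the following sketch counts word frequencies in a piece of text. The tokenization rule (lowercase, split on anything that is not a letter, digit, or underscore) and the class name `BagOfWords` are assumptions chosen for the example; how DPMiner tokenizes source code and commit messages is not described here.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bag-of-words counter for commit messages or source text. */
final class BagOfWords {

    /** Returns each word with its number of occurrences, sorted alphabetically. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumed tokenization: lowercase, then split on non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9_]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a separator
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("Fix bug: null check; fix NPE"));
    }
}
```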

How to build with Gradle

 $ ./gradlew distZip 

or

 $ gradle distZip 

After the command finishes, unzip "build/distributions/DPMiner.zip".
The executable files are in build/distributions/DPMiner/bin.
There are two executable files: DPMiner.bat and DPMiner.
Use DPMiner.bat on Windows and DPMiner on Linux or macOS.

If you have trouble building with gradlew, enter

$ gradle wrap

Options

Common options

Option  Description
-i*     input path
-o*     output path
  • * : -i and -o are required.

1. Repository list

Command : findrepo

Option  Description        Usage
-c      create Date        -c 2019-01-01..2020-01-15
-cb     commit Count Base  -cb less500 or -cb over500
-d      recent Date        -d 2019-01-01..2020-06-30
-f      fork Num           -f 10..200
-l      language Type      -l java
-auth*  auth Token         -auth "Auth Token"
-o*     output path        -o /Users/Desktop/repository
  • * : -auth and -o are required.
findrepo -o /Users/Desktop/repository -l java -auth "Auth Token" 
findrepo -o /Users/Desktop/repository -c 2019-01-01..2020-01-15 -f 10..200 -auth "Auth Token"
findrepo -o /Users/Desktop/repository -d 2019-01-01..2020-06-30 -cb over500 -auth "Auth Token"

2. Patch

Command : patch

Option  Description     Option  Description
-ij     jira url        -jk*    jira keyword
-ik     commit message  -k      bug keyword (default : bug,fix)
-ig     github issue    -l      issue bug label (default : bug)
  • One of -ij, -ik and -ig is mandatory.
  • * : -jk is required when using option -ij.
Jira example
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key"
patch -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI 
Github example (-l option)
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ig -l "issue keyword"
patch -i https://github.com/apache/camel-quarkus -o /Users/Desktop/camel-quarkus/patch -ig 
patch -i https://github.com/google/guava -o /Users/Desktop/guava/patch -ig -l type=defect
Commit message example (-k option)
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ik -k "bug keyword"
patch -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik
patch -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik -k help

3. BIC

Command : bic (uses the same options as the patch option table)

SZZ Option  Description
-z BSZZ     Git Blame (default)
-z AGSZZ    Annotation Graph
  • The -z option is not required.
Jira example (BSZZ)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z "SZZ Mode"
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI

Github example (BSZZ)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ig -l "issue keyword"
bic -i https://github.com/google/guava -o /Users/Desktop/guava/patch -ig -l type=defect
Commit message example (BSZZ)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ik -k "bug keyword"
bic -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik
AG-SZZ and B-SZZ example (Jira)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z BSZZ
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI -z BSZZ

//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z AGSZZ
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI -z AGSZZ

4. Metric

Command : metric

Option Description
-bp* bic csv file path
  • Metrics can only be collected from a BIC_BSZZ.csv file.
Metric example
//metric -i "Github URL" -o "local directory path"/metric -bp "BIC file path"/BIC_BSZZ_"ProjectName".csv
metric -i https://github.com/apache/juddi -o /Users/Desktop/metric -bp /Users/Desktop/BIC_BSZZ_juddi.csv
