DPMiner is an integrated framework that can collect various types of data required for defect prediction through a single program.
DPMiner extracts a list of repository URLs matching user-specified conditions from GitHub, the version control and open source repository hosting service.
To extract the URL list, DPMiner uses the Search API, one of the GitHub REST APIs. The Search API accepts conditions in query format and returns up to 100 repository URLs per page. By issuing several queries and collecting the list returned for each, the framework can gather all project repository URLs that match the conditions.
Possible conditions
- commit Count Base
- recent Date
- fork Number
- language Type
- auth Token
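
As an illustration of how conditions like these can be sent "in query format", here is a minimal Python sketch that builds a GitHub repository search URL. It is not DPMiner's actual code: the function name `build_search_url` and the mapping of conditions to the search qualifiers `language:`, `forks:`, `created:` and `pushed:` are assumptions made for this example.

```python
from urllib.parse import urlencode

# Repository search endpoint of the GitHub REST API.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language=None, forks=None, created=None, pushed=None, page=1):
    """Build a search URL from optional conditions (hypothetical helper).

    Each condition becomes one GitHub search qualifier; ranges use the
    "low..high" form, e.g. forks="10..200".
    """
    qualifiers = []
    if language:
        qualifiers.append(f"language:{language}")
    if forks:
        qualifiers.append(f"forks:{forks}")
    if created:
        qualifiers.append(f"created:{created}")
    if pushed:  # assumed here to correspond to the "recent Date" condition
        qualifiers.append(f"pushed:{pushed}")
    if not qualifiers:
        raise ValueError("at least one condition is required")

    # per_page=100 is the largest page size the Search API allows.
    params = {"q": " ".join(qualifiers), "per_page": 100, "page": page}
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url(language="java", forks="10..200")
print(url)
```

Requesting successive `page` values for each query, and varying the conditions between queries, is one way to collect more URLs than a single page returns.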
The patch function collects bug-fixing commits (BFCs). There are three ways to collect BFCs:
- Jira

  Jira is a repository for managing issues. It tracks each issue with a label indicating the nature of the issue and a status indicating its progress. DPMiner collects the hashes of commits tied to issues whose label is bug and whose status is Closed. Find Jira key example
- GitHub Issue

  GitHub provides an issue function for efficient project management. Issues are used to manage version upgrades, defect detection, and feature enhancements, and each issue is marked as open or closed. DPMiner treats commits tied to issues that are labeled as a bug and marked closed as BFCs.
- Commit message

  Commit messages record the keywords that matter for each commit so that developers can maintain the project and collaborate efficiently. If a commit message contains the keywords "bug" and "fix", the commit is considered a BFC, and DPMiner collects its hash.
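
The commit message rule can be sketched in a few lines of Python. This is an illustrative check, not DPMiner's implementation: the function name `is_bug_fix_commit`, the `match_all` switch, and matching keywords at the start of a word are all assumptions of this example.

```python
import re


def is_bug_fix_commit(message, keywords=("bug", "fix"), match_all=True):
    """Return True if the commit message looks like a bug-fixing commit.

    A keyword matches when a word in the message starts with it, so "fixed"
    matches "fix" but "debug" does not match "bug" (an assumption of this
    sketch). With match_all=True every keyword must appear; with
    match_all=False a single keyword is enough.
    """
    if not keywords:
        return False
    text = message.lower()
    hits = [
        re.search(r"\b" + re.escape(keyword.lower()), text) is not None
        for keyword in keywords
    ]
    return all(hits) if match_all else any(hits)


print(is_bug_fix_commit("Fix bug in URL parser"))
print(is_bug_fix_commit("Fixed typo in README", match_all=False))
```

The `match_all` switch is there because the description above can be read either as requiring both keywords or as accepting either one.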
After BFCs (bug-fixing commits) are collected by one of the methods described in Patch, BICs (bug-introducing commits) are collected using the SZZ algorithm. This framework uses two SZZ algorithms:
- B-SZZ

  The B-SZZ algorithm is the basic SZZ algorithm. It finds the commit that introduced a bug by running git blame on the lines modified by the commit that fixed the bug.
- AG-SZZ

  The AG-SZZ algorithm uses an annotation graph to exclude changes to blank lines, formatting, and comments, and to remove outlier BFCs that modify too many files at once. The annotation graph is built from the first commit up to the commit that contains the fix, and a depth-first search (DFS) is then run from the fixed lines to find the lines that caused the defect.
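
The core step of B-SZZ can be sketched on in-memory data. The sketch below is a simplified illustration, not DPMiner's implementation: the function name, the data layout, and the commit hashes are made up for the example, and the `blame` mapping stands in for what `git blame <BFC>^ -- <path>` would report for the revision just before the fix.

```python
def find_bug_introducing_commits(changed_lines, blame):
    """Return the set of commits blamed for the lines a BFC changed.

    changed_lines: iterable of (path, line_number) pairs that the bug-fixing
        commit deleted or modified, numbered as in its parent revision.
    blame: dict mapping (path, line_number) to the hash of the commit that
        last touched that line in the parent revision.
    """
    return {blame[key] for key in changed_lines if key in blame}


# Hypothetical blame data for the parent of a bug-fixing commit.
blame = {
    ("Parser.java", 10): "a1b2c3",
    ("Parser.java", 11): "a1b2c3",
    ("Parser.java", 42): "d4e5f6",
    ("Util.java", 7): "0f9e8d",
}
# Lines the bug-fixing commit modified.
changed = [("Parser.java", 10), ("Parser.java", 42)]

bics = find_bug_introducing_commits(changed, blame)
print(sorted(bics))
```

AG-SZZ would additionally drop lines that are blank, comments, or formatting-only changes before this lookup, which the sketch does not attempt.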
The metric function collects source code information used for defect prediction.
- Characteristic Vector

  Characteristic Vector is a metric representing the structural change of the source code.
- Bag of Words

  Bag of Words is a metric that breaks the text of source code and commit messages into words and measures how often each word occurs.
- Meta data

  Meta data consists of 25 types of data, such as modified lines and added lines.
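
To make the Bag of Words metric concrete, here is a minimal Python sketch that counts word frequencies in a piece of text. The tokenization rule (lowercased identifier-like tokens) and the function name `bag_of_words` are assumptions of this example; the tokenizer DPMiner actually uses is not described here.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Split text into lowercase word tokens and count each one."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", text.lower())
    return Counter(tokens)


bow = bag_of_words("Fix null check: if (value == null) return null;")
print(bow.most_common(3))
```

The same function can be applied to a commit message or to the changed source lines, giving one frequency table per commit.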
$ ./gradlew distZip
or
$ gradle distZip
After running the command, unzip "build/distributions/DPMiner.zip".
The executable files are in build/distributions/DPMiner/bin.
There are two executable files: DPMiner.bat and DPMiner.
On Windows, use DPMiner.bat; on Linux or macOS, use DPMiner.
If you have trouble building with gradlew, enter
$ gradle wrap
Option | Description |
---|---|
-i* | input path |
-o* | output path |

- * : -i and -o are required.
Command : findrepo
Option | Description | usage |
---|---|---|
-c | create Date | -c 2019-01-01..2020-01-15 |
-cb | commit Count Base | -cb less500 -cb over500 |
-d | recent Date | -d 2019-01-01..2020-06-30 |
-f | fork Num | -f 10..200 |
-l | language Type | -l java |
-auth* | auth Token | -auth "Auth Token" |
-o* | output path | -o /Users/Desktop/repository |

- * : -auth and -o are required.
findrepo -o /Users/Desktop/repository -l java -auth "Auth Token"
findrepo -o /Users/Desktop/repository -c 2019-01-01..2020-01-15 -f 10..200 -auth "Auth Token"
findrepo -o /Users/Desktop/repository -d 2019-01-01..2020-06-30 -cb over500 -auth "Auth Token"
Command : patch
Option | Description | Option | Description |
---|---|---|---|
-ij | jira url | -jk* | jira keyword |
-ik | commit message | -k | bug keyword (default : bug,fix) |
-ig | github issue | -l | issue bug label (default : bug) |

- One of -ij, -ik and -ig is mandatory.
- * : -jk is required when using option -ij.
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key"
patch -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ig -l "issue keyword"
patch -i https://github.com/apache/camel-quarkus -o /Users/Desktop/camel-quarkus/patch -ig
patch -i https://github.com/google/guava -o /Users/Desktop/guava/patch -ig -l type=defect
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ik -k "bug keyword"
patch -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik
patch -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik -k help
Command : bic
(Same as the patch option table)
SZZ Option | Description |
---|---|
-z BSZZ | Git Blame (default) |
-z AGSZZ | Annotation Graph |

- The -z option is not required.
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z "SZZ Mode"
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ig -l "issue keyword"
bic -i https://github.com/google/guava -o /Users/Desktop/guava/patch -ig -l type=defect
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ik -k "bug keyword"
bic -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z BSZZ
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI -z BSZZ
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z AGSZZ
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI -z AGSZZ
Command : metric
Option | Description |
---|---|
-bp* | bic csv file path |

- The metric can only be collected from the BIC_BSZZ csv file (for example, BIC_BSZZ_juddi.csv).
//metric -i "Github URL" -o "local directory path"/metric -bp "BIC file path"/BIC_BSZZ_"ProjectName".csv
metric -i https://github.com/apache/juddi -o /Users/Desktop/metric -bp /Users/Desktop/BIC_BSZZ_juddi.csv