A naive bayesian classifier for bug label classification in github issues written in python3 used MapReduce model.
1. Vmware and Ubuntu (optional):
https://pan.baidu.com/s/1X29KTBNUx71GcqLc9aGB1Q 提取码:a9d5
2. jdk1.8:
https://pan.baidu.com/s/1X29KTBNUx71GcqLc9aGB1Q 提取码:a9d5
3. hadoop2.6.0:
https://pan.baidu.com/s/1ug00xUXIIvN_zyrXsRVnSw 提取码:64kb
Our code run on three virtual machine as an example, if you don't have enough real meachine or you don't know how to install hadoop, you can follow our environment setting
The data are crawled from the Microsoft vscode project issues .The original data can be found at ori_data.
We aim to predict the label (bug tag) of the issues' titles.
Then we change it into MapReduce's input format as follow:
label \t\t title
We removed some data whose label appeared less than 10 times. Then we got 1935 titles for train and 850 titles for validate.
- Clone the git to your ${Hadoop home}
- Add ${Hadoop home}/hadoop/bin to your environment path.
- Modify the path in MapReduce_code/train.sh、predict.sh
cd MapReduce_code
sh run.sh
- The result can be seen in
final_result.txt
- Attention:you may need to create the sh file by yourself due to the permission problem
Evaluation method | Result |
---|---|
rank1-Accuracy | 32.35% |
rank5-Accuracy | 73.65% |