BugTossing

Dataset link: https://drive.google.com/file/d/17XXG75zmR3_bDeWKDNeXYNVL8ptAzUo3/view?usp=sharing

fastText model link: https://fasttext.cc/docs/en/pretrained-vectors.html
a. Download the English wiki.en.bin model.
b. Under the root directory, create the model\wiki.en directory.
c. Put the wiki.en.bin model into that directory.
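The directory layout for the fastText model can be prepared with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name prepare_model_dir is hypothetical, and it only creates the directory and reports whether wiki.en.bin is already in place (the model itself still has to be downloaded manually).

```python
from pathlib import Path


def prepare_model_dir(root="."):
    """Create <root>/model/wiki.en and return the path where wiki.en.bin is expected.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    model_dir = Path(root) / "model" / "wiki.en"
    model_dir.mkdir(parents=True, exist_ok=True)
    return model_dir / "wiki.en.bin"


if __name__ == "__main__":
    target = prepare_model_dir()
    status = "found" if target.is_file() else "missing, download it first"
    print(f"{target}: {status}")
```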

Software preparation

Neo4j: the database that stores the Bug Tossing Graph.
Gephi: used to run the community detection algorithm and obtain the modularity class of each product::component. (The parameters we used for the community detection algorithm are: "Randomize" On, "Use edge weights" On, and "Resolution" 1.0.)

Directory preparation

Dataset preparation

1. Run get_product_component.py to get product_component.json (adjust the file path according to where you put the product_component_files).
2. Run filter_bugs.py to get filtered_bugs.json.
3. Run split_train_test_dataset.py to get train_bugs.json and test_bugs.json.
4. Run generate_tossing_graph_goal_oriented_path.py to build the Bug Tossing Graph (a. requires a connection to Neo4j; b. uses train_bugs only).
5. Run get_vec.py to get the vectors for the text information (note: after this step, set ONEHOT_DIM in config.py to the value of onehot.dim, where onehot = TfidfOnehotVectorizer()).
6. Run get_graph_feature_for_pc.py to get the graph features of the product components.
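To make the graph-building step more concrete, here is a minimal sketch of how goal-oriented tossing edges could be derived from tossing paths. It assumes the usual meaning of "goal-oriented path" in the bug tossing literature (every product::component a bug passed through is linked directly to the one that finally resolved it); the repository's script may differ, and it additionally writes the graph to Neo4j, which is omitted here. The function names and the example product::component names are hypothetical.

```python
from collections import Counter


def goal_oriented_edges(tossing_path):
    """Link every earlier stop of a tossing path directly to the final stop.

    Assumption: a path is an ordered list of product::component names and
    the last entry is where the bug was finally resolved.
    """
    if len(tossing_path) < 2:
        return []  # an untossed bug contributes no edges
    goal = tossing_path[-1]
    # dict.fromkeys removes repeated stops while keeping their order
    sources = dict.fromkeys(pc for pc in tossing_path[:-1] if pc != goal)
    return [(src, goal) for src in sources]


def build_edge_weights(paths):
    """Count how often each (source, goal) edge occurs over all paths."""
    weights = Counter()
    for path in paths:
        weights.update(goal_oriented_edges(path))
    return weights


if __name__ == "__main__":
    # Hypothetical tossing paths, for illustration only
    paths = [
        ["Core::DOM", "Core::Layout", "Firefox::Tabs"],
        ["Core::Layout", "Firefox::Tabs"],
        ["Firefox::Tabs"],
    ]
    for edge, weight in build_edge_weights(paths).items():
        print(edge, weight)
```

The resulting edge weights are the kind of value that the "Use edge weights" option of a community detection run would consume.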

Feature vector

1. Set FEATURE_VECTOR_NUMS_PER_FILE in config.py to (the number of product::component) * 10,000, or to any other value for which FEATURE_VECTOR_NUMS_PER_FILE % (the number of product::component) == 0.
2. Run get_feature_vector.py to get the relevance labels and the text-information features.
3. Run get_graph_feature_vector.py to get the bug features and the graph features.
4. Run add_feature_vector_graph.py to merge the features from steps 2 and 3.
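The constraint on FEATURE_VECTOR_NUMS_PER_FILE can be checked before running the scripts. The sketch below is illustrative only: the function names are hypothetical, and the count of 186 product::component entries is a made-up example, not a number from the dataset. The divisibility requirement presumably keeps all candidate rows of one bug in the same file, but that reading is an assumption.

```python
def feature_vectors_per_file(num_product_components, bugs_per_file=10_000):
    """Return the suggested value: (number of product::component) * 10,000."""
    if num_product_components <= 0:
        raise ValueError("num_product_components must be positive")
    return num_product_components * bugs_per_file


def is_valid_chunk_size(chunk_size, num_product_components):
    """Check that chunk_size % (number of product::component) == 0."""
    return chunk_size > 0 and chunk_size % num_product_components == 0


if __name__ == "__main__":
    n = 186  # hypothetical number of product::component entries
    value = feature_vectors_per_file(n)
    print(value, is_valid_chunk_size(value, n))
```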

Model

Result

1. Run test_lambdaMart.py to test the model (set PRODUCT_COMPONENT_PAIR_NUM in config.py to the number of product::component).
2. Run change_result_format.py to convert result.csv (produced by test_lambdaMart.py) into a more readable format (metrics.json).
3. Run calculate_accuracy_ndcg.py to calculate accuracy and NDCG.
4. Run get_mrr.py to calculate MRR.
5. Run split_test_dataset_into_tossed_and_untossed.py to split test_bugs.json into tossed_test_bugs.json and untossed_test_bugs.json.
6. Run split_test_feature_vector_into_tossed_untossed.py to split test_feature_vector into tossed and untossed parts.
7. Reuse steps 1-4 of this section to calculate the results (accuracy, NDCG, MRR) on tossed and untossed bugs (note: adjust "test_bugs_type" in the Python files of steps 1-4 to choose which kind of test bugs to evaluate).
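For readers unfamiliar with the reported metrics, the following is a minimal sketch of accuracy@k, MRR, and NDCG@k over ranked product::component lists. It assumes binary relevance with exactly one correct product::component per bug; the repository's scripts may use a different relevance scheme or input format, and all names here are hypothetical.

```python
import math


def rank_of_truth(ranked, truth):
    """Return the 1-based rank of truth in ranked, or None if it is absent."""
    try:
        return ranked.index(truth) + 1
    except ValueError:
        return None


def accuracy_at_k(rankings, truths, k):
    """Fraction of bugs whose correct entry appears in the top k."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def mrr(rankings, truths):
    """Mean reciprocal rank; a missing correct entry contributes 0."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        rank = rank_of_truth(ranked, truth)
        if rank is not None:
            total += 1.0 / rank
    return total / len(truths)


def ndcg_at_k(rankings, truths, k):
    """NDCG@k with one relevant item per bug, so the ideal DCG equals 1."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        rank = rank_of_truth(ranked[:k], truth)
        if rank is not None:
            total += 1.0 / math.log2(rank + 1)
    return total / len(truths)


if __name__ == "__main__":
    # Toy rankings: the correct entry "A" sits at rank 1, 2, and 3
    rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
    truths = ["A", "A", "A"]
    print(accuracy_at_k(rankings, truths, 1))
    print(mrr(rankings, truths))
    print(ndcg_at_k(rankings, truths, 3))
```

Running the same functions separately on the tossed and untossed subsets gives the per-group results described in the last step.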

Note that LR-BKG needs a large amount of memory and disk storage!

SuYanqi / LR-BKG