rmatil / uzh-16-triaging-software-bugs

The final project in the course Business Analytics at UZH

uzh-16-triaging-software-bugs

The final project in the course Business Analytics at UZH: Classifying whether a bug should be fixed based on the MSR 2013 dataset

Installation

Note: Git LFS is recommended when cloning this repo, so that the previously created database is downloaded as well.

Requirements

Setup

The project is already set up so that you can run the different approaches out of the box. If you want to update some features, note the following:

  • Run setup.py in the setup directory to create a database storing the features as well as the bug data.
  • Run extract_features.py in the setup directory to extract the features from the bug data. Note that this takes a long time (expect several hours).
  • Run sample.py in the setup directory to create shuffled data for training, validation and testing.
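The shuffling and splitting step can be pictured roughly as follows. This is only a minimal sketch, not the contents of sample.py: the function name `split_shuffled`, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration.

```python
import random


def split_shuffled(bug_ids, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle bug IDs and split them into train, validation and test parts.

    The fractions and the seed are illustrative assumptions.
    """
    ids = list(bug_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_shuffled(range(1, 101))
print(len(train), len(val), len(test))
```

Splitting on bug IDs rather than on feature rows keeps each bug in exactly one of the three sets.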

Features

Note: These features are also explained in further detail in RPICase.pdf

  • Feature 1: Success Rate of a bug assignee
  • Feature 2: Success Rate of a bug reporter
  • Feature 3: Success Ratio of a bug report for every reporter-assignee pair
  • Feature 4: Success Ratio of a bug in terms of how many times it got reassigned
  • Feature 5: Number of reassignments of a bug
  • Feature 6: The duration in seconds for which a bug was open
  • Feature 7: Success Ratio of the component to which the bug was assigned
  • Feature 8: Success Ratio of a bug considering the reporter and all names on the CC
  • Feature 9: Success Ratio of the version to which the bug was assigned
  • Feature 10: Success Ratio of a bug depending on whether it is classified as user interface, environment or network related
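As an illustration of the success-rate style features (1, 2 and 3), here is a minimal sketch that assumes a success rate is simply the fraction of fixed bugs within a group. The exact definitions are given in RPICase.pdf and may differ; the record fields `assignee`, `reporter` and `fixed` are hypothetical names, not the actual database columns.

```python
from collections import defaultdict


def success_rates(bugs, key):
    """Return {group: fixed / total} for the groups produced by `key`."""
    fixed = defaultdict(int)
    total = defaultdict(int)
    for bug in bugs:
        group = key(bug)
        total[group] += 1
        fixed[group] += bug["fixed"]  # 1 if the bug was fixed, else 0
    return {group: fixed[group] / total[group] for group in total}


# Hypothetical records; the real column names in the database may differ.
bugs = [
    {"assignee": "alice", "reporter": "carol", "fixed": 1},
    {"assignee": "alice", "reporter": "dave", "fixed": 0},
    {"assignee": "bob", "reporter": "carol", "fixed": 1},
    {"assignee": "alice", "reporter": "carol", "fixed": 1},
]

# Same helper, different grouping key per feature.
by_assignee = success_rates(bugs, key=lambda b: b["assignee"])
by_reporter = success_rates(bugs, key=lambda b: b["reporter"])
by_pair = success_rates(bugs, key=lambda b: (b["reporter"], b["assignee"]))
```

Grouping by a different key (component, version, category) would give the analogous ratios for the other success-ratio features under the same assumption.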

Models

To apply the models to the different features, run run_models.py within the model directory. The output is the accuracy and the F1 score of each model on the test data.

Interpreting the prediction results: 1 means that the bug is fixed (i.e. successfully closed), 0 means that it will not be fixed.
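For reference, the two reported metrics can be computed from 0/1 labels as in the sketch below. It treats 1 (fixed) as the positive class, which is an assumption about how run_models.py computes the F1 score; the label lists are made-up illustration data, not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = fixed, 0 = not fixed.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```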

Results

ROC curves for different models

Cross-validation scores on linear models
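To make the ROC curves easier to read: each point on such a curve is the false positive rate and true positive rate at one decision threshold. The following is a generic sketch of that construction, not the plotting code used in this project; the labels and scores are made-up illustration data.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points obtained by sweeping the decision threshold.

    Assumes labels are 0/1 and that both classes occur at least once.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points[-1], auc(points))
```

A curve closer to the top-left corner (and an area closer to 1) indicates a model that ranks fixed bugs above unfixed ones more consistently.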

Predictions on Hold-out dataset

Predictions made by our best classifier are written to predictions.csv. The first column contains the bug ID, the second the indicator of whether the bug is predicted as successfully closed.
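A file with this two-column layout can be written and read back with the standard csv module. The sketch below assumes a comma-separated file without a header row, which may not match the actual predictions.csv; the helper names and the bug IDs are hypothetical, and an in-memory buffer stands in for the file.

```python
import csv
import io


def write_predictions(rows, handle):
    """Write (bug_id, label) pairs as two CSV columns."""
    writer = csv.writer(handle)
    for bug_id, label in rows:
        writer.writerow([bug_id, label])


def read_predictions(handle):
    """Read the two columns back into a {bug_id: label} dictionary."""
    return {int(bug_id): int(label) for bug_id, label in csv.reader(handle)}


# Made-up bug IDs; 1 = predicted as successfully closed, 0 = not.
buffer = io.StringIO()  # stands in for open("predictions.csv", newline="")
write_predictions([(1001, 1), (1002, 0), (1003, 1)], buffer)
buffer.seek(0)
predictions = read_predictions(buffer)
fixed_ids = [bug_id for bug_id, label in predictions.items() if label == 1]
print(fixed_ids)
```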


License: MIT License


Languages

Python 85.3%, TeX 14.7%