Mapping Bug Reports to Relevant Files: A Ranking Model, a Fine-Grained Benchmark, and Feature Evaluation

Author(s): Xin Ye, Razvan Bunescu, Chang Liu
Venue: IEEE Transactions on Software Engineering
Date: 2016

Type of Experiment: Other
Data Collection Method: Code Metric


Bug reports play a major role in the development and maintenance of large software projects: they document bugs and give developers the information needed to eliminate them. However, developers often have to reproduce a bug and identify the source files likely to contain it. This process becomes nontrivial in large projects, especially when the developer is not highly familiar with the project's source code and when bug reports vary widely in content and quality. The authors frame the identification of these source files as a ranking problem, using features that include process metrics (information about the project's change history) as inputs to a machine learning algorithm. Central to the authors' ranking model is the use of the project's API as a bridge connecting the natural language of bug reports to the technical (and often language-specific) terms used within the software system.
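To make the ranking formulation concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a linear scoring function over a feature vector for each (bug report, source file) pair, with three hypothetical features standing in for the paper's larger set: plain lexical similarity, lexical similarity after enriching the file with descriptions of the API types it uses (the "bridge"), and a past bug-fix count as a simple process metric. The names `features`, `rank`, `api_docs`, and `fix_counts`, the tokenizer, and the hand-set weights are all illustrative assumptions; the paper learns its weights from training data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case and split on non-alphanumeric characters (deliberately simple)."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def features(report, src, api_docs, fix_counts):
    """Hypothetical 3-feature vector phi(report, file); not the paper's 19 features."""
    r = Counter(tokenize(report))
    lexical = cosine(r, Counter(tokenize(src["text"])))
    # API bridge (assumed form): append descriptions of the API types the file uses,
    # so natural-language words in the report can match the file indirectly.
    enriched = src["text"] + " " + " ".join(api_docs.get(t, "") for t in src["api_types"])
    api_sim = cosine(r, Counter(tokenize(enriched)))
    history = fix_counts.get(src["name"], 0)  # process metric: past fixes to this file
    return [lexical, api_sim, history]


def rank(report, files, weights, api_docs, fix_counts):
    """Score each file with a linear model w . phi and return names, best first."""
    scored = []
    for src in files:
        phi = features(report, src, api_docs, fix_counts)
        scored.append((sum(w * x for w, x in zip(weights, phi)), src["name"]))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy usage with made-up files, API descriptions, and weights.
files = [
    {"name": "Editor.java", "text": "class Editor save buffer", "api_types": ["FileWriter"]},
    {"name": "Parser.java", "text": "class Parser parse token", "api_types": []},
]
api_docs = {"FileWriter": "writes characters to a file on disk"}
print(rank("crash when writing file to disk", files, [1.0, 1.0, 0.1], api_docs, {"Parser.java": 1}))
# → ['Editor.java', 'Parser.java']
```

In this toy data, `Editor.java` shares no words with the report on its own; it ranks first only because the API description contributes words such as "file" and "disk", which is the kind of vocabulary gap the API bridge is meant to close.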

The authors provide a formal description of their ranking model, describe how the training and test sets were selected, detail the features used to train the model, and evaluate the effectiveness of those features. A greedy backward elimination algorithm for feature selection was used to analyze the contribution each feature makes to the trained model, and it showed that all 19 selected features are necessary for the most accurate results. To test the effectiveness of their tool, the authors ran experimental evaluations on six Java projects, which reveal that their approach can locate relevant files within the top 10 recommendations for over 70 percent of the bug reports in the Eclipse Platform and Tomcat projects. The tool's runtime is relatively high, but the authors argue that this cost is incurred only once per project: all source files are indexed up front, and from that point onward only modifications need to be processed.
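The feature-selection and evaluation steps can be sketched the same way. The Python below is a hypothetical illustration, not the authors' code. `top_k_accuracy` computes the fraction of bug reports with at least one relevant file among the top k recommendations, which is the measure behind the "top 10 for over 70 percent" result. `backward_elimination` greedily drops, at each step, the feature whose removal yields the best score, and keeps the best subset seen. The `evaluate` callback is an assumption: in practice it would retrain the ranker on a feature subset and score it on held-out bug reports, and the paper's exact selection criterion may differ from top-k accuracy.

```python
def top_k_accuracy(rankings, relevant, k=10):
    """Fraction of bug reports with at least one relevant file in the top k results.

    rankings: bug id -> list of file names, best first
    relevant: bug id -> set of files actually fixed for that bug
    """
    hits = sum(1 for bug, ranked in rankings.items() if set(ranked[:k]) & relevant[bug])
    return hits / len(rankings)


def backward_elimination(features, evaluate):
    """Greedy backward elimination over a list of feature names.

    evaluate(subset) returns a score where higher is better. Starting from the
    full set, repeatedly remove the single feature whose removal scores best,
    and remember the best subset seen. Ties favor the larger (earlier) subset.
    """
    current = list(features)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > 1:
        # Try removing each remaining feature once and keep the best resulting subset.
        candidates = [[f for f in current if f != g] for g in current]
        score, subset = max(((evaluate(c), c) for c in candidates), key=lambda p: p[0])
        current = subset
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Toy usage: a hypothetical additive evaluator in which "noise" hurts accuracy.
contrib = {"lexical": 0.3, "api": 0.2, "history": 0.1, "noise": -0.05}
print(backward_elimination(list(contrib), lambda s: sum(contrib[f] for f in s)))
```

If no removal ever beats the full feature set, the function returns the full set unchanged, which corresponds to the finding that every selected feature is needed.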