We're finding most of the bugs, but what are we missing?

Author(s): Elaine J. Weyuker, Robert M. Bell, Thomas J. Ostrand
Venue: International Conference on Software Testing 2010
Date: 2010

Type of Experiment: Survey/Multi-Case Study
Class/Experience Level: Professional


This paper compares two different types of models used to predict the probability that a particular software module contains a fault. Classification models make a yes-or-no decision for each file, while ranking models order files according to their predicted number of faults.
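As a minimal sketch (with made-up file names and predicted fault counts, not data from the paper), the same per-file predictions can drive either model type: a classification model applies a threshold, while a ranking model simply sorts.

```python
# Hypothetical predicted fault counts per file (illustrative only).
predicted_faults = {"a.c": 4.2, "b.c": 0.1, "c.c": 1.7, "d.c": 0.0}

# Classification model: a yes/no "faulty" decision per file,
# using an assumed threshold.
THRESHOLD = 1.0
classified_faulty = {f: n >= THRESHOLD for f, n in predicted_faults.items()}

# Ranking model: order files by predicted number of faults, highest first.
ranking = sorted(predicted_faults, key=predicted_faults.get, reverse=True)
```

The threshold and counts here are assumptions for illustration; the paper's point is that the two model types produce different outputs and therefore may need different evaluation metrics.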

This paper attempts to answer the following questions about the two types of fault prediction models:
• What metrics are useful for evaluating a model's predictions?
• Can the same metrics be used for both types of prediction models?
• Does the applicability of a metric change as the fault rate varies?
• What does the distribution of faults look like in low ranked files?

The intended research goal is to accurately predict which modules within a system are faulty. Accurate predictions let testers focus their effort on the modules most likely to contain faults. The data set analyzed in the paper comes from six separate software systems, each containing over a hundred thousand lines of code; the six systems have collectively been in use for over 35 years.

Overall, the paper concludes that nearly 80% of release faults are contained in the top 20% of files ranked as highest risk. Interestingly, the paper also remarks that the remaining 20% of faults are spread evenly over the remaining modules, usually with one fault per module. Finally, the paper concludes that the files predicted to be the most faulty consistently turn out to be so.
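The top-20% result can be illustrated with a rough sketch (hypothetical fault counts, not the paper's data): given files already ordered by predicted risk, count what fraction of all actual faults fall in the top fifth of the list.

```python
# Hypothetical actual fault counts for ten files, already ordered by the
# model's predicted risk (highest predicted risk first).
actual_faults = [12, 9, 3, 1, 1, 1, 0, 1, 0, 0]

# Top 20% of files by predicted risk (at least one file).
top_k = max(1, len(actual_faults) * 20 // 100)

# Fraction of all faults captured by inspecting only those files.
captured = sum(actual_faults[:top_k])
percent_captured = 100 * captured / sum(actual_faults)
```

With these assumed numbers the top two files capture 75% of the faults; the paper reports figures near 80% on its real systems.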