Empirical Validation of Three Software Metric Suites to Predict Fault-Proneness of Object-Oriented Classes Developed Using Highly Iterative or Agile Software Development Processes

Author(s): Hector M. Olague, Letha H. Etzkorn, Sampson Gholston, and Stephen Quattlebaum
Venue: IEEE Transactions on Software Engineering
Date: Jun 2007

Type of Experiment: Case Study


The authors compared three metric suites on Rhino, an open-source implementation of
JavaScript written in Java. Rhino was chosen because it is a real-world software system
rather than one created for a case study. Error data from the online Bugzilla repository was
cross-referenced with the classes affected by each bug fix. These change lists were
retrieved for analysis on versions 14R3 and 15R1-15R5.
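To make the data-collection step concrete, here is a minimal Python sketch of cross-referencing bug fixes with classes to label each class as fault-prone or not. It assumes the change lists have already been extracted into a dictionary mapping a bug ID to the classes its fix touched; the bug IDs, class names, the function `label_classes`, and the rule "fault-prone if touched by at least one fix" are all hypothetical, not the study's exact procedure.

```python
from collections import Counter

# Hypothetical fix records: bug ID -> classes changed by that bug's fix.
# IDs and class names are invented for illustration, not taken from Rhino's Bugzilla.
bug_fixes = {
    101: ["ClassA", "ClassB"],
    102: ["ClassA"],
    103: ["ClassC", "ClassC"],  # the same class listed twice in one change list
}
all_classes = ["ClassA", "ClassB", "ClassC", "ClassD"]


def label_classes(bug_fixes, all_classes):
    """Return {class: (number of bug fixes touching it, fault-prone flag)}."""
    fault_counts = Counter()
    for changed in bug_fixes.values():
        # Count each class at most once per bug, even if it is listed repeatedly.
        for name in set(changed):
            fault_counts[name] += 1
    # Counter returns 0 for classes never touched by any fix.
    return {name: (fault_counts[name], fault_counts[name] > 0) for name in all_classes}


labels = label_classes(bug_fixes, all_classes)
for name, (count, faulty) in labels.items():
    print(f"{name}: {count} fix(es), fault-prone={faulty}")
```

Counting a class once per bug (rather than once per line in a change list) is a design choice made here for illustration; a study could equally count raw change-list entries.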

The three suites were Chidamber and Kemerer’s (CK) metrics, Brito e Abreu’s MOOD metrics,
and Bansiya and Davis’s Quality Model for Object-Oriented Design (QMOOD) metrics. The
MOOD and QMOOD metrics are intended for use early in software design: they are simple and
can be computed at an early stage of development, before implementation code is
necessarily available. The results showed that the CK and QMOOD object-oriented class
metric suites are useful for building quality classification models that predict defects
in both traditional and highly iterative software development processes, for both initial
and multiple sequential releases.
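This summary does not say how the classification models were built. As an illustration only, here is a minimal Python sketch of a univariate logistic regression, a common choice for fault-proneness models and assumed here, that turns one class metric into a probability of the class being fault-prone. The training data, the function names, the hyperparameters `lr` and `epochs`, and the 0.5 cut-off for calling a class fault-prone are hypothetical.

```python
import math
import statistics


def fit_univariate_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(fault | x) = 1 / (1 + exp(-(b0 + b1 * x))) by batch gradient ascent.

    x is standardized internally for stable convergence; the returned
    (b0, b1) are converted back to the original metric scale.
    """
    m, s = statistics.mean(xs), statistics.pstdev(xs)  # assumes xs is not constant
    zs = [(x - m) / s for x in xs]
    a0 = a1 = 0.0
    n = len(zs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * z)))
            g0 += y - p          # gradient of the log-likelihood w.r.t. a0
            g1 += (y - p) * z    # gradient of the log-likelihood w.r.t. a1
        a0 += lr * g0 / n
        a1 += lr * g1 / n
    # a0 + a1 * (x - m) / s  ==  (a0 - a1 * m / s) + (a1 / s) * x
    return a0 - a1 * m / s, a1 / s


def predict(b0, b1, x):
    """Predicted probability that a class with metric value x is fault-prone."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Hypothetical training data: one metric value per class and a 0/1 fault label.
metric = [3, 5, 8, 12, 15, 20, 25, 30]
faulty = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_univariate_logistic(metric, faulty)
for x in (5, 15, 30):
    p = predict(b0, b1, x)
    print(f"metric={x}: P(fault-prone)={p:.2f}, classified faulty={p >= 0.5}")
```

A positive fitted slope `b1` means higher metric values go with a higher estimated chance of faults; a real study would fit such models with a statistics package and report significance and classification accuracy.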

At the level of individual metrics, they found that CK-WMC (Weighted Methods per Class),
CK-RFC (Response For a Class), QMOOD-CIS (Class Interface Size), and QMOOD-NOM (Number of
Methods) are consistent predictors of class quality (error-proneness). Their analysis
indicated that, of the suites tested, the CK metric suite produced the best models for
predicting error-proneness. They also found that the metrics were not very accurate in
the initial stages of the system, when its complexity was low; as the system matured and
its complexity grew, the predictive power of the metrics increased. Another noteworthy
finding is that the CK and QMOOD suites were highly correlated with each other, indicating
that they cover the same dimensions of software quality measurement.