A Critical Analysis of PSP Data Quality: Results from a Case Study

Author(s): Philip M. Johnson and Anne M. Disney
Venue: Empirical Software Engineering, 4 (4)
Date: 1998


This paper investigates the integrity of data collected during the PSP and seeks to determine whether “data quality problems during collection and analysis can distort the PSP data’s representation of the programmer’s actual behavior, leading to invalid process improvement changes”. The study examines data from ten students, each of whom completed nine projects. The data was recorded manually, reviewed by the instructor for errors, and then checked by a computer system to verify accuracy. A total of 1,539 errors were found: 46% were calculation errors, 18% were blank fields, and 14% involved mistakes in transferring values between projects.
When attempts were made to correct the data, it was found that 44% of the errors did not affect any computation beyond a single calculation, while 34% propagated into miscalculations on multiple forms across multiple projects. When the original and corrected data were compared, for half the students the miscalculated Cost-Performance Index (planned time / actual time) in the original data indicated they were overplanning or underplanning when in reality the opposite was true. Additionally, for half the students, corrections to their Yield calculations cut the ratio in half, indicating that they were removing far fewer defects before compilation than the original data suggested.
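To make the two metrics concrete, here is a minimal sketch of CPI and Yield as they are commonly defined in the PSP literature. The function names, field names, and the exact Yield formula are assumptions for illustration; the paper's own forms are not reproduced above.

```python
# Hypothetical sketch of the two PSP metrics discussed above.
# Definitions follow common PSP usage and are assumptions, not the
# paper's exact formulas.

def cost_performance_index(planned_minutes: float, actual_minutes: float) -> float:
    """CPI = planned time / actual time.
    CPI > 1 suggests the work took less time than planned (overplanning);
    CPI < 1 suggests an overrun (underplanning)."""
    return planned_minutes / actual_minutes

def process_yield(removed_before_compile: int, injected_before_compile: int) -> float:
    """Yield (%): share of defects removed before the first compile."""
    if injected_before_compile == 0:
        return 100.0
    return 100.0 * removed_before_compile / injected_before_compile

# A single transcription slip can invert the interpretation: swapping
# the planned and actual time fields flips CPI across 1.0.
print(cost_performance_index(120, 150))  # 0.8  -> underplanning
print(cost_performance_index(150, 120))  # 1.25 -> same numbers swapped, overplanning
print(process_yield(3, 6))               # 50.0 -> half of defects caught pre-compile
```

This illustrates the paper's point: because CPI and Yield are ratios of manually transcribed fields, a single copying or arithmetic error can move the value across the threshold that separates "overplanning" from "underplanning", reversing the conclusion a student would draw.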

Based on the study, the authors oppose using data collected from PSP courses to evaluate the effectiveness of the PSP, because the data collection is so uncontrolled and error-prone. They instead advocate automated data collection, which they estimate could eliminate 80% to 90% of the errors introduced by manual recording.

This study is critical to future empirical studies of the Personal Software Process because it questions the validity of having students manually collect the data used to evaluate the process, and it shows that the resulting student errors can greatly distort the results. While the results indicate that Yield and CPI values can be strongly affected by errors in collecting and analyzing the developer’s performance data, it is important to note that the correction tool used to evaluate the errors could itself have introduced systematic bias into the results.