Implications of Test-Driven Development: A Pilot Study

Author(s): R. Kaufmann and D. Janzen
Venue: ACM SIGPLAN Conf. Object-Oriented Programming, Systems, Languages, and Applications
Date: 2003


Software engineers in the field claim that Test-Driven Development (TDD) leads to “faster-debugging, greater reliability, and increased confidence, and superior design”. However, there is not yet enough evidence to support these claims with statistical confidence.

A small experiment was performed at Bethel College in the spring of 2003. The study examined the effects of test-first development on software quality, programmer productivity, and programmer confidence. It consisted of two groups of four people each: one group developed its project in a test-first manner, and the other in a test-last manner. The students ranged from sophomore to senior standing, but all were Computer Science undergraduates with C++ as their main programming language and at least two semesters of programming courses. In the study, the students used Java as the development language to create a graphical game application. Quality was measured with McCabe’s cyclomatic complexity number and other measures. One of the productivity metrics was the number of non-commented lines of code.
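As a rough illustration of the two metrics named above, the following Python sketch computes McCabe's cyclomatic complexity from a control-flow graph using M = E - N + 2P, and counts non-commented lines in a Java-style source string. The function names, the sample graph, and the sample Java snippet are hypothetical and are not taken from the study; the line counter is a simplification that ignores comment markers inside string literals.

```python
def cyclomatic_complexity(edges, num_components=1):
    """McCabe's number M = E - N + 2P for a control-flow graph.

    edges: list of (source, target) node pairs.
    num_components: number of connected components P (1 for a single method).
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


def count_ncloc(source):
    """Count lines containing code, skipping blank lines and Java-style comments.

    Simplification (assumption): comment markers inside string literals
    are not treated specially.
    """
    count = 0
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        code = ""
        while line:
            if in_block:
                end = line.find("*/")
                if end == -1:
                    line = ""
                else:
                    in_block = False
                    line = line[end + 2:]
            else:
                start_block = line.find("/*")
                start_line = line.find("//")
                if start_line != -1 and (start_block == -1 or start_line < start_block):
                    code += line[:start_line]  # rest of the line is a comment
                    line = ""
                elif start_block != -1:
                    code += line[:start_block]
                    in_block = True
                    line = line[start_block + 2:]
                else:
                    code += line
                    line = ""
        if code.strip():
            count += 1
    return count


# Hypothetical control-flow graph of a method with a single if/else.
if_else_edges = [
    ("cond", "then"),
    ("cond", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

# Hypothetical Java snippet used only to exercise the line counter.
sample_java = """// Game entry point
public class Game {
    /* board
       size */
    int size = 8; // default

    void play() { }
}
"""

print(cyclomatic_complexity(if_else_edges))
print(count_ncloc(sample_java))
```

A single if/else yields two independent paths, which is why the sample graph has a complexity above that of straight-line code. Real tools typically derive the graph from parsed source rather than from a hand-written edge list.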

The test-first group produced 50% more code in the same time frame, although they also had a more complex game to implement. Over the course of development, the test-last group reduced the number of classes in their project while increasing the number of methods, which indicates poor design. Furthermore, the test-last group had a class whose information flow measure (the square of the product of fan-in and fan-out) was more than twice that of any other class in either project, which also indicates bad design. In a survey on confidence in project functionality, on a scale from 1 (least confident) to 5 (most confident), the test-last group averaged 2.5 and the test-first group 4.75. The test-first group also averaged 4.25 when asked how much test-first development helps with debugging and design.
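The information flow measure mentioned above is read here as the square of the product of fan-in and fan-out, in the style of the Henry and Kafura metric without its length factor; that reading is an assumption, since the summary does not spell out the formula. The Python sketch below computes the measure per class and flags any class whose score exceeds twice that of every other class. The class names and fan-in/fan-out values are invented for illustration and are not data from the study.

```python
def information_flow(fan_in, fan_out):
    """Assumed form of the measure: (fan_in * fan_out) squared."""
    return (fan_in * fan_out) ** 2


def dominant_classes(flows, factor=2.0):
    """Return classes whose score exceeds `factor` times every other class's score.

    flows: dict mapping class name -> (fan_in, fan_out).
    """
    scores = {name: information_flow(fi, fo) for name, (fi, fo) in flows.items()}
    result = []
    for name, score in scores.items():
        others = [s for other, s in scores.items() if other != name]
        if others and score > factor * max(others):
            result.append(name)
    return result


# Hypothetical classes with made-up (fan_in, fan_out) pairs.
example_flows = {
    "Board": (3, 4),
    "Player": (2, 2),
    "GameLoop": (6, 5),
}

for name, (fi, fo) in example_flows.items():
    print(name, information_flow(fi, fo))
print(dominant_classes(example_flows))
```

Because the product is squared, a class that is only moderately more connected than its peers can score several times higher, which is what makes the measure useful for spotting a single overloaded class.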

With such a small sample size and project size, no conclusions should be, or are, drawn from this experiment, other than that the few statistics gathered should prompt larger, more complex studies to confirm the results. It would be inappropriate to attribute the success of the test-first project to TDD, since neither group wrote nearly enough test cases. In other words, the test-first group did not properly carry out TDD to begin with.