On the Effectiveness of the Test-First Approach to Programming

Author(s): H. Erdogmus, M. Morisio, and M. Torchiano
Venue: IEEE Trans. Software Eng.
Date: March 2005


When looking at the effects of test-driven development (TDD), it is easy to see how interleaving code with tests can increase program quality. It is harder to see how it can increase productivity, since one writes at least as much test code, if not more. Erdogmus and colleagues set out to investigate the strengths and weaknesses of the test-first approach.

Erdogmus and colleagues conducted a controlled experiment in the spring of 2003. Two groups were investigated: a control group, which produced code in an iterative test-last manner, and an experimental group, which wrote code in an iterative test-first, or TDD, style. The independent variable was whether tests were written before or after the production code. The dependent variables were quality and productivity. Quality was measured with a black-box acceptance test suite consisting of 105 tests and over 350 JUnit asserts. No method stubs were provided, only high-level requirements in the form of user stories. Productivity was measured as the amount of functionality delivered per unit of effort. Twenty-four of the thirty-five students completed the study. All were junior-level Computer Science students at Politecnico di Torino, and the task was to develop a bowling scorekeeper. The difference in average skill and effort ratings between the groups was not statistically significant.
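To make the test-first ordering concrete, here is a minimal sketch of how one step of a bowling scorekeeper might be developed in that style. It is illustrative only: the study used Java with JUnit, whereas this sketch uses Python, and the function name `score`, the flat-list-of-rolls input, and the particular test cases are assumptions rather than the material given to the students. The tests appear first because, in test-first, they are written before the code they exercise.

```python
# Step 1 (test-first): the tests are written before the production code exists.
# All names here are hypothetical and not taken from the study.

def test_gutter_game():
    assert score([0] * 20) == 0

def test_spare_adds_next_roll_as_bonus():
    assert score([5, 5, 3] + [0] * 17) == 16

def test_strike_adds_next_two_rolls_as_bonus():
    assert score([10, 3, 4] + [0] * 16) == 24


# Step 2: just enough production code is written to make the tests pass.
def score(rolls):
    """Return the total for a complete ten-pin game given as a flat list of rolls."""
    total = 0
    i = 0  # index of the first roll of the current frame
    for _ in range(10):
        if rolls[i] == 10:                      # strike: bonus is the next two rolls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:     # spare: bonus is the next roll
            total += 10 + rolls[i + 2]
            i += 2
        else:                                   # open frame
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total
```

In the test-last condition the same two artifacts would be produced in the opposite order within each iteration. The test functions follow the naming convention that a runner such as pytest would pick up.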

The test-first group wrote 52% more tests than the test-last group, a statistically significant difference. There was no statistically significant difference in code quality between the test-first and test-last groups, nor was there a statistically significant difference in programmer productivity between them. The results suggested that the minimum quality achievable increased linearly with the number of tests written, but this effect was not statistically significant.

The main conclusion drawn was that the number of tests written predicts productivity, independent of which iterative test group the student was in (p=0.001). Test-first did not increase quality, but it did appear to improve productivity indirectly: test-first students wrote more tests, and writing more tests was associated with higher productivity, probably because tests allow for better task understanding, better task focus, faster learning, and lower rework effort. The study had a few threats to validity, including the unknown degree to which students conformed to the assigned technique, and mono-operation bias from using only a single project.
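The relationship between tests written and productivity can be sketched as a simple least-squares fit. This is only an illustration of the kind of analysis described, not the authors' statistical procedure. The function names are hypothetical, productivity is assumed here to be user stories delivered per hour of effort, and the numbers are made up for the example and are not data from the study.

```python
def productivity(stories_delivered, effort_hours):
    """Functionality delivered per unit effort (assumed unit: stories per hour)."""
    return stories_delivered / effort_hours


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up records, NOT from the study: (tests written, stories delivered, hours).
students = [(2, 4, 8.0), (4, 8, 8.0), (6, 12, 8.0), (8, 16, 8.0)]

tests_written = [t for t, _, _ in students]
productivities = [productivity(s, h) for _, s, h in students]

# A positive slope means productivity rises with the number of tests written.
slope, intercept = fit_line(tests_written, productivities)
```

A positive slope on its own would not establish significance; the reported p-value comes from the authors' own analysis.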