Experiment About Test-First Programming

Author(s): M. Muller and O. Hagner
Venue: IEE Proc. Software
Date: October 2002


Muller and Hagner set out to address a gap: little was concretely known about the benefits of test-first programming (as practiced in test-driven development, TDD) compared with traditional programming (following the waterfall process). They conducted an experiment to gather evidence on the question.

Between July 2001 and August 2001, a study was conducted on two groups of Computer Science graduate students at the University of Karlsruhe in Karlsruhe, Germany. One group consisted of 10 students who implemented a programming assignment using test-driven development (TDD). The other group had 9 students who implemented the same program using a traditional waterfall methodology. The TDD group was the experimental group and the waterfall group was the control group. Both groups were given the main class of a graph library, which contained only the method declarations but not the method bodies; the assignment was to implement those bodies. The project ran in two phases. In the first, the Implementation Phase, the students only had to get the library working. In the second, the Acceptance Phase, the library was run against an acceptance test that checked for quality and reliability (long-term stability). The program was written in Java, and JUnit was used as the testing framework.
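To make the test-first workflow concrete, here is a minimal sketch in Java of what one cycle might look like for a graph class. It is not taken from the paper: the class name SimpleGraph, its methods (addEdge, hasEdge, vertexCount), and the choice of a directed graph are all hypothetical. The study used JUnit; this sketch substitutes a small hand-written check helper so that it runs without any external library.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TestFirstSketch {

    // Hypothetical graph class (not the library from the study).
    // In a test-first cycle these bodies start out empty and are
    // filled in only after the test below has been written.
    static class SimpleGraph {
        private final Map<String, Set<String>> adjacency = new HashMap<>();

        // Adds a directed edge and registers both endpoints as vertices.
        void addEdge(String from, String to) {
            adjacency.computeIfAbsent(from, k -> new HashSet<>()).add(to);
            adjacency.computeIfAbsent(to, k -> new HashSet<>());
        }

        // Returns true if a directed edge from -> to has been added.
        boolean hasEdge(String from, String to) {
            return adjacency.getOrDefault(from, Collections.emptySet()).contains(to);
        }

        // Number of distinct vertices seen so far.
        int vertexCount() {
            return adjacency.size();
        }
    }

    // Stand-in for a JUnit assertion (assumption: used only to keep
    // the sketch free of external dependencies).
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Step 1: this test is written before addEdge/hasEdge have bodies,
    // so it fails at first. Step 2: the bodies above are written until
    // it passes.
    static void testAddEdgeStoresDirectedEdge() {
        SimpleGraph g = new SimpleGraph();
        g.addEdge("a", "b");
        check(g.hasEdge("a", "b"), "edge a->b should exist");
        check(!g.hasEdge("b", "a"), "edge b->a should not exist");
        check(g.vertexCount() == 2, "both endpoints should be vertices");
    }

    public static void main(String[] args) {
        testAddEdgeStoresDirectedEdge();
        System.out.println("all checks passed");
    }
}
```

In the waterfall condition, by contrast, the method bodies would be written first and tested afterwards; the code itself could end up identical, and only the order of the steps differs.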

Three measures were used to compare the control group and the experimental group: programming efficiency, reliability of code, and program understanding. The TDD group made fewer errors with respect to code reuse (P=0.09). They also had less reliable code (P=0.03). Other differences were observed, but none of them were statistically significant: the TDD group spent less time in the Implementation Phase, made fewer errors when reusing methods more than once, and had slightly better reliability in the final implementation of the library.

Muller and Hagner concluded that test-first development does not accelerate implementation and does not produce more reliable results, but that it seems to support better program understanding, as measured by successful code reuse. However, the paper has one major limitation. Because the groups were so small, the experiment had only a 64.5% chance of detecting a difference between them. To draw more statistically significant conclusions, the experiment would need to be repeated with larger groups.