An empirical evaluation of two user interfaces of an interactive program verifier

Author(s): Martin Hentschel, Reiner Hähnle, Richard Bubel
Venue: Automated Software Engineering (ASE), 2016 31st IEEE/ACM International Conference on
Date: 06 October 2016

Type of Experiement: Controlled Experiment
Sample Size: 21
Class/Experience Level: Undergraduate Student, Graduate Student, Professional
Data Collection Method: Observation


This study evaluation aims to empirically compare the effectiveness and efficiency of different user interfaces of an interactive theorem prover. The interfaces allow the user to inspect a proof attempt to verify a Java program compiles with its specifications. The interfaces are based on the KeY verification system. The KeY verification system is based on a deductive formulation of symbolic execution in a program logic, where programs are annotated with loop invariants and method contracts.

The participants with KeY experience spent less time when using KeY than when using the SED. Analysis of the experiment concludes that participants perform significantly better in finding an actual reason for a failed proof attempt when using SED. Usability of SED can be improved by better highlighting unverified nodes. When using SED, an applied contract was identified more often when the participants were asked which contracts were applied during proof search. The time spent using SED was almost always higher than using KeY. One explanation for this is the users were more experienced with KeY and therefore less time was spent learning it. Almost all participants considered the proof tree and the possibility to filter out of intermediate proof steps as mostly helpful when using KeY. Concluding, participants without at least 2 years of experience preferred SED over KeY and experienced KeY users preferred KeY.