UML class diagram syntax: an empirical study of comprehension

Author(s): Helen C. Purchase, Linda Colpoys, Matthew McGill, David Carrington, and Carol Britton
Venue: ACM International Conference Proceeding Series; Vol. 16, Proceedings of the 2001 Asia-Pacific symposium on Information visualisation - Volume 9
Date: 2001


This paper investigates which UML class diagram notations are more suitable with respect
to human performance. The authors examined 5 different UML notations, each of which had
two variations with identical semantics. Participants were
given a textual specification for the code 15 minutes prior to the experiment. They were
then presented with sets of UML diagrams on a computer in random order. They had to enter
y or n, indicating whether the diagram matched the textual specification. They had 40
seconds in which to answer; otherwise the response was recorded as an error. The diagrams
mixed all 5 UML notations. The experiment thus measured how accurately and how quickly
participants understood the diagrams.
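
To make the scoring rule concrete, here is a minimal sketch of how I understand it. It is
not the authors' code. The names Trial, is_correct, and summarize are hypothetical, and
averaging response time over correct answers only is my assumption; the paper summary
above only says that accuracy and speed were measured.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TIME_LIMIT_S = 40.0  # per-diagram time limit described in the summary above


@dataclass
class Trial:
    """One diagram shown to one participant (hypothetical record layout)."""
    matches_spec: bool       # ground truth: does the diagram match the specification?
    answer: Optional[str]    # "y", "n", or None if the participant gave no answer
    response_time_s: float   # seconds until the answer (or until the timeout)


def is_correct(trial: Trial) -> bool:
    # A missing or late answer is recorded as an error.
    if trial.answer is None or trial.response_time_s > TIME_LIMIT_S:
        return False
    # "y" claims the diagram matches the specification; "n" claims it does not.
    return (trial.answer == "y") == trial.matches_spec


def summarize(trials: List[Trial]) -> Tuple[float, Optional[float]]:
    """Return (accuracy, mean response time over correct answers)."""
    correct = [t for t in trials if is_correct(t)]
    accuracy = len(correct) / len(trials) if trials else 0.0
    # Assumption: speed is averaged over correct answers only.
    mean_rt = sum(t.response_time_s for t in correct) / len(correct) if correct else None
    return accuracy, mean_rt


if __name__ == "__main__":
    demo = [
        Trial(matches_spec=True, answer="y", response_time_s=10.0),   # correct
        Trial(matches_spec=False, answer="y", response_time_s=5.0),   # wrong answer
        Trial(matches_spec=True, answer=None, response_time_s=40.0),  # timed out
        Trial(matches_spec=False, answer="n", response_time_s=20.0),  # correct
    ]
    print(summarize(demo))
```

Running summarize separately on the A-variant trials and the B-variant trials of one
notation would give the kind of per-variant accuracy and speed comparison the study reports.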

There were two groups of people tested. The first was a set of second and third year
Computer Science and Information Systems students at the University of Queensland who had
little or no experience with UML. They were given a tutorial on UML class diagrams along
with the program specification 15 minutes prior to the experiment. The second group of
participants was a group of 5 expert UML users who were employees at the Distributed Systems
Technology Centre, Brisbane. The 5 experts were volunteers. The 34 students were paid
$15 and presumably also volunteered for the experiment (the paper does not explicitly
state how the students were recruited).

Of the two variants of each UML notation (labeled A and B), the experimenters designated
as A the variant that they expected to be better in terms of clarity and conciseness. The
5 notations weren't really compared with one another. Instead, the focus appears to be on
comparing the A and B variants of each notation. Personally, I think that this meant they
were attempting too much in one experiment. They were really doing 10 experiments in 1:
5 experiments comparing each A notation with its B variant, and those 5 experiments were
run twice, once with students and once with professionals.

Unsurprisingly, the A notations generally resulted in faster response times, since they
were supposed to be the clearer variants. However, the B notations generally resulted
in higher accuracy. The authors surmise that this is because the B diagrams are less
obvious and thus must be studied more closely. In addition, they asked the experts
whether they preferred the A or the B notations, and in almost all cases the experts
chose the A notations.

The most informative part of this experiment would require a much more complete paper,
with descriptions and examples of all of the notations and their variants: what aspects
of the A notations make them better than the B notations? Some details about that are
given, but not many. Having the diagrams read top down with base classes above derived
classes was perceived as better. Also, joining the inheritance lines into one instead of
drawing a separate line for each derived class was perceived as better, since it reduced
the number of lines on the page and reduced clutter on large diagrams.

All in all, the results of this study could be useful in better determining which
UML notations to use, but without detailed information on those notations, this study
is not of much help.