Authors: Helen C. Purchase, Matthew McGill, Linda Colpoys, and David Carrington
Venue: ACM International Conference Proceeding Series, Proceedings of the 2001 Asia-Pacific symposium on Information visualisation
This paper attempts to gather data on how aesthetics affect how well people understand UML
diagrams. The authors did this in two experiments. The first used computational metrics to
measure the aesthetics, and the diagrams were laid out according to those metrics. The second
used a separate human perception study, which assessed the extent to which each aesthetic was
perceived in a diagram, to determine the layouts of the UML diagrams used in the study.
In the first experiment, they varied five aesthetics: number of bends, node distribution,
how much edge lengths vary, direction of flow, and orthogonality. The second experiment
examined those five plus edge length (whether shorter or longer is better) and symmetry
within the diagram.
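The paper scored each aesthetic with computational metrics that this summary does not reproduce. As a rough illustration of what such metrics can look like, here is a minimal Python sketch that scores a layout for three of the aesthetics (bends, edge length variation, orthogonality). The polyline layout representation and the formulas (population standard deviation for variation, fraction of axis-aligned segments for orthogonality) are assumptions for illustration, not the authors' definitions.

```python
import math
from statistics import pstdev

# Hypothetical layout representation (not from the paper): each edge is a
# polyline of (x, y) points from source to target; interior points are bends.
Edge = list[tuple[float, float]]


def total_bends(edges: list[Edge]) -> int:
    """Count bends: each interior point of an edge's polyline is one bend."""
    return sum(max(len(e) - 2, 0) for e in edges)


def edge_length(edge: Edge) -> float:
    """Euclidean length of one polyline edge."""
    return sum(math.dist(p, q) for p, q in zip(edge, edge[1:]))


def edge_length_variation(edges: list[Edge]) -> float:
    """Assumed metric: population std. dev. of edge lengths (0 = all equal)."""
    if not edges:
        return 0.0
    return pstdev([edge_length(e) for e in edges])


def orthogonality(edges: list[Edge]) -> float:
    """Assumed metric: fraction of segments that are horizontal or vertical."""
    segments = [(p, q) for e in edges for p, q in zip(e, e[1:])]
    if not segments:
        return 1.0
    aligned = sum(1 for p, q in segments if p[0] == q[0] or p[1] == q[1])
    return aligned / len(segments)


# Example layout with three edges.
layout = [
    [(0, 0), (0, 4)],          # straight vertical edge, length 4
    [(0, 0), (3, 0), (3, 4)],  # one bend, two orthogonal segments, length 7
    [(0, 0), (2, 2)],          # straight diagonal edge
]
print(total_bends(layout))    # 1
print(orthogonality(layout))  # 0.75
print(round(edge_length_variation(layout), 2))
```

A layout where every edge has the same length scores 0 on this variation metric, which corresponds to the "very low edge variation" condition the authors expected to perform best.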
In the first experiment, participants were given a textual specification of the program 15
minutes before the experiment. They were then shown sets of UML diagrams on a computer in
random order and had to enter y or n to indicate whether each diagram matched the textual
specification. They had 50 seconds to answer; if they did not, the trial was recorded as an
error. The diagrams were a mixture of a control diagram, diagrams that varied one aesthetic
by increasing or decreasing it, and diagrams that kept the control's aesthetics but were
incorrect. The study thus measured how quickly participants correctly understood the diagrams.
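The scoring rule described above (a y/n answer checked against whether the diagram matches the specification, with a timeout counted as an error) can be sketched in Python. The `Trial` record, the diagram labels, and the choice to average response times over correct answers only are illustrative assumptions, not the authors' analysis code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    diagram: str             # hypothetical label, e.g. "control", "bends+", "bends-"
    matches_spec: bool       # True if the diagram correctly depicts the specification
    response: Optional[str]  # "y", "n", or None if no answer was given
    seconds: float           # time taken to respond


def is_error(trial: Trial, time_limit: float) -> bool:
    """A trial is an error if the participant ran out of time or answered wrongly."""
    if trial.response is None or trial.seconds > time_limit:
        return True
    expected = "y" if trial.matches_spec else "n"
    return trial.response != expected


def summarize(trials: list[Trial], time_limit: float = 50.0) -> dict:
    """Per diagram: error count and mean response time of the correct answers."""
    summary = {}
    for label in sorted({t.diagram for t in trials}):
        group = [t for t in trials if t.diagram == label]
        correct_times = [t.seconds for t in group if not is_error(t, time_limit)]
        summary[label] = {
            "errors": len(group) - len(correct_times),
            "mean_time": mean(correct_times) if correct_times else None,
        }
    return summary


trials = [
    Trial("control", True, "y", 12.0),
    Trial("control", True, "y", 18.0),
    Trial("bends+", True, "n", 20.0),   # wrong answer: error
    Trial("bends+", True, None, 50.0),  # no answer in time: error
]
print(summarize(trials))
# {'bends+': {'errors': 2, 'mean_time': None}, 'control': {'errors': 0, 'mean_time': 15.0}}
```

Passing `time_limit=40.0` would model the shorter deadline used in the second experiment.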
Thirty students took part in the first experiment. They were second- and third-year
Computer Science and Information Systems students at the University of Queensland who had
little or no experience with UML. They were given a tutorial on UML collaboration diagrams along
with the program specification 15 minutes before the experiment. The students were paid
$15 and presumably volunteered for the experiment (the paper does not explicitly state
how the students were recruited).
The results of the first experiment were partly inconclusive and partly contradicted the
authors' expectations (which were based on previous studies). They expected that a low
number of bends would be better, but the results showed that more bends were better. They
thought that a very low level of edge variation (the edges are all about the same
length) would be best, but found that the control diagram, which had a medium level
of variation, did the best. Also, previous experiments (such as "UML class diagram syntax:
an empirical study of comprehension") showed that laying out class diagrams so that base
classes were at the top with their subclasses below them resulted in the best performance, but
the results of experiment 1 of this study showed the opposite.
The results were contradictory enough that the authors concluded the computational metrics
used to produce the diagrams were probably poor, so they ran a second experiment in which
the diagram layouts were based on human perception of those aesthetics. Otherwise,
the second experiment was almost identical to the first. This time there were 35 students,
and they were given only 40 seconds per diagram (on the theory that diagrams laid out
according to human perception rather than computational metrics would be easier to read).
Also, because of the issues with edge variation, they added edge length as another aesthetic
to measure, along with symmetry (which they hadn't used before because they didn't have a
computational metric for it).
This time around, the results were much closer to their expectations, but problems remained.
Fewer bends were better, but now both high and low edge length variation performed
better than the control diagram, and none of the other aesthetics produced significant results.
All in all, while the experiments had some potential, they really didn't give us much in the
way of meaningful results. Perhaps this means that the aesthetics of UML diagrams aren't
all that important for understanding them, at least for the aesthetics measured
in this study.