Text Mining for Software Engineering: How Analyst Feedback Impacts Final Results

Author(s): Hayes, J., Dekhtyar, A., Sundaram, S.
Venue: MSR '05
Date: 2005

Type of Experiment: Controlled Experiment
Sample Size: 3
Class/Experience Level: Professional
Data Collection Method: Project Artifact(s)


Link: http://portal.acm.org/citation.cfm?id=1083142.1083153

This paper describes a pilot study examining the impact of human analysts on traceability results. For the study, three senior analysts were each given a list of candidate traces with different levels of recall and precision and asked to correct them. The analysts were given a week to perform the task but were not given any other time constraints. At the end of the week, they were asked to return their results, a brief description of their process, and a brief activity log.

Although the sample size is too small to support firm conclusions, the study reveals some patterns. Analysts who were given a candidate trace with lower recall took 25-50% longer to complete the task. Each analyst tended to finish with results close to the recall = precision line: analysts whose traceset began above the line tended to reduce recall, whereas analysts whose traceset began below the line tended to reduce precision. Whether this pattern generalizes cannot be determined, because only three analysts participated. Moreover, the three tracesets given to the analysts had (a) high recall, low precision; (b) medium recall, medium precision; and (c) low recall, high precision. It could be that the analyst who lowered precision did so only because the low initial recall forced a search for additional links.
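To make the recall = precision line concrete, here is a minimal sketch (with hypothetical trace data, not the study's artifacts) of how a candidate trace's recall and precision are computed and where it falls relative to that line:

```python
# Minimal sketch, assuming trace links are modeled as (source, target) pairs.
# The link sets below are illustrative, not taken from the study.

def recall_precision(candidate, true_links):
    """Return (recall, precision) for a candidate set of trace links."""
    candidate, true_links = set(candidate), set(true_links)
    correct = candidate & true_links
    recall = len(correct) / len(true_links)      # correct links recovered
    precision = len(correct) / len(candidate)    # correct links among returned
    return recall, precision

# Hypothetical "true" trace and a high-recall, low-precision candidate,
# like starting condition (a) in the study.
true_links = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
candidate = {(1, "a"), (2, "b"), (3, "c"), (4, "d"),
             (1, "b"), (2, "c"), (3, "d"), (4, "a")}

r, p = recall_precision(candidate, true_links)
# r = 1.0, p = 0.5: recall > precision, i.e. above the recall = precision
# line, so (per the observed pattern) an analyst would tend to reduce recall
# while pruning false links.
```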