Advancing Candidate Link Generation for Requirements Tracing: The Study of Methods

Author(s): Hayes, J., Dekhtyar, A., Sundaram, S.
Venue: IEEE Transactions on Software Engineering
Date: 2006

Type of Experiment: Controlled Experiment
Data Collection Method: Project Artifact(s)

This experiment studies the effects that filtering and analyst feedback have on methods of automated candidate link generation for requirements tracing. When an automated tool traces requirements, it presents a list of possible links to an analyst, who must discern true links from false ones. Filtering means that the tool shows only the candidate links whose similarity score meets or exceeds a chosen threshold. Analyst feedback means using the analyst's judgments as input to the algorithms so that they generate better candidate link lists on subsequent iterations.
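
The filtering step can be illustrated with a minimal sketch. The tuple layout and function name here are hypothetical, not from the paper; the paper only specifies that links below the threshold are withheld from the analyst:

```python
# Hypothetical sketch: filter a candidate link list by a similarity threshold.
# Each candidate link is a (high_level_req, low_level_artifact, similarity) tuple.

def filter_candidates(candidates, threshold):
    """Keep only the candidate links at or above the similarity threshold."""
    return [c for c in candidates if c[2] >= threshold]

candidates = [
    ("R1", "D3", 0.42),
    ("R1", "D7", 0.12),
    ("R2", "D2", 0.03),
]

# At a filter level of 0.10, only the lowest-scoring link is dropped.
shown = filter_candidates(candidates, 0.10)
```

Raising the threshold hides more low-scoring (likely false) links from the analyst, which is why it tends to trade recall for precision.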

The authors apply filters at similarity thresholds of 0, 0.05, 0.10, 0.15, and 0.20 to four information retrieval methods: term frequency-inverse document frequency (TF-IDF) and latent semantic indexing (LSI), each with and without a thesaurus. They also simulate analyst feedback on the top n candidate links per iteration, where n is 1, 2, 3, or 4. Under top n feedback, the simulated analyst judges, in each iteration, the n unevaluated candidate links with the highest similarity scores, and those judgments are fed back to the retrieval algorithm.

The results show that feedback on the top 2 candidate links offers the best balance of accuracy and analyst effort: top 1 performed significantly worse, while top 3 and top 4 were only marginally better than top 2. Increasing the filter threshold tends to improve precision at a small cost in recall. The best results came from the TF-IDF+Thesaurus method at filter levels of 0.10 and 0.15, which reached over 90% recall and 70% precision after 8 iterations.
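
Recall and precision here carry their standard information-retrieval meanings; a quick sketch of how such figures are computed from a candidate list and an answer set (the function and example numbers are illustrative, not the paper's data):

```python
def precision_recall(retrieved, true_links):
    """Precision: fraction of retrieved links that are true.
    Recall: fraction of true links that were retrieved."""
    retrieved, true_links = set(retrieved), set(true_links)
    hits = len(retrieved & true_links)
    return hits / len(retrieved), hits / len(true_links)

# Illustrative numbers: 10 links retrieved, 8 true links exist, 7 found.
retrieved = {("R1", f"D{i}") for i in range(10)}
true_links = {("R1", f"D{i}") for i in range(3, 11)}
precision, recall = precision_recall(retrieved, true_links)  # 0.7, 0.875
```

A higher filter threshold shrinks the retrieved set, which raises precision (fewer false links shown) but can lower recall (some true links fall below the cutoff).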