Improving Requirements Tracing via Information Retrieval

Author(s): Hayes, J., Dekhtyar, A., Osborne, J.
Venue: IEEE International Requirements Engineering Conference
Date: 2003

Type of Experiment: Controlled Experiment
Sample Size: 2



In this paper, the authors explore whether the generation of candidate links for requirements tracing can be automated. They applied three information retrieval (IR) algorithms to datasets from an open source NASA project (MODIS). The datasets contained 19 high-level requirements and 50 low-level requirements. The three algorithms were term frequency-inverse document frequency (tf-idf) vector retrieval, tf-idf with key-phrases, and tf-idf with a thesaurus. The authors compared the results of the IR algorithms with those of human analysts performing the work manually and with an existing tool.
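
To make the basic tf-idf vector retrieval step concrete, here is a minimal sketch of how candidate links between high-level and low-level requirements could be generated. It is not the authors' implementation: the function names, the tokenizer, the particular tf-idf weighting, and the similarity threshold are all assumptions chosen for illustration, and the example requirement texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_links(high, low, threshold=0.05):
    """Return (high_id, low_id, score) triples at or above the threshold.

    `high` and `low` map requirement ids to requirement text. The
    threshold value is an assumption, not a figure from the paper.
    """
    high_ids, low_ids = list(high), list(low)
    texts = [high[i] for i in high_ids] + [low[i] for i in low_ids]
    vectors = tfidf_vectors([tokenize(t) for t in texts])
    high_vecs = vectors[:len(high_ids)]
    low_vecs = vectors[len(high_ids):]
    links = []
    for h, hv in zip(high_ids, high_vecs):
        for l, lv in zip(low_ids, low_vecs):
            score = cosine(hv, lv)
            if score >= threshold:
                links.append((h, l, score))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Invented example requirements, for illustration only.
high = {
    "H1": "The system shall store calibration data",
    "H2": "The system shall report errors to the operator",
}
low = {
    "L1": "Calibration data is written to the archive",
    "L2": "Error messages are displayed to the operator",
}
for h, l, score in candidate_links(high, low):
    print(h, l, round(score, 3))
```

In this sketch "errors" and "error" are treated as unrelated terms because no stemming or synonym handling is applied, which hints at why a thesaurus-based variant might recover links that plain term matching misses.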

The tf-idf algorithm without key-phrase or thesaurus retrieval achieved 23.0% recall and 17.6% precision on the 10x10 dataset. The analysts nearly matched or outperformed the IR algorithm on both recall and precision. The analysts took 65 and 150 minutes, whereas the IR algorithm completed the same task in less than a minute. The key-phrase enhancement raised recall (27.2%) at the cost of precision (5.2%) on the same dataset. The thesaurus algorithm was tested on the large dataset (19x50) and achieved much better recall and precision (85.4% and 40.6%, respectively). Lastly, the thesaurus retrieval algorithm was compared against a senior analyst using an existing tracing tool (SuperTracePlus). The thesaurus algorithm achieved higher overall recall and precision than SuperTracePlus.
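
For readers less familiar with the two measures reported above, the following sketch shows the standard way recall and precision are computed for a set of candidate links against a set of known-correct links. The function name and the example link sets are hypothetical and do not come from the paper.

```python
def recall_precision(candidates, true_links):
    """Compute (recall, precision) for candidate links against the true links.

    recall    = correct candidates / all true links
    precision = correct candidates / all candidates
    """
    candidates = set(candidates)
    true_links = set(true_links)
    hits = len(candidates & true_links)
    recall = hits / len(true_links) if true_links else 0.0
    precision = hits / len(candidates) if candidates else 0.0
    return recall, precision


# Hypothetical link sets, for illustration only.
proposed = {("H1", "L1"), ("H1", "L2"), ("H2", "L2"), ("H3", "L1")}
correct = {("H1", "L1"), ("H2", "L2"), ("H2", "L3")}

recall, precision = recall_precision(proposed, correct)
print(f"recall={recall:.1%} precision={precision:.1%}")
```

Here two of the three correct links are found (recall of about 66.7%) and two of the four proposed links are correct (precision of 50%). A method that proposes many links tends to raise recall while lowering precision, which matches the trade-off described for the key-phrase enhancement.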