A Method for Recommending Computer-Security Training for Software Developers

Author(s): Muhammad Nadeem, Edward B. Allen, Byron J. Williams
Venue: International Conference on Information Technology
Date: 2015

Type of Experiment: Case Study
Sample Size: 3000
Class/Experience Level: Undergraduate Student
Participant Selection: PhD student classwork
Data Collection Method: Code Metric


This paper presents a proof of concept for recommending tailored computer-security training courses to individual software engineers. Static analysis techniques flag security-related code smells and vulnerabilities in an engineer's code, and each finding is mapped to relevant information in computer security and vulnerability knowledge bases.

The prototype uses the static analysis tool FindBugs to identify code smells and vulnerabilities in a given repository; the study examined the source code of Tolven v2.0, an open-source health care system. Smells and vulnerabilities related to cyber security were cataloged. The system then applies an NLP technique, term frequency-inverse document frequency (TF-IDF) analysis, to compute a similarity score between the description of each code smell or vulnerability reported by FindBugs and the articles in the security and vulnerability knowledge base (the Common Weakness Enumeration, CWE). A high similarity score indicates that the knowledge base likely contains useful, relevant material that can help the software engineer resolve the existing vulnerability and learn about that class of weakness in the process.
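The similarity step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a simple regex tokenizer, a smoothed IDF weighting, and cosine similarity as the score; the summary above does not specify these choices. The finding text and the three knowledge-base entries are hypothetical stand-ins: the entries reuse real CWE titles, not full article bodies.

```python
import math
import re
from collections import Counter

# Hypothetical static-analysis finding description (illustrative only).
FINDING = "SQL query built from non-constant string may allow SQL injection"

# Stand-in knowledge base: CWE identifiers mapped to their titles only.
ARTICLES = {
    "CWE-79": "Improper neutralization of input during web page generation (cross-site scripting)",
    "CWE-89": "Improper neutralization of special elements used in an SQL command (SQL injection)",
    "CWE-327": "Use of a broken or risky cryptographic algorithm",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(finding, articles):
    """Score every article against the finding, highest similarity first."""
    ids = list(articles)
    vectors = tfidf_vectors([finding] + [articles[i] for i in ids])
    query = vectors[0]
    scores = [(i, cosine_similarity(query, v)) for i, v in zip(ids, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for article_id, score in rank_articles(FINDING, ARTICLES):
        print(f"{article_id}: {score:.3f}")
```

In this toy data only the SQL injection entry shares terms with the finding, so it ranks first and the other two score zero. A real pipeline would compare against full article text and would need a cutoff for deciding when a score counts as a match; that cutoff is not given in this summary.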

The results of the case study indicate that 85% of vulnerability description files were mapped to related CWE knowledge-base articles. The authors were pleased with the results of their prototype system. However, they discuss two threats to validity, which are also topics of their future work: false positives and false negatives produced by the static analysis tools, either of which could refer engineers to material that is not tailored to their true educational needs. Besides mitigating false positives and negatives, other future work for the system includes exploring methods of replacing or correcting vulnerabilities in the source files based on the findings of the static analysis tools. However, static analysis must be consistently accurate before automated code replacement and fixing can be useful.