Bugram: Bug detection with n-gram language models

Author(s): Song Wang, Devin Chollak, Dana Movshovitz-Attias, Lin Tan
Venue: Automated Software Engineering (ASE), 2016 31st IEEE/ACM International Conference on
Date: 2016

Type of Experiment: Quasi-Controlled Experiment
Sample Size: 15
Participant Selection: Public Java repositories
Data Collection Method: Code Metric


This study focused on a new n-gram method of automated bug detection. N-gram modeling is commonly used to analyze language in n-length groups of words (or tokens) based on their probability. When applied to bug detection, it has previously been used to generate rules: if a program contains many occurrences of the token sequence "ABC", that method generates the rule "if 'AB' then 'C'", so any occurrence of "ABD" is detected as a bug. However, this approach cannot handle rarer cases, such as a sequence like "EFD" that appears only a couple of times.
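A minimal sketch of that rule-mining idea might look like the following. The token sequences, the 80% support threshold, and all names are illustrative choices, not details taken from the paper:

```python
from collections import Counter, defaultdict

# Hypothetical token sequences mined from a code base; "ABD" deviates
# from the dominant "ABC" pattern.
sequences = [list("ABC")] * 10 + [list("ABD")]

# Count how often each two-token prefix is followed by each next token.
follows = defaultdict(Counter)
for seq in sequences:
    for i in range(len(seq) - 2):
        prefix = tuple(seq[i:i + 2])
        follows[prefix][seq[i + 2]] += 1

# Mine a rule prefix -> token when that token follows the prefix in at
# least 80% of occurrences (an illustrative threshold).
rules = {}
for prefix, counts in follows.items():
    token, n = counts.most_common(1)[0]
    if n / sum(counts.values()) >= 0.8:
        rules[prefix] = token

# Flag any sequence that violates a mined rule as a potential bug.
violations = []
for seq in sequences:
    for i in range(len(seq) - 2):
        prefix = tuple(seq[i:i + 2])
        if prefix in rules and seq[i + 2] != rules[prefix]:
            violations.append("".join(seq))

print(rules)       # {('A', 'B'): 'C'}
print(violations)  # ['ABD']
```

Note that a sequence like "EFD", whose tokens never form a frequent prefix, produces no rule at all here, which is exactly the gap the paper targets.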
Bugram takes a different approach: it uses the probabilities directly instead of taking the extra step of generating rules, so a rare sequence like "EFD" would be detected as unlikely and reported. On its own this would produce too many false positives, so the authors refined it in two ways. First, they strip out any individual tokens (not combinations) that appear three times or fewer, because too little data can be gathered about them. Then they calculate the probabilities of the n-grams for several different values of n, and only report sequences whose probability is low for more than one n-size.

They tested this by running Bugram on 15 open-source projects and comparing its bug detection results with other tools. The technique discovered 23 more true bugs in total, found 2 in common, and missed 7. It also had better precision than the rule-based approach they compared against. They concluded that this technique complements rule-based techniques because it can precisely find bugs that the previous techniques missed.
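The probability-based pipeline described above can be sketched as follows. The token streams, the 0.05 probability threshold, the use of plain relative frequencies, and the choice of n in {2, 3} are all simplifying assumptions for illustration, not the paper's exact model:

```python
from collections import Counter
from itertools import chain

def ngrams(seq, n):
    """All contiguous n-token windows of a token sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# Hypothetical token streams: "ABC" dominates, "CBA" is rare, and the
# token "X" is too infrequent to model at all.
streams = [list("ABC")] * 19 + [list("ABCX")] + [list("CBA")]

# Step 1: strip individual tokens seen 3 times or fewer, since too
# little data exists to estimate their probabilities.
token_counts = Counter(chain.from_iterable(streams))
streams = [[t for t in s if token_counts[t] > 3] for s in streams]

# Step 2: relative-frequency n-gram probabilities for several n.
probs = {}
for n in (2, 3):
    counts = Counter(chain.from_iterable(ngrams(s, n) for s in streams))
    total = sum(counts.values())
    probs[n] = {g: c / total for g, c in counts.items()}

# Step 3: report a sequence only if its least likely n-gram falls
# below the threshold for EVERY n, which trims false positives.
THRESHOLD = 0.05  # illustrative value
flagged = []
for s in {tuple(s) for s in streams}:
    if all(min(probs[n][g] for g in ngrams(s, n)) < THRESHOLD
           for n in (2, 3)):
        flagged.append("".join(s))

print(flagged)  # ['CBA']
```

Here "CBA" is flagged even though every one of its tokens is common, because the sequence itself is improbable under both the 2-gram and 3-gram models, while the rare token "X" is filtered out before it can trigger a spurious report.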