An empirical study on real bug fixes

Author(s): Hao Zhong, Zhendong Su
Venue: Proceedings of the 37th International Conference on Software Engineering
Date: 2015-05-16

Type of Experiment: Controlled Experiment
Sample Size: 9000
Class/Experience Level: Other
Participant Selection: Authors selected 6 popular Java programs and analyzed their logs
Data Collection Method: Observation, Code Metric

Quality
3

Fixing bugs is often time consuming and expensive for companies. One technique that has gained traction in recent years is automatic program repair, in which common bug-fix templates are documented and then used to automatically search a program for bugs and patch them. These automatic program repair tools have been met with skepticism, however, because people doubt that they can fix actual real-world bugs.
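
To illustrate what a fix template looks like, the sketch below applies a single hypothetical "insert null check" template to candidate statements; it is a minimal illustration of template-driven repair in general, not the templates or tooling studied in the paper, and the class and method names are invented.

    // Hypothetical sketch of one template-based repair step.
    import java.util.List;

    public class NullCheckTemplate {

        // Wraps a statement that dereferences the given variable in a null guard.
        // Example: "cache.put(key, value);" becomes "if (cache != null) { cache.put(key, value); }".
        static String apply(String statement, String variable) {
            if (!statement.contains(variable + ".")) {
                return statement; // template does not match, leave the code unchanged
            }
            return "if (" + variable + " != null) { " + statement + " }";
        }

        public static void main(String[] args) {
            // Candidate fix locations would normally come from fault localization.
            List<String> suspiciousStatements = List.of("cache.put(key, value);", "int n = size + 1;");
            for (String stmt : suspiciousStatements) {
                System.out.println(apply(stmt, "cache"));
            }
        }
    }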

Because the technology for analyzing the effectiveness of these tools is still immature and tends to produce trivial or possibly incorrect results, Hao Zhong and Zhendong Su developed a tool called BUGSTAT for this empirical study that analyzes real-world bug fixes made by humans.

They use BUGSTAT to study 6 popular Java programs with over 9000 real-world bug fixes. They analyze commit logs and split bugs into two categories: reported bugs and on-demand bugs. Reported bugs have been filed in issue trackers such as JIRA with issue numbers, so they can be easily examined, whereas on-demand bugs are typically deemed trivial by the programmer and appear only in commit logs, not in these trackers.
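
A minimal sketch of this kind of commit-log classification is shown below, assuming JIRA-style issue keys such as "LUCENE-3312" mark reported bugs; the paper's actual classification rules may differ, and the commit messages here are invented.

    // Hypothetical heuristic: a commit message that cites an issue key is a reported bug,
    // otherwise it is treated as an on-demand fix.
    import java.util.List;
    import java.util.regex.Pattern;

    public class BugCategory {

        // Matches issue keys like "HADOOP-4821" anywhere in a commit message.
        private static final Pattern ISSUE_KEY = Pattern.compile("\\b[A-Z][A-Z0-9]+-\\d+\\b");

        static String classify(String commitMessage) {
            return ISSUE_KEY.matcher(commitMessage).find() ? "reported" : "on-demand";
        }

        public static void main(String[] args) {
            List<String> log = List.of(
                    "LUCENE-3312: fix NPE when the index directory is empty",
                    "fix off-by-one in buffer resize");  // no issue key: treated as on-demand
            log.forEach(msg -> System.out.println(classify(msg) + " -> " + msg));
        }
    }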

Using this data, they derived 15 findings that they hope will guide research on automatic program repair.

Finding 1. In total, programmers did not modify any source files to fix about 10% of reported bugs and about 20% of on-demand bugs.

Finding 2. In total, programmers modified one or more source files to fix about 90% of reported bugs and more than 70% of on-demand bugs.

Finding 3. In total, programmers made only a single repair action to fix fewer than 30% of source files.

Finding 4. In total, programmers made at least two repair actions to fix more than 70% of source files.

Finding 5. Programmers made multiple non-data dependent repair actions to fix about 40% of source files (the C2 category).

Finding 6. In total, as shown in Figures 2 and 3, programmers made data dependent repair actions to fix more than 40% of source files.
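
To make the distinction between Findings 5 and 6 concrete, the sketch below shows a hypothetical two-action fix in which the second repair action reads a variable introduced by the first, so the two actions are data dependent; it is an illustration, not an example taken from the studied projects.

    // Buggy version: the loop could read past the end of the array.
    //   for (int i = 0; i <= items.length; i++) { process(items[i]); }
    public class DataDependentFix {
        static void processAll(Object[] items) {
            int limit = items.length;              // repair action 1: introduce a new local variable
            for (int i = 0; i < limit; i++) {      // repair action 2: rewrite the condition to use it
                process(items[i]);
            }
        }

        static void process(Object item) {
            System.out.println(item);
        }

        public static void main(String[] args) {
            processAll(new Object[] {"a", "b", "c"});
        }
    }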

Finding 8. The repair actions on a code element typically increase with its complexity.

Finding 9. The actions on code elements follow two patterns. First, the modifications on a code element increase with its complexity. Second, there are more additions to a code element than deletions from it.

Finding 10. In total, programmers did not make any API repair actions to fix half of the source files.

Finding 11. In total, programmers made at least one API repair action to fix the other half of source files.
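
For illustration, an API repair action changes which library calls the code makes rather than the surrounding control flow. The hypothetical sketch below fixes a resource leak and a charset bug by swapping in the java.nio.file.Files API; it is an invented example, not one drawn from the studied projects.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    public class ApiRepairExample {
        // Buggy version (sketch): new BufferedReader(new FileReader(file)) used the
        // platform default charset and the reader was never closed.

        // Fixed version: the repair action calls a different API that closes the file
        // and makes the charset explicit.
        static List<String> readConfig(Path file) throws IOException {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws IOException {
            Path tmp = Files.createTempFile("config", ".txt");
            Files.writeString(tmp, "key=value\n", StandardCharsets.UTF_8);
            System.out.println(readConfig(tmp));
        }
    }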

Finding 12. The most commonly modified files are Java source files; other file types are modified far less often.

Finding 13. The two most commonly modified non-source file types are configuration files and natural-language documents.

Finding 14. Some modified source files are in programming languages other than Java for two reasons. First, a project may be implemented in multiple programming languages. Second, a project may implement an interface for a programming language.

Finding 15. In total, programmers did not add any files to fix more than 80% of the bugs, and they did not delete any files to fix more than 90% of the bugs.
