Attribution Required: Stack Overflow Code Snippets in GitHub Projects

Author(s): Sebastian Baltes, Richard Kiefer, Stephan Diehl
Venue: International Conference on Software Engineering
Date: 2017

Type of Experiment: Survey/Multi-Case Study
Sample Size: 122
Class/Experience Level: Graduate Student
Participant Selection: Responded to survey
Data Collection Method: Survey, Code Metric


Stack Overflow (SO) is the most widely used Q&A website for developers, and its many code snippets are often copied into projects without proper attribution. Many developers do not know that using these snippets raises maintenance and legal issues. The SO license requires attribution that references the question or answer, and requires derived work to adopt a compatible license. This study collects and analyzes SO code snippets copied into popular GitHub (GH) projects, and surveys developers on their own usage of SO. The authors used a token-based clone detector, the PMD Copy-Paste Detector, to find unreferenced usages of three different sets of SO code snippets in a random sample of popular GH Java projects.
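
The idea behind token-based clone detection with an attribution check can be sketched in a few lines of Python. This is a minimal illustration only, not the PMD Copy-Paste Detector and not the authors' pipeline: the function names, the 10-token threshold, and the URL pattern are all assumptions made for the example.

```python
import re

# Hypothetical minimum clone length in tokens; not a value taken from the study.
MIN_TOKENS = 10

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumed pattern for links to SO questions or answers.
SO_URL_RE = re.compile(r"stackoverflow\.com/(?:questions|q|a)/\d+", re.IGNORECASE)


def strip_comments(source):
    """Remove Java block and line comments (naive: ignores string literals)."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", source)


def tokenize(source):
    """Split source into tokens, ignoring whitespace and comments."""
    return TOKEN_RE.findall(strip_comments(source))


def windows(tokens, size):
    """Return the set of all contiguous token sequences of the given size."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def is_clone(snippet, project_file, min_tokens=MIN_TOKENS):
    """True if the two sources share at least one run of min_tokens tokens."""
    shared = windows(tokenize(snippet), min_tokens) & windows(tokenize(project_file), min_tokens)
    return bool(shared)


def has_so_reference(project_file):
    """True if the file text (comments included) links to an SO post."""
    return SO_URL_RE.search(project_file) is not None


def classify(snippet, project_file):
    """Label a file as 'no clone', 'attributed', or 'unattributed'."""
    if not is_clone(snippet, project_file):
        return "no clone"
    return "attributed" if has_so_reference(project_file) else "unattributed"
```

Because matching works on token sequences, differences in whitespace and comments do not hide a copy, while renamed identifiers would. A real detector handles far more cases than this sketch.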

The study was for the most part successful. The authors found that only 23% of the identified clones of Java snippets included a reference to SO. They take one-third as the upper bound for attributed uses, meaning at least two-thirds of the copied code snippets did not comply with the SO license. They then collected references to SO URLs in popular GH repositories; 7.33% of these repositories contained a reference to SO. In their preliminary survey, around 50% of respondents did not attribute SO. They conclude that these projects could face legal issues later, and that the next step would be a tool that automatically finds and inserts references in projects.