Solving the Search for Source Code

Author(s): Kathryn T. Stolee, Sebastian Elbaum
Venue: ACM Transactions on Software Engineering and Methodology
Date: May 2014

Type of Experiment: Survey/Multi-Case Study
Sample Size: 99
Class/Experience Level: Professional
Data Collection Method: Observation, Survey


In this paper the authors take on a common problem: programmers' endless search for sample code, example code, and reusable code. Programmers typically search for code with a keyword search engine and then sift through sites like StackOverflow looking for code to use. Because keyword queries describe code in natural language, programmers usually have to sort through the results manually to find code with the behavior they want. The underlying problem is the mismatch between the search terms and the desired behavior of the code.

The authors therefore propose a new way of searching for code: the programmer provides an example input and the expected output for that input. Search results are then based on what the code actually does, rather than on the wording of the query. Drawing on concepts from programming by demonstration and program synthesis, the approach looks for existing code that performs the required data transformation, with the search guided by a constraint solver.
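To make the idea concrete, the following Python sketch treats an input/output pair as the query and keeps only the candidate snippets whose behavior on that input produces the expected output. The repository, the snippet names, and the use of direct execution are illustrative assumptions; the paper checks behavior with a constraint solver rather than by running the code.

```python
from typing import Callable, Dict, List

# Hypothetical mini-repository: each entry maps a snippet name to a
# function standing in for a piece of reusable code.
REPOSITORY: Dict[str, Callable[[str], str]] = {
    "to_upper": lambda s: s.upper(),
    "trim": lambda s: s.strip(),
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "reverse": lambda s: s[::-1],
}


def search_by_example(inp: str, expected: str) -> List[str]:
    """Return the names of snippets whose behavior on `inp` yields `expected`."""
    matches = []
    for name, snippet in REPOSITORY.items():
        try:
            if snippet(inp) == expected:
                matches.append(name)
        except Exception:
            # A snippet that fails on the input cannot be a match.
            continue
    return matches


# The query is the example itself, not keywords describing it.
print(search_by_example("  hello world  ", "hello world"))  # ['trim']
```

Note that no keyword such as "trim" or "whitespace" appears in the query; the match comes purely from behavior.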

The constraint solver's main role is to take the input and output examples provided by the user and combine them with constraints describing the behavior of candidate code, so that the search returns only code whose behavior matches the example exactly or closely enough.
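The sketch below illustrates the constraint-solving step on a single hypothetical snippet that calls `substring` with its two integer arguments left free. The solver's job is to find argument values under which the snippet maps the example input to the expected output, or to report that none exist. A brute-force search over a small bounded domain stands in for a real constraint solver here; the encoding and the names `substring_snippet` and `solve` are assumptions, not the paper's implementation.

```python
from itertools import product
from typing import Dict, Optional


def substring_snippet(s: str, begin: int, end: int) -> str:
    """Hypothetical snippet: input.substring(begin, end) with free arguments."""
    # Mirror the bounds check of Java's String.substring.
    if not (0 <= begin <= end <= len(s)):
        raise IndexError("substring out of range")
    return s[begin:end]


def solve(inp: str, expected: str) -> Optional[Dict[str, int]]:
    """Find values for the free parameters such that
    substring_snippet(inp, begin, end) == expected, or None if unsatisfiable.

    Enumerating the bounded domain [0, len(inp)] stands in for a real solver.
    """
    domain = range(len(inp) + 1)
    for begin, end in product(domain, domain):
        try:
            if substring_snippet(inp, begin, end) == expected:
                return {"begin": begin, "end": end}
        except IndexError:
            # Assignments that violate the bounds constraint are skipped.
            continue
    return None


print(solve("Hello, World", "World"))  # {'begin': 7, 'end': 12}
print(solve("Hello, World", "world"))  # None (case differs)
```

An unsatisfiable result means the snippet cannot produce the example for any parameter values, so it would be excluded from the search results.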

The approach is motivated by a characterization of how developers search for code, based on a survey of 99 participants, and by evidence, drawn from 300 StackOverflow questions, that programmers use examples to explain their problems. The approach was evaluated in three domains: the Java String library, SQL queries, and Yahoo! Pipes.

The authors found that their approach is effective at finding relevant code, can be used on its own or to filter the results of keyword searches to increase search precision, and can find approximate matches that guide modifications toward the user's specification when no exact match exists. These gains in precision and flexibility come at a cost in performance: searches of this kind take longer to run.
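The sketch below shows two of these uses: filtering a hypothetical list of keyword-search results down to the snippets that satisfy the example, and ranking near misses when nothing matches exactly. The ranking uses string similarity from Python's `difflib.SequenceMatcher`, a swapped-in technique chosen for brevity; it is not how the paper computes approximate matches, and the `KEYWORD_RESULTS` list is invented for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Snippet = Tuple[str, Callable[[str], str]]

# Hypothetical results returned by a keyword search such as "string trim".
KEYWORD_RESULTS: List[Snippet] = [
    ("strip_spaces", lambda s: s.replace(" ", "")),
    ("trim", lambda s: s.strip()),
    ("lower_trim", lambda s: s.strip().lower()),
]


def run(snippet: Callable[[str], str], inp: str) -> str:
    """Run a snippet, treating a failure as producing no output."""
    try:
        return snippet(inp)
    except Exception:
        return ""


def filter_exact(results: List[Snippet], inp: str, expected: str) -> List[str]:
    """Keep only the keyword results whose behavior matches the example."""
    return [name for name, fn in results if run(fn, inp) == expected]


def rank_approximate(
    results: List[Snippet], inp: str, expected: str
) -> List[Tuple[str, float]]:
    """Rank results by how similar their output is to the expected output."""
    scored = [
        (name, SequenceMatcher(None, run(fn, inp), expected).ratio())
        for name, fn in results
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(filter_exact(KEYWORD_RESULTS, "  Hello World ", "Hello World"))  # ['trim']
# No snippet produces this output exactly, so fall back to ranking near misses.
print(rank_approximate(KEYWORD_RESULTS, "  Hello World ", "HELLO WORLD"))
```

In this toy version the top-ranked near miss is only a hint about which snippet to modify; the paper's approach goes further and uses approximate matches to guide the modification itself.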