An Exploratory Study of the Pull-Based Software Development Model

Author(s): Georgios Gousios, Martin Pinzger, Arie van Deursen
Venue: ICSE 2014 Proceedings of the 36th International Conference on Software Engineering
Date: 2014

Type of Experiment: Case Study
Sample Size: 291
Class/Experience Level: Professional
Participant Selection: GHTorrent corpus as well as 291 selected projects


This study analyzes pull-based software development, in which pull requests are used to merge code as opposed to pushing changes to a central repository. Of specific interest are the factors that affect the use of pull requests and the effectiveness and efficiency of pull request handling. The authors analyzed open-source GitHub projects in the GHTorrent corpus. They collected metrics on each project, stored them in a database, and used six machine-learning algorithms to find patterns within the data.

Among the findings, only 14% of active projects used pull requests as a form of collaboration. About 80% of pull requests received fewer than 4 comments, and 95% received fewer than 12, during the discussion prior to merging or closing. In their dataset, 80% of pull requests were merged into the repository within 4 days and 60% within one day. Whether a pull request is merged depends mostly on whether it modifies recently modified code, but the time to merge depends on other factors as well, such as the developer’s track record. Only 13% of pull requests are rejected for technical reasons.
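
To make the merge-time figures concrete, the following is a minimal sketch of how a "share merged within N days" statistic could be computed. It is not the authors' code: the record layout (`opened_at`, `merged_at`), the function name `fraction_merged_within`, and the toy data are all hypothetical, and restricting the denominator to merged pull requests is an assumption.

```python
from datetime import datetime, timedelta


def fraction_merged_within(pull_requests, limit):
    """Share of merged pull requests whose open-to-merge time is <= limit.

    Assumption: unmerged pull requests (merged_at is None) are excluded
    from the denominator.
    """
    durations = [
        pr["merged_at"] - pr["opened_at"]
        for pr in pull_requests
        if pr["merged_at"] is not None
    ]
    if not durations:
        return 0.0
    return sum(d <= limit for d in durations) / len(durations)


# Hypothetical toy records, not data from the study.
opened = datetime(2014, 1, 1, 9, 0)
toy_prs = [
    {"opened_at": opened, "merged_at": opened + timedelta(hours=2)},
    {"opened_at": opened, "merged_at": opened + timedelta(days=3)},
    {"opened_at": opened, "merged_at": opened + timedelta(days=10)},
    {"opened_at": opened, "merged_at": None},  # closed without merging
]

within_one_day = fraction_merged_within(toy_prs, timedelta(days=1))
within_four_days = fraction_merged_within(toy_prs, timedelta(days=4))
```

With these toy records, one of the three merged pull requests falls within one day and two fall within four days; the same function applied to real timestamps would yield figures comparable to those reported above.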

The study also offered some interesting takeaways. The authors found that including a high-coverage test suite can be just as important as providing clear guidelines for attracting contributors and merging quickly. They also found that insufficient task articulation was one of the largest causes of wasted or non-merged work.