On the criteria for prioritizing code anomalies to identify architectural problems

Author(s): Santiago Vidal, Everton Guimaraes, Willian Oizumi, Alessandro Garcia, Andrés Díaz Pace, Claudia Marcos
Venue: Automated Software Engineering
Date: 2016-04-04

Type of Experiment: Case Study
Sample Size: 2
Class/Experience Level: Professional
Data Collection Method: Observation, Survey


Presentation failures in web applications can have a negative impact on the usability and user experience of an application. It is therefore important to find and fix these failures, but identifying the HTML elements responsible for them is difficult. This paper presents an innovative automated approach to detecting and localizing presentation failures in HTML.

Although testing tools such as Selenium, Sikuli, and Cucumber can be used to find presentation failures, they require testers to specify all correctness properties to be checked and do not automate the correctness checking itself. Other techniques, such as cross-browser testing, can also identify presentation failures, but they require a bug-free version of the page to compare against the faulty version.

The authors of this paper developed a new approach for detecting presentation failures and identifying the HTML elements responsible for them. Their approach applies image-processing techniques to analyze the visual presentation of a web page and then determines the HTML elements responsible for any presentation failures. The approach takes two inputs: (1) a URL on the network or file system where all of the HTML, CSS, JavaScript, and media files of the web page can be located, and (2) an image that is either a mockup of the page or a screenshot of a previously correct version. The approach then identifies the visual differences between the rendered web page and this visual oracle (the mockup or screenshot) by iterating over each pixel of the oracle and comparing it against the corresponding pixel of the test page. Two pixels are considered equivalent if they have the same color and saturation levels; if they are not equivalent, the x and y coordinates of the pixel are added to a difference set. The resulting difference set is what identifies the presentation failures.
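The pixel-level comparison described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: images are represented here as 2D grids of RGB tuples rather than decoded image files, and exact color equality stands in for the paper's color-and-saturation test.

```python
def difference_pixels(oracle, test):
    """Compare the visual oracle against the rendered test page pixel by pixel.

    oracle and test are 2D grids (lists of rows) of RGB tuples of the same
    dimensions. Returns the set of (x, y) coordinates where the two differ.
    """
    diff = set()
    for y, (oracle_row, test_row) in enumerate(zip(oracle, test)):
        for x, (oracle_px, test_px) in enumerate(zip(oracle_row, test_row)):
            # Pixels are "equivalent" when their color values match;
            # any mismatch records the pixel's coordinates in the difference set.
            if oracle_px != test_px:
                diff.add((x, y))
    return diff
```

In practice the two inputs would come from decoded screenshots of identical dimensions; the difference set produced here feeds the localization step described next.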

The following process identifies the HTML element or elements responsible for the presentation failures. First, the bounding rectangle of each HTML element in the web page is extracted. An R-tree describing the spatial relationships among these rectangles is then built. For each pixel in the difference set, the R-tree is used to identify the set of HTML elements whose visual representation includes that pixel. “The union of all of these HTML elements for all difference pixels is the set of potentially faulty HTML elements… In the R-tree built, the leaves of an R-tree correspond to rectangles and non-leaf nodes correspond to the tuple (I, child pointer), where I is the identifier for the minimum bounding rectangle that groups nearby rectangles, and child pointer is the pointer to a lower node in the R-tree.”
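The localization step might look roughly like this. This is a hypothetical sketch: a linear containment scan over element bounding rectangles stands in for the paper's R-tree query, and `element_rects` is an assumed mapping from element identifiers (e.g., XPaths) to on-screen rectangles.

```python
def faulty_elements(diff_pixels, element_rects):
    """Map difference pixels back to potentially faulty HTML elements.

    diff_pixels: set of (x, y) coordinates from the pixel comparison.
    element_rects: dict mapping an element identifier (e.g., an XPath) to its
    bounding rectangle (x1, y1, x2, y2).

    For each difference pixel, collect every element whose rectangle contains
    that pixel; the union over all pixels is the set of potentially faulty
    elements. (The paper answers this containment query with an R-tree; a
    linear scan is used here for simplicity.)
    """
    faulty = set()
    for (px, py) in diff_pixels:
        for elem, (x1, y1, x2, y2) in element_rects.items():
            if x1 <= px <= x2 and y1 <= py <= y2:
                faulty.add(elem)
    return faulty
```

An R-tree makes each containment query logarithmic rather than linear in the number of elements, which matters when a page has hundreds of elements and the difference set contains many pixels.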

To evaluate their approach, the authors made copies of HTML pages from four popular web applications: Gmail (http://www.gmail.com), Craigslist Autos (http://losangeles.craigslist.org/i/autos), Virgin America (http://www.virginamerica.com), and PayPal (http://www.paypal.com). The copies were seeded with faults that changed their visual appearance and were then tested with the authors’ approach. Across these four subjects, the approach detected the presentation failures in 100% of the test cases and identified the HTML elements responsible for them in 77% of the test cases.