Self-Adapting Reliability in Distributed Software Systems

Author(s): Yuriy Brun, Jae young Bang, George Edwards, Nenad Medvidovic
Venue: IEEE Transactions on Software Engineering
Date: 2015

Type of Experiment: Other
Data Collection Method: Project Artifact(s)


Software systems that are distributed across many computers, also known as distributed systems, are particularly vulnerable to failures and attacks because the software has little control over the environment in which it operates.

This paper presents a self-adaptive method for improving software reliability in deployments where resources may be unpredictable or under attack, so that individual systems may fail or become compromised. The method, called iterative redundancy, detects when resource reliability drops, identifies the parts of a computation that were deployed on compromised systems, and does not rely on the reliability of the underlying systems. Iterative redundancy is also guaranteed to use a given set of resources optimally; that is, it requires the fewest resources possible.
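The idea of adding redundancy only as needed can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes jobs with a binary result, and it assumes a stopping rule in which replicas are run until one answer leads the other by a fixed vote margin. This summary does not state that rule, and the names `iterative_redundancy`, `run_job`, and `margin` are hypothetical.

```python
def iterative_redundancy(run_job, margin):
    """Run replicas of a binary-result job until one answer leads by `margin` votes.

    run_job: hypothetical callable that returns True or False from one
             (possibly faulty or compromised) node.
    margin:  hypothetical vote lead required to accept a result; this
             stopping rule is an assumption, not taken from the summary.
    Returns (accepted_result, jobs_used).
    """
    if margin < 1:
        raise ValueError("margin must be at least 1")

    lead = 0       # votes for True minus votes for False
    jobs_used = 0
    while abs(lead) < margin:
        # Deploy only as many jobs as would reach the margin if they all agree.
        batch = margin - abs(lead)
        for _ in range(batch):
            lead += 1 if run_job() else -1
        jobs_used += batch
    return lead > 0, jobs_used
```

When every node agrees, the sketch stops after `margin` jobs. Each disagreement triggers another, smaller batch, so extra cost is paid only when results conflict. For example, `iterative_redundancy(lambda: random.random() < 0.9, 3)` (after `import random`) simulates nodes that return the correct answer `True` 90% of the time.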

Iterative redundancy was evaluated on two existing distributed systems, XDEVS and BOINC. In addition, a theoretical analysis showed it to be more effective than both traditional redundancy and progressive redundancy.
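For context on the traditional-redundancy baseline, the following sketch computes the reliability of a simple k-replica majority vote. It is the standard binomial calculation, not the paper's analysis, and it assumes binary results, independent nodes, and one shared per-node reliability `r`. The function name is hypothetical.

```python
from math import comb


def majority_vote_reliability(r, k):
    """Probability that a majority of k independent replicas is correct.

    r: assumed probability that a single node returns the correct result.
    k: odd number of replicas used for every job (fixed in advance).
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    need = k // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(k, i) * r**i * (1 - r) ** (k - i) for i in range(need, k + 1))
```

Under these assumptions, three replicas at `r = 0.9` give a reliability of about 0.972, and the cost is always k jobs, whether or not any node misbehaves.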

Iterative redundancy demonstrated a lower cost when extra redundancy was not needed and a higher level of system reliability when node reliability decreased. It uses node failure predictions to allocate resources more effectively, which improves system reliability when node failures actually occur. Overall, iterative redundancy increased system reliability in both the XDEVS and BOINC deployments.