## Abstract

This paper lays a theoretical foundation for scaling fault tolerant tasks to large and diversified networks such as the Internet. In such networks, there are always parts of the network that fail. On the other hand, various subtasks interest only parts of the network, and it is desirable that those parts, if nonfaulty, do not suffer from faults in other parts. Our approach is to refine the previously suggested notion of fault local algorithms (that was best suited for global tasks) for which the complexity of recovering was proportional to the number of faults. We refine this notion by introducing the concept of tight fault locality to deal with problems whose complexity (in the absence of faults) is sublinear in the size of the network. For a problem whose time complexity on an n-node network is T(n) (where possibly T(n) = o(n)), a tightly fault local algorithm recovers a legal global state in O(T(x)) time when the (unknown) number of faults is x. This concept is illustrated by presenting a general transformation for maximal independent set (MIS) algorithms to make them tightly fault local. In particular, our transformation yields an O(log x) randomized mending algorithm and an exp(O(√log x)) deterministic mending algorithm for MIS. The methods used in the transformation may be of interest by themselves.

Original language | English |
---|---|

Pages (from-to) | 247-268 |

Number of pages | 22 |

Journal | SIAM Journal on Computing |

Volume | 30 |

Issue number | 1 |

DOIs | |

State | Published - 2000 |

Externally published | Yes |

## Keywords

- Distributed algorithms
- Fault tolerance
- Recovery
- Stabilization