Fault-local distributed mending

Shay Kutten, David Peleg

Research output: Contribution to conference, Paper, peer-review

47 Scopus citations

Abstract

As communication networks grow, existing fault handling tools that involve global measures such as global time-outs or reset procedures become increasingly unaffordable, since their cost grows with the size of the network. Rather, for a fault handling mechanism to scale to large networks, its cost must depend only on the number of failed nodes (which, thanks to today's technology, grows much more slowly than the networks). Moreover, it should allow the non-faulty regions of the networks to continue their operation even during the recovery of the faulty parts. This abstract introduces the concepts of fault locality and of fault-locally mendable problems, which are problems for which there exist correction algorithms (applied after faults) whose cost depends only on the (unknown) number of faults. We show that any problem is fault-locally mendable. The solution involves a novel technique combining data structures and 'local votes' among nodes, which may be of interest in itself.

Original language: English
Pages: 20-27
Number of pages: 8
State: Published - 1995
Externally published: Yes
Event: Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing - Ottawa, Canada
Duration: 20 Aug 1995 to 23 Aug 1995

Conference

Conference: Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing
City: Ottawa, Canada
Period: 20/08/95 to 23/08/95

Bibliographical note

Funding Information:
© 1999 Academic Press.
* Alexander Goldberg lecturer.
† Supported in part by a Walter and Elise Haas Career Development Award and by a grant from the Israel Science Foundation. Part of the work was done while visiting the IBM T. J. Watson Research Center.

Funding

Funders / Funder number
Israel Science Foundation
