Rapid recovery for systems with scarce faults

Chung Hao Huang, Doron Peled, Sven Schewe, Farn Wang

Research output: Contribution to journal > Conference article > peer-review

3 Scopus citations

Abstract

Our goal is to achieve a high degree of fault tolerance through the control of safety-critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller who tries to establish a correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective of maximizing the number of local faults. We argue why we believe this to be the right level of abstraction for safety-critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low and demonstrate the feasibility through an implementation and experimental results.

Original language: English
Pages (from-to): 15-28
Number of pages: 14
Journal: Electronic Proceedings in Theoretical Computer Science, EPTCS
Volume: 96
DOIs
State: Published - 7 Oct 2012
Event: 3rd International Symposium on Games, Automata, Logics and Formal Verification, GandALF 2012 - Napoli, Italy
Duration: 6 Sep 2012 to 8 Sep 2012

Bibliographical note

Publisher Copyright:
© Chung-Hao Huang, Doron Peled, Sven Schewe, and Farn Wang.

Funding

The research was supported by the National Science Council (NSC) grant 97-2221-E-002-129-MY3, by the Israeli Science Foundation (ISF) grant 1252/09, and by the Engineering and Physical Sciences Research Council (EPSRC) grant EP/H046623/1.

Funder: Funder number
Israeli Science Foundation: 1252/09
National Science Council: 97-2221-E-002-129-MY3
Engineering and Physical Sciences Research Council: EP/H046623/1
