Fault-Tolerant Procedures for Redundant Computer Systems


SAMET R.

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, vol.25, no.1, pp.41-68, 2009 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 25 Issue: 1
  • Publication Date: 2009
  • DOI: 10.1002/qre.949
  • Journal Name: QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.41-68
  • Keywords: redundant computer system, fault-tolerant procedure, non-Byzantine and Byzantine fault types, classifications, selection algorithm, graphical model, REAL-TIME SYSTEMS, ERROR-DETECTION, ALGORITHM, ARCHITECTURE, PROTOCOLS, RECOVERY, DESIGN
  • Ankara University Affiliated: Yes

Abstract

Real-time computer systems deployed in life-critical control applications must be designed to meet stringent reliability specifications. The minimum acceptable degree of reliability for systems of this type is '7 nines', which is not generally achieved. This paper aims at contributing to the achievement of that degree of reliability. To this end, this paper proposes a classification scheme of the fault-tolerant procedures for redundant computer systems (RCSs). The proposed classification scheme is developed on the basis of the number of counteracted fault types. Table I is created to relate the characteristics of the RCSs to the characteristics of the fault-tolerant procedures. A selection algorithm is proposed, which allows designers to select the optimal type of fault-tolerant procedure according to the system characteristics and capabilities. The fault-tolerant procedure selected by this algorithm provides the required degree of reliability for a given RCS. According to the proposed graphical model, only a part of the fault-tolerant procedure is executed, depending on the absence or presence (type and sort) of faults. The proposed methods allow designers to counteract Byzantine and non-Byzantine fault types during degradation of RCSs from N to 3, and only the non-Byzantine fault type during degradation from 3 to 1, with an optimal checkpoint time period. Copyright (C) 2008 John Wiley & Sons, Ltd.
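
As a rough illustration of what a '7 nines' target means numerically, the sketch below converts a reliability figure into its number of nines and estimates the reliability of an N-node system that works while a strict majority of nodes work. This is not the paper's selection algorithm or graphical model. It is the textbook majority-voting formula, and it assumes independent node failures and a perfect voter. The function names `nines` and `majority_reliability` are hypothetical and chosen only for this example.

```python
import math


def nines(reliability: float) -> float:
    """Number of 'nines' in a reliability figure, e.g. 0.999 gives about 3."""
    return -math.log10(1.0 - reliability)


def majority_reliability(n: int, r: float) -> float:
    """Probability that a strict majority of n nodes are working.

    Assumption (not from the paper): nodes fail independently, each has
    reliability r, and the voter itself never fails.
    """
    need = n // 2 + 1  # smallest strict majority
    return sum(
        math.comb(n, k) * r**k * (1.0 - r) ** (n - k)
        for k in range(need, n + 1)
    )


if __name__ == "__main__":
    # Compare configurations along a degradation path N -> 3 -> 1.
    per_node = 0.9999  # hypothetical per-node reliability
    for n in (5, 3, 1):
        system = majority_reliability(n, per_node)
        print(f"n={n}: about {nines(system):.1f} nines")
```

Under these simplifying assumptions, the number of nines drops as the system degrades toward a single node, which is one way to see why the choice of fault-tolerant procedure depends on how much redundancy remains.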