QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, vol. 25, no. 1, pp. 41-68, 2009 (SCI-Expanded)
Real-time computer systems deployed in life-critical control applications must be designed to meet stringent reliability specifications. The minimum acceptable degree of reliability for systems of this type is '7 nines', which is not generally achieved. This paper aims to contribute to achieving that degree of reliability. To this end, it proposes a classification scheme for the fault-tolerant procedures of redundant computer systems (RCSs). The classification scheme is developed on the basis of the number of counteracted fault types. Table I relates the characteristics of the RCSs to the characteristics of the fault-tolerant procedures. A selection algorithm is proposed that allows designers to select the optimal type of fault-tolerant procedure according to the system characteristics and capabilities. The fault-tolerant procedure selected by this algorithm provides the required degree of reliability for a given RCS. According to the proposed graphical model, only a part of the fault-tolerant procedure is executed, depending on the absence or presence (type and sort) of faults. The proposed methods allow designers to counteract Byzantine and non-Byzantine fault types during degradation of RCSs from N to 3, and only the non-Byzantine fault type during degradation from 3 to 1, with an optimal checkpoint time period. Copyright (C) 2008 John Wiley & Sons, Ltd.
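
As a small illustration of the '7 nines' target mentioned above, the sketch below converts a count of nines into a reliability value and the corresponding maximum tolerable failure probability. The function names (`nines_to_reliability`, `max_failure_probability`, `meets_target`) are hypothetical and not taken from the paper, and the abstract does not state the mission time over which the reliability is measured, so the numbers are dimensionless probabilities only.

```python
from fractions import Fraction


def nines_to_reliability(nines: int) -> Fraction:
    """Return the reliability with `nines` leading nines (e.g. 3 -> 0.999).

    Hypothetical helper for illustration; exact rational arithmetic avoids
    floating-point rounding near 1.
    """
    return 1 - Fraction(1, 10 ** nines)


def max_failure_probability(nines: int) -> Fraction:
    """Return the largest failure probability compatible with the target."""
    return Fraction(1, 10 ** nines)


def meets_target(reliability: Fraction, nines: int = 7) -> bool:
    """Check whether a given reliability reaches the `nines` target."""
    return reliability >= nines_to_reliability(nines)


if __name__ == "__main__":
    target = nines_to_reliability(7)
    print(float(target))                      # 0.9999999
    print(float(max_failure_probability(7)))  # 1e-07
```

Under this reading, a system with reliability 0.999999 (six nines) falls short of the target, while 0.99999999 (eight nines) exceeds it.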