Hardware Implementation of Fault-Tolerance in Dual Computer Systems


SAMET R.

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, vol.25, no.8, pp.1015-1028, 2009 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 25 Issue: 8
  • Publication Date: 2009
  • DOI Number: 10.1002/qre.1018
  • Journal Name: QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.1015-1028
  • Keywords: real-time, dual computer systems, faults, fault-tolerance, recovery point, hardware implementation, recovery device, REDUNDANCY, DESIGN
  • Ankara University Affiliated: Yes

Abstract

In this paper, we propose an architectural design for a dual computer system (DCS) that operates in real-time with the fault-tolerance implemented purely by hardware. We have a novel design allowing the implementation of hardware that performs the following key services: the determination of fault type (temporary or permanent) and the localization of the faulty computer without using self-testing techniques and diagnosis routines. We also propose a non-trivial sequence of services for fault-tolerance in which the determination of the fault type and the recovery of computational processes after a temporary fault are realized before fault localization. Our design has several benefits: the designed hardware shortens the recovery point time period; the proposed non-trivial sequence of fault-tolerant services reduces (to two) the number of logical segments that should be re-run to recover the computational processes; and the determination of the fault type allows eliminating only the computer with a permanent fault. These contributions bring both an increase in system performance and an increase in the degree of system reliability. Copyright (C) 2009 John Wiley & Sons, Ltd.
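
The ordering of services described in the abstract (classify the fault and recover first, localize only afterwards) can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the paper's hardware design: the names `classify_fault` and `make_segment`, the comparison of two results, and the single re-run from the recovery point are all hypothetical choices made for illustration. The paper's actual localization mechanism is not reproduced here.

```python
from typing import Callable, Optional, Set, Tuple

# Hypothetical model: a logical segment maps (state, computer_id) to a new state.
Segment = Callable[[int, int], int]


def make_segment(fail_calls: Set[int]) -> Segment:
    """Build a toy segment that adds 1 to the state.

    Assumption for illustration: computer 1 returns a corrupted value on
    the call numbers listed in fail_calls (1-based); computer 0 never fails.
    """
    calls = {"n": 0}

    def segment(state: int, computer_id: int) -> int:
        if computer_id == 1:
            calls["n"] += 1
            if calls["n"] in fail_calls:
                return state + 99  # corrupted result
        return state + 1

    return segment


def classify_fault(segment: Segment, recovery_point: int) -> Tuple[str, Optional[int]]:
    """Classify a fault before any attempt to localize it.

    Both computers run the segment from the same recovery point and the
    results are compared. On a mismatch, the segment is re-run once from
    the recovery point: agreement on the re-run is treated as a temporary
    fault (already recovered), a second mismatch as a permanent fault.
    """
    first = (segment(recovery_point, 0), segment(recovery_point, 1))
    if first[0] == first[1]:
        return "none", first[0]

    # Mismatch detected: roll back to the recovery point and re-run.
    second = (segment(recovery_point, 0), segment(recovery_point, 1))
    if second[0] == second[1]:
        return "temporary", second[0]

    # Only now would fault localization be needed (not modeled here).
    return "permanent", None
```

In this sketch a temporary fault is absorbed by the re-run and both computers stay in service, while only a permanent fault reaches the point where the faulty computer would have to be identified and removed.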