Recovery Device for Real-Time Dual-Redundant Computer Systems


SAMET R.

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, vol.8, no.3, pp.391-403, 2011 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 8 Issue: 3
  • Publication Date: 2011
  • DOI: 10.1109/tdsc.2010.12
  • Journal Name: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.391-403
  • Keywords: Dual-redundant computer system, fault-tolerant procedure, hardware implementation, real-time, recovery device, recovery point, temporary and permanent faults, FAULT-TOLERANT COMPUTER, DESIGN
  • Ankara University Affiliated: Yes

Abstract

This paper proposes the design of specialized hardware, called the Recovery Device, for a dual-redundant computer system that operates in real time. The Recovery Device executes all fault-tolerant services, including fault detection, fault type determination, fault localization, recovery of the system after a temporary (transient) fault, and reconfiguration of the system after a permanent fault. The paper also proposes algorithms for determining the fault type (temporary or permanent) and for localizing the faulty computer without using self-testing techniques or diagnosis routines. Determining the fault type makes it possible to eliminate only a computer with a permanent fault; in other words, it prevents a nonfaulty computer from being eliminated because of a short temporary fault. Localizing the faulty computer without self-testing techniques or diagnosis routines shortens the recovery point time period and reduces the probability that a fault will occur during execution of the fault-tolerant procedure, which is very important for real-time fault-tolerant systems. These contributions increase both system performance and system reliability.
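
The abstract does not spell out the paper's algorithms, so the following is only a minimal illustrative sketch of the general idea of distinguishing temporary from permanent faults by comparison and re-execution from a recovery point. Every name in it (`classify_fault`, `Verdict`, `max_retries`) and the retry-based rule itself are assumptions made for illustration, not the Recovery Device's actual method, which is implemented in hardware.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class FaultType(Enum):
    NONE = "none"
    TEMPORARY = "temporary"
    PERMANENT = "permanent"


@dataclass
class Verdict:
    fault_type: FaultType
    faulty_computer: Optional[int]  # index 0 or 1; None if not localized
    output: Optional[int]           # agreed output; None if no agreement


def classify_fault(computers: List[Callable[[int], int]],
                   recovery_point: int,
                   max_retries: int = 2) -> Verdict:
    """Hypothetical rule: a mismatch that disappears on re-execution from the
    recovery point is treated as temporary; one that persists as permanent."""
    # Fault detection: run both computers and compare their outputs.
    first = [run(recovery_point) for run in computers]
    if first[0] == first[1]:
        return Verdict(FaultType.NONE, None, first[0])

    # Mismatch: roll back to the recovery point and re-execute.
    for _ in range(max_retries):
        retry = [run(recovery_point) for run in computers]
        if retry[0] == retry[1]:
            agreed = retry[0]
            # Localization by comparison only (no self-test): the computer
            # whose first output differs from the agreed value is the suspect.
            suspects = [i for i in (0, 1) if first[i] != agreed]
            faulty = suspects[0] if len(suspects) == 1 else None
            return Verdict(FaultType.TEMPORARY, faulty, agreed)

    # Assumption: a persistent mismatch is classified as permanent. This
    # sketch does not attempt to localize a permanent fault.
    return Verdict(FaultType.PERMANENT, None, None)
```

In this sketch a computer that produces one wrong result and then recovers is reported as a temporary fault and stays in the system, which mirrors the abstract's point that a nonfaulty computer should not be eliminated because of a short temporary fault.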