Design and implementation of highly reliable dual-computer systems


SAMET R.

COMPUTERS & SECURITY, vol.28, no.7, pp.710-722, 2009 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 28 Issue: 7
  • Publication Date: 2009
  • DOI: 10.1016/j.cose.2009.04.003
  • Journal Name: COMPUTERS & SECURITY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.710-722
  • Keywords: Reliability, Performance, Real-time, Dual-computer systems, Fault tolerance, Recovery point, Satisfiability model, Recovery device, FAULT-TOLERANT COMPUTER, REDUNDANCY
  • Ankara University Affiliated: Yes

Abstract

Two of the main parameters of real-time computer systems are reliability and performance. Researchers continually seek ways to improve both parameters, and that is the goal of this study. To this end, we propose an architecture for a dual-computer system that operates in real time, with fault tolerance implemented purely in hardware. The hardware, as designed and implemented, performs two key services: 1) determining the fault type (temporary or permanent) and 2) localizing the faulty computer without using self-testing techniques or diagnostic routines. Our design has several benefits: 1) the designed hardware shortens the recovery point time period; 2) the proposed nontrivial sequence of fault-tolerant services reduces (to two) the number of logical segments that must be re-run to recover computational processes; and 3) determining the fault type allows only computers with permanent faults to be eliminated. These contributions improve both the performance and the reliability of the system. (C) 2009 Elsevier Ltd. All rights reserved.