













This presentation covers various hardware implementation strategies for fault-tolerant computing, including replication in space and time, self-checking designs, and monitoring methods. The document also discusses the multilevel model of dependable computing and the main drawbacks and advantages of different tolerance/recovery methods.














Nov. 2006
Hardware Implementation Strategies
Slide 1
This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami, Professor of Electrical and Computer Engineering at University of California, Santa Barbara. The material contained herein can be used freely in classroom teaching or any other educational setting. Unauthorized uses are prohibited. © Behrooz Parhami
Edition: First (released Oct. 2006)
[Figure: The multilevel model of dependable computing. Levels shown: Logic, Information, System, Service, Result. States: Ideal, Defective, Faulty, Erroneous, Malfunctioning, Degraded, Failed, grouped as Unimpaired, Low-Level Impaired, Mid-Level Impaired, and High-Level Impaired. Legend: Entry, Deviation, Remedy, Tolerance]
The following schemes have already been discussed in connection with fault tolerance:
  Duplicate and compare
  Triplicate and vote
[Figure: Block diagrams labeled duplicate and compare, triplicate and vote, pair-and-spare, and NMR/hybrid, built from replicated modules (1, 2, 3, 4, 1′, 2′), comparators (C), a voter (V), a switch (S), a switch-voter (VS), a spare, and error signals]
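As an illustration only (not part of the original slides), the two basic schemes can be modeled in software. The names `duplicate_and_compare`, `triplicate_and_vote`, `good`, and `faulty` are hypothetical, and the "fault" is simply a copy that forces its least significant output bit to 1.

```python
from collections import Counter

def duplicate_and_compare(unit_a, unit_b, x):
    """Run two copies of a unit; a mismatch raises the error signal."""
    ra, rb = unit_a(x), unit_b(x)
    return ra, ra != rb              # (output, error signal)

def triplicate_and_vote(units, x):
    """Run three copies and return the majority result."""
    results = [u(x) for u in units]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among the three copies")
    return value

def good(x):                         # hypothetical fault-free unit
    return x * x

def faulty(x):                       # hypothetical copy with output bit 0 stuck at 1
    return (x * x) | 1
```

Duplication only detects the disagreement, while triplication masks a single faulty copy at the price of a third module and a voter.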
[Figure: Computation flowgraph with two additions and one multiplication, and its schedule with 2 adders over time steps t0, t0 + 1, t0 + 2; schedule with 1 adder; each schedule is followed by a duplicate computation]
Transmission of data over unreliable wires or buses:
  Send data; store at receiving end
  Send bitwise complement of data
  Compare the two versions
  Detects wires s-a-0 or s-a-1, as well as many transients

The dual of a Boolean function $f(x_1, x_2, \ldots, x_n)$ is another function $f_d(x_1, x_2, \ldots, x_n)$ such that
$f_d(x_1', x_2', \ldots, x_n') = f'(x_1, x_2, \ldots, x_n)$

Fact: Obtain the dual of $f$ by exchanging AND and OR operators in its logical expression. For example, the dual of $f = ab \lor c$ is $f_d = (a \lor b)c$

[Figure: Circuit f receives the inputs and circuit f_d receives the complemented inputs; their results yield the output and an error signal]

Advantages of this approach compared to duplication include a smaller probability of common errors
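The dual-based check can be sketched in a few lines of Python, using the slide's example $f = ab \lor c$. This is a minimal model, not the original circuit: the function `check_with_dual` and its `fault` parameter (which flips the output of the f circuit) are assumptions made for illustration.

```python
from itertools import product

def f(a, b, c):
    return (a & b) | c               # f = ab OR c

def f_d(a, b, c):
    return (a | b) & c               # dual of f: AND and OR exchanged

def check_with_dual(a, b, c, fault=0):
    """Compute f on the inputs and f_d on the complemented inputs.

    Without faults the two outputs are complementary, so equal outputs
    signal an error. `fault` is a hypothetical flip of f's output.
    """
    out = f(a, b, c) ^ fault
    out_d = f_d(1 - a, 1 - b, 1 - c)
    return out, out == out_d         # (output, error signal)

# The defining identity f_d(x') = f'(x) holds for all 8 input combinations.
assert all(f_d(1 - a, 1 - b, 1 - c) == 1 - f(a, b, c)
           for a, b, c in product((0, 1), repeat=3))
```

Because f and f_d are different circuits working on different (complemented) data, a single physical cause is less likely to corrupt both in a matching way, which is the advantage the slide points to.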
A self-dual function satisfies $f(x_1, x_2, \ldots, x_n) = f_d(x_1, x_2, \ldots, x_n)$

[Figure: Circuit f receives the inputs and circuit f_d receives the complemented inputs; their results yield the output and an error signal. Annotation: use same circuit twice]

With a self-dual function $f$, the functions $f$ and $f_d$ in the diagram above can be computed by using the same circuit twice (time redundancy)

For example, both the sum $a \oplus b \oplus c$ and carry $ab \lor bc \lor ca$ outputs of a full-adder are self-dual functions

Many functions of practical interest are self-dual
Examples (proofs left as exercise):
  A $k$-bit binary adder, with $2k + 1$ inputs and $k + 1$ outputs, is self-dual
  So are 1's-complement and 2's-complement versions of such an adder
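The self-duality claims can be checked exhaustively for small cases. The sketch below is illustrative rather than a proof: `checked_full_adder`, `is_self_dual_adder`, and the `fault` mask (a transient affecting only the first pass) are hypothetical names introduced here.

```python
def full_adder(a, b, c):
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (c & a)
    return s, carry

def checked_full_adder(a, b, c, fault=(0, 0)):
    """Use the same circuit twice: on the inputs, then on their complements.

    For a self-dual function the second result must be the complement of
    the first. `fault` is a hypothetical XOR mask on the first-pass outputs.
    """
    s1, c1 = full_adder(a, b, c)
    s1, c1 = s1 ^ fault[0], c1 ^ fault[1]
    s2, c2 = full_adder(1 - a, 1 - b, 1 - c)
    error = (s1 == s2) or (c1 == c2)
    return (s1, c1), error

def adder(x, y, cin):
    """Binary adder; for k-bit operands the result has k + 1 bits."""
    return x + y + cin

def is_self_dual_adder(k):
    """Check that complementing all 2k + 1 inputs complements all k + 1 outputs."""
    in_mask, out_mask = (1 << k) - 1, (1 << (k + 1)) - 1
    return all(
        adder(x ^ in_mask, y ^ in_mask, 1 - cin) == (adder(x, y, cin) ^ out_mask)
        for x in range(1 << k) for y in range(1 << k) for cin in (0, 1))
```

The adder case follows from $(2^k - 1 - x) + (2^k - 1 - y) + (1 - c_{in}) = (2^{k+1} - 1) - (x + y + c_{in})$, which is the bitwise complement of the $(k+1)$-bit result.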
Instead of using a k-bit adder twice for error detection or 3 times for error correction, one can segment the operands into 2 or 3 parts and similarly segment the adder; perform replicated addition on operand segments and use comparison/voting to detect/correct errors

[Figure: Adder split into a lower half and an upper half, with operand segments x_L, x_H, y_L, y_H, carry-in c_in, carry-out c_out, a flip-flop (FF), and a comparator (C) producing the Error signal]

Sum computed in two cycles: the lower half in cycle 1, and the upper half in cycle 2

Various other segmentation schemes have been suggested
Example: 16-bit adder with 4-way segmentation and voting (Townsend, Abraham, and Swartzlander, 2003)
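A rough software model of the two-way segmented scheme is given below. It assumes an even k, assumes that the inter-cycle carry is what the flip-flop in the figure holds, and models a fault as a hypothetical XOR mask on the sum bits of one adder half; none of these details is taken from the cited design.

```python
def segment_add(a, b, cin, width, flip_mask=0):
    """One adder segment: returns (sum bits, carry-out).

    `flip_mask` is a hypothetical fault that flips bits of the segment sum.
    """
    total = a + b + cin
    return (total & ((1 << width) - 1)) ^ flip_mask, total >> width

def segmented_checked_add(x, y, k, cin=0, upper_half_fault=0):
    """k-bit addition in two cycles on an adder split into two k/2-bit halves.

    In each cycle both adder halves add the SAME operand segment and the
    two results are compared. Cycle 1 handles the lower operand halves,
    cycle 2 the upper operand halves.
    """
    h = k // 2
    mask = (1 << h) - 1
    result, carry, error = 0, cin, False
    for shift in (0, h):
        a, b = (x >> shift) & mask, (y >> shift) & mask
        s1, c1 = segment_add(a, b, carry, h)                     # lower half of adder
        s2, c2 = segment_add(a, b, carry, h, upper_half_fault)   # upper half of adder
        error |= (s1, c1) != (s2, c2)
        result |= s1 << shift
        carry = c1            # assumed to be held in a flip-flop between cycles
    return result | (carry << k), error
```

The addition takes two cycles instead of one, but no second full-width adder is needed; a three-way split with voting would extend the same idea to error correction.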
Instead of duplicating the computation with no hardware change (slow) or duplicating the entire hardware (costly), we can add some hardware to make the interleaved recomputations more efficient

Consider the effect of including a second adder:
  Original computation (T = 3)
  Recomputation with same hardware resources (T = 5, excluding compare time)
  Recomputation with the inclusion of an extra adder (T = 3, excluding compare time)

[Figure: Schedules of the original computation (two additions and one multiplication) and of the interleaved duplicate computation, with and without the extra adder]
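The three schedule lengths can be reproduced with a small greedy list scheduler. The sketch assumes unit-latency operations, one multiplier, and a flowgraph in which two additions feed one multiplication; that flowgraph shape and all names (`schedule_length`, `ORIGINAL`, `DUPLICATED`) are assumptions for illustration.

```python
def schedule_length(ops, capacity):
    """Greedy list scheduling with unit-latency operations.

    ops: {name: (kind, [names of predecessor ops])}, in priority order.
    capacity: {kind: number of functional units of that kind}.
    Returns the number of time steps T needed.
    """
    finish = {}                      # op name -> step in which it executes
    step = 0
    while len(finish) < len(ops):
        step += 1
        free = dict(capacity)
        for name, (kind, deps) in ops.items():
            if name in finish:
                continue
            # An op is ready only if every predecessor ran in an earlier step.
            ready = all(finish.get(d, step) < step for d in deps)
            if ready and free[kind] > 0:
                free[kind] -= 1
                finish[name] = step
    return step

# Assumed flowgraph: two additions feeding one multiplication.
ORIGINAL = {"a1": ("add", []), "a2": ("add", []), "m": ("mul", ["a1", "a2"])}
# The duplicate computation is an independent copy, scheduled at lower priority.
DUPLICATED = dict(ORIGINAL)
DUPLICATED.update({"a1d": ("add", []), "a2d": ("add", []),
                   "md": ("mul", ["a1d", "a2d"])})

print(schedule_length(ORIGINAL, {"add": 1, "mul": 1}))     # T = 3
print(schedule_length(DUPLICATED, {"add": 1, "mul": 1}))   # T = 5
print(schedule_length(DUPLICATED, {"add": 2, "mul": 1}))   # T = 3
```

With one adder, the duplicate additions can overlap only the original multiplication, giving T = 5; a second adder lets them run alongside it earlier, so the recomputation finishes in the original T = 3.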
Watchdog unit monitors events occurring in, and activities performed by, the function unit (e.g., event frequency and relative timing)

[Figure: Function unit observed by an activity monitor]

Observed behavior is compared against expected behavior
The type of monitoring is highly application-dependent
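One simple form of such monitoring checks the spacing between successive events against expected bounds. The class below is a hypothetical sketch: the name `ActivityMonitor`, the `min_gap`/`max_gap` parameters, and the use of explicit timestamps are all assumptions, since the right checks depend on the application.

```python
class ActivityMonitor:
    """Compares observed event timing against expected bounds."""

    def __init__(self, min_gap, max_gap):
        self.min_gap = min_gap       # events closer than this are too frequent
        self.max_gap = max_gap       # events farther apart are too rare
        self.last = None             # time of the most recent event

    def observe(self, t):
        """Record an event at time t; return True if its timing is abnormal."""
        error = (self.last is not None
                 and not (self.min_gap <= t - self.last <= self.max_gap))
        self.last = t
        return error

    def timed_out(self, now):
        """Return True if the function unit has been silent for too long."""
        return self.last is not None and now - self.last > self.max_gap
```

A monitor created as `ActivityMonitor(5, 20)` accepts events 10 time units apart, flags an event that arrives only 2 units after the previous one, and reports a timeout once more than 20 units pass without any event.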
[Figure: Parity-checked ALU. Parity-encoded inputs (k-bit data paths) feed an ordinary ALU and a parity predictor; a parity generator operates on the ALU result; the unit delivers a parity-encoded output and an error signal]

Parity prediction is an alternative to duplication
Compared to duplication:
  Parity prediction often involves less overhead in time and space
  The protection offered by parity prediction is not as comprehensive
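For addition, parity prediction rests on the fact that each sum bit is $s_i = x_i \oplus y_i \oplus c_i$, so the parity of the sum equals the parity of x, XOR the parity of y, XOR the parity of the carries into all bit positions. The sketch below models this for a k-bit adder; the separate carry computation in `carries_into_bits` and the `flip_mask` fault injection are assumptions made for illustration, not the slide's circuit.

```python
def parity(v):
    return bin(v).count("1") & 1

def carries_into_bits(x, y, cin, k):
    """Carry into each of the k bit positions, computed independently of the ALU."""
    c, word = cin, 0
    for i in range(k):
        word |= c << i
        xi, yi = (x >> i) & 1, (y >> i) & 1
        c = (xi & yi) | (xi & c) | (yi & c)
    return word

def checked_add(x, px, y, py, k, cin=0, flip_mask=0):
    """Add parity-encoded operands (x, px) and (y, py) modulo 2**k.

    Returns (sum, predicted parity, error signal). `flip_mask` is a
    hypothetical fault that flips bits of the ordinary ALU's result.
    """
    s = ((x + y + cin) & ((1 << k) - 1)) ^ flip_mask         # ordinary ALU
    predicted = px ^ py ^ parity(carries_into_bits(x, y, cin, k))
    error = predicted != parity(s)                           # generator vs. predictor
    return s, predicted, error
```

A single flipped result bit always trips the error signal, but flipping two result bits leaves the parity unchanged and goes undetected, which illustrates why the protection is less comprehensive than with duplication.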
Encode the control signals using a separable code (e.g., Berger code)
Either check in every cycle, or form a signature over multiple cycles

[Figure: Microprogrammed control unit. The op field (from instruction register) indexes dispatch tables 1 and 2; sequence control selects among inputs 0 to 3 to load the MicroPC, which also has an incrementer; the MicroPC supplies the address to the microprogram memory or PLA, whose data go to the microinstruction register holding the control signals to data path and the check bits]

In a microprogrammed control unit, store the microinstruction address and compare against MicroPC contents to detect sequencing errors
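Both checks can be sketched together. A Berger code appends to the information bits a check symbol equal to the number of 0s among them, which detects all unidirectional errors. In the model below, the microword layout (a dict with `address`, `control`, and `check` fields) and the function names are hypothetical.

```python
def berger_check(info, k):
    """Berger check symbol: the number of 0 bits among the k information bits."""
    return k - bin(info & ((1 << k) - 1)).count("1")

def make_microword(address, control, k):
    """Microword storing its own address and Berger check bits (assumed layout)."""
    return {"address": address, "control": control,
            "check": berger_check(control, k)}

def fetch(microstore, micro_pc, k):
    """Read the word selected by MicroPC; return (control signals, error)."""
    word = microstore[micro_pc]
    sequencing_error = word["address"] != micro_pc    # wrong word was fetched
    code_error = berger_check(word["control"], k) != word["check"]
    return word["control"], sequencing_error or code_error
```

A fault that turns some control bits from 1 to 0 raises the count of 0s above the stored check value, and a fetch that returns the wrong word is caught because the stored address disagrees with the MicroPC.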
Watchdog unit monitors the instructions executed and their addresses (for example, by snooping on the bus)

[Figure: Instruction sequencer observed by a control-flow watchdog]

The watchdog unit may have certain info about program behavior:
  Control flow graph (valid branches and procedure calls)
  Signatures of branch-free intervals (consecutive instructions)
  Valid memory addresses and required access privileges

In an application-specific system, watchdog info is preloaded in it
For a general-purpose (GP) system, the compiler can insert special watchdog directives

Overheads of control-flow checking:
  Wider memory due to the need for tag bits to distinguish word types
  Additional memory to store signatures and other watchdog info
  Stolen processor/bus cycles by the watchdog unit
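A software model of the first two kinds of watchdog info is sketched below. It assumes 32-bit instruction words, uses CRC-32 from Python's `zlib` as a stand-in signature function, and identifies each branch-free interval by its start address; the class `ControlFlowWatchdog` and its interface are hypothetical.

```python
import zlib

def signature(instructions):
    """Signature of a branch-free interval (CRC-32 over 32-bit words)."""
    data = b"".join(w.to_bytes(4, "little") for w in instructions)
    return zlib.crc32(data)

class ControlFlowWatchdog:
    def __init__(self, blocks, successors):
        """blocks: {start address: [instruction words]} (preloaded reference).
        successors: {start address: set of valid next start addresses}."""
        self.signatures = {a: signature(w) for a, w in blocks.items()}
        self.successors = successors
        self.previous = None

    def observe_block(self, start, executed_words):
        """Check one executed interval; return True if an error is detected."""
        bad_branch = (self.previous is not None and
                      start not in self.successors.get(self.previous, set()))
        bad_signature = self.signatures.get(start) != signature(executed_words)
        self.previous = start
        return bad_branch or bad_signature
```

The reference tables correspond to the additional memory overhead listed above; in hardware, the signature would be accumulated as instructions are snooped from the bus rather than computed after the fact.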