Hardware Implementation Strategies for Fault-Tolerant Computing: Methods and Techniques

This presentation covers various hardware implementation strategies for fault-tolerant computing, including replication in space and time, self-checking designs, and monitoring methods. The document also discusses the multilevel model of dependable computing and the main drawbacks and advantages of different tolerance/recovery methods.

Uploaded on 08/30/2009

Nov. 2006 Hardware Implementation Strategies Slide 1
Fault-Tolerant Computing
Hardware Design Methods
Slide 2

About This Presentation

This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami, Professor of Electrical and Computer Engineering at University of California, Santa Barbara. The material contained herein can be used freely in classroom teaching or any other educational setting. Unauthorized uses are prohibited. © Behrooz Parhami

Edition   Released    Revised   Revised
First     Oct. 2006

Slide 5

Multilevel Model of Dependable Computing

Level         State            Class
(none)        Ideal            Unimpaired
Component     Defective        Low-level impaired
Logic         Faulty           Low-level impaired
Information   Erroneous        Mid-level impaired
System        Malfunctioning   Mid-level impaired
Service       Degraded         High-level impaired
Result        Failed           High-level impaired

Legend (transitions in the figure): Entry, Deviation, Remedy, Tolerance

Slide 7

Replication of Data-Path Elements in Space

The following schemes have already been discussed in connection with fault tolerance:
  Duplicate and compare
  Triplicate and vote
  Pair-and-spare
  NMR/Hybrid

[Figure: block diagrams of the four schemes, built from numbered modules, comparators (C), voters (V), switches (S), a switch-voter (VS), spares, and error outputs]
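The first two schemes can be sketched in software as a rough illustration. This is a behavioral model only, not a hardware design; the unit functions `good` and `faulty` are hypothetical stand-ins for replicated data-path elements.

```python
from collections import Counter

def duplicate_and_compare(unit_a, unit_b, x):
    """Run two copies of a unit; return (output of first copy, error flag)."""
    ya, yb = unit_a(x), unit_b(x)
    return ya, ya != yb

def triplicate_and_vote(units, x):
    """Run three copies; return (majority output, disagreement flag)."""
    outputs = [u(x) for u in units]
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority among the three copies")
    return value, count < len(outputs)

# Hypothetical units: one correct, one with an injected fault.
def good(x):
    return x * x

def faulty(x):
    return x * x + 1

print(duplicate_and_compare(good, faulty, 3))        # mismatch is detected
print(triplicate_and_vote([good, faulty, good], 4))  # fault is masked by the vote
```

Duplication can only detect the mismatch, whereas triplication masks a single faulty copy and still reports that a disagreement occurred.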

Slide 8

Main Drawback of Replication in Time

Can be slow, but in many control applications, extra time is available.
Interleaving of the primary and duplicate computations saves time.

[Figure: computation flowgraph of + and × operations and its schedule with 2 adders over time steps t0, t0 + 1, t0 + 2; schedule with 1 adder; the duplicate computation is interleaved with the primary one]

Slide 10

Alternating Logic: Basic Ideas

Transmission of data over unreliable wires or buses:
  Send data; store at receiving end
  Send bitwise complement of data
  Compare the two versions
  Detects wires s-a-0 or s-a-1, as well as many transients

The dual of a Boolean function $f(x_1, x_2, \ldots, x_n)$ is another function $f_d(x_1, x_2, \ldots, x_n)$ such that

$$f_d(x_1', x_2', \ldots, x_n') = f'(x_1, x_2, \ldots, x_n)$$

Fact: Obtain the dual of $f$ by exchanging AND and OR operators in its logical expression. For example, the dual of $f = ab \lor c$ is $f_d = (a \lor b)c$.

[Figure: f applied to the inputs and f_d applied to the complemented inputs; a checker produces Output and Error]

Advantages of this approach compared to duplication include a smaller probability of common errors.
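Both ideas on this slide can be checked with a small sketch. The helper names (`dual`, `send_over_bus`, `alternating_transfer`) and the single stuck-at fault model are assumptions made for illustration.

```python
from itertools import product

def dual(f):
    """Return f_d, defined by f_d(x1', ..., xn') = f'(x1, ..., xn)."""
    return lambda *x: 1 - f(*(1 - xi for xi in x))

# The slide's example: the dual of f = ab OR c should be (a OR b)c.
f = lambda a, b, c: (a & b) | c
fd_expected = lambda a, b, c: (a | b) & c
fd = dual(f)
duals_match = all(fd(*x) == fd_expected(*x) for x in product((0, 1), repeat=3))

def send_over_bus(word, width, stuck_at=None):
    """Model a bus; stuck_at = (bit_index, value) forces one wire to 0 or 1."""
    if stuck_at is not None:
        bit, value = stuck_at
        word = (word | (1 << bit)) if value else (word & ~(1 << bit))
    return word & ((1 << width) - 1)

def alternating_transfer(word, width, stuck_at=None):
    """Send the word, then its bitwise complement; return (received word, error flag)."""
    mask = (1 << width) - 1
    first = send_over_bus(word, width, stuck_at)
    second = send_over_bus(~word & mask, width, stuck_at)
    # On a healthy bus every bit differs between the two transfers.
    error = (first ^ second) != mask
    return first, error

print(duals_match)
print(alternating_transfer(0b1010, 4))                   # fault-free transfer
print(alternating_transfer(0b1010, 4, stuck_at=(0, 0)))  # wire 0 stuck at 0
```

A stuck wire delivers the same value in both transfers, so the comparison flags it even when the first transfer happened to be correct.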

Slide 11

Alternating Logic: Self-Dual Functions

A function $f$ is self-dual if

$$f(x_1, x_2, \ldots, x_n) = f_d(x_1, x_2, \ldots, x_n)$$

With a self-dual function $f$, the functions $f$ and $f_d$ in the checking diagram can be computed by using the same circuit twice (time redundancy).

[Figure: f applied to the inputs and f_d applied to the complemented inputs, producing Output and Error; annotated "Use same circuit twice"]

For example, both the sum $a \oplus b \oplus c$ and carry $ab \lor bc \lor ca$ outputs of a full adder are self-dual functions.

Many functions of practical interest are self-dual.
Examples (proofs left as exercise):
  A k-bit binary adder, with 2k + 1 inputs and k + 1 outputs, is self-dual
  So are 1's-complement and 2's-complement versions of such an adder
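The self-duality claims can be checked exhaustively for small cases. The sketch below is not a proof; it simply tests every input pattern, and the helper names (`is_self_dual`, `make_adder`) are assumptions for illustration. A function is treated as self-dual when complementing all inputs complements every output.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (sum, carry) of a one-bit full adder."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)

def is_self_dual(f, n):
    """f maps n bits to a tuple of bits; self-dual iff f(x') = f(x)' for all x."""
    for x in product((0, 1), repeat=n):
        out = f(*x)
        out_c = f(*(1 - b for b in x))
        if any(p == q for p, q in zip(out, out_c)):
            return False
    return True

def make_adder(k):
    """k-bit binary adder with 2k + 1 input bits and k + 1 output bits."""
    def adder(*bits):
        x = sum(b << i for i, b in enumerate(bits[:k]))
        y = sum(b << i for i, b in enumerate(bits[k:2 * k]))
        total = x + y + bits[2 * k]          # last input bit is the carry-in
        return tuple((total >> i) & 1 for i in range(k + 1))
    return adder

print(is_self_dual(full_adder, 3))
print(is_self_dual(make_adder(3), 7))
print(is_self_dual(lambda a, b: (a & b,), 2))   # AND is not self-dual
```

For the adder, the check reflects the identity x' + y' + c' = 2^(k+1) - 1 - (x + y + c), so the (k + 1)-bit result on complemented inputs is the complement of the original result.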

Slide 13

Time-Redundant, Segmented Addition

Instead of using a k-bit adder twice for error detection or 3 times for error correction, one can segment the operands into 2 or 3 parts and similarly segment the adder; perform replicated addition on operand segments and use comparison/voting to detect/correct errors.

[Figure: adder split into a lower half and an upper half, with inputs x_L, x_H, y_L, y_H and c_in, a flip-flop (FF), a comparator, and outputs c_out and Error]

Sum computed in two cycles: the lower half in cycle 1, and the upper half in cycle 2.

Various other segmentation schemes have been suggested.
Example: 16-bit adder with 4-way segmentation and voting
Townsend, Abraham, and Swartzlander, 2003
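A behavioral sketch of the two-segment scheme follows. It assumes one reading of the figure: in each cycle both adder halves add the same operand segment, a comparator checks that they agree, and the carry out of cycle 1 is held in a flip-flop for cycle 2. The function name `segmented_add` and the fault-injection hooks are hypothetical.

```python
def segmented_add(x, y, k=16, lower_adder=None, upper_adder=None):
    """Add two k-bit operands in two cycles with duplicate-and-compare per segment.

    Returns (k-bit sum, carry out, error flag). k is assumed to be even.
    """
    half = k // 2
    mask = (1 << half) - 1

    def ideal(a, b, cin):
        return a + b + cin            # (half + 1)-bit result of a half-width adder

    lower_adder = lower_adder or ideal
    upper_adder = upper_adder or ideal

    error = False
    carry = 0                         # models the carry flip-flop
    result = 0
    for shift in (0, half):           # cycle 1: lower segment, cycle 2: upper segment
        a = (x >> shift) & mask
        b = (y >> shift) & mask
        r1 = lower_adder(a, b, carry)
        r2 = upper_adder(a, b, carry)
        error |= (r1 != r2)           # comparator
        result |= (r1 & mask) << shift
        carry = r1 >> half
    return result, carry, error

print(segmented_add(0x12FF, 0x0001))
# Inject a fault into one adder half (hypothetical fault: low result bit flipped).
print(segmented_add(0x12FF, 0x0001, upper_adder=lambda a, b, c: (a + b + c) ^ 1))
```

The hardware cost stays close to that of a single k-bit adder, while the addition takes two cycles instead of one.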

Slide 14

Mixed Space-Time Replication

Instead of duplicating the computation with no hardware change (slow) or duplicating the entire hardware (costly), we can add some hardware to make the interleaved recomputations more efficient.

Consider the effect of including a second adder:
  Original computation (T = 3)
  Recomputation with same hardware resources (T = 5, excluding compare time)
  Recomputation with the inclusion of an extra adder (T = 3, excluding compare time)

[Figure: schedules of + and × operations for the three cases, with the duplicate computation interleaved]
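The effect of an extra adder can be explored with a small list scheduler. The flowgraph used here (two additions feeding one multiplication), the unit latency of every operation, and the greedy priority order are all assumptions made for illustration, not details taken from the slide.

```python
def schedule_length(ops, units):
    """Greedy list scheduling with unit-latency operations.

    ops:   list of (name, kind, predecessors), in priority order
    units: dict mapping kind -> number of functional units of that kind
    Returns the number of time steps needed to finish every operation.
    """
    finish = {}                       # name -> step at which the result is available
    pending = list(ops)
    step = 0
    while pending:
        free = dict(units)
        started = []
        for op in pending:
            name, kind, preds = op
            ready = all(p in finish and finish[p] <= step for p in preds)
            if ready and free.get(kind, 0) > 0:
                free[kind] -= 1
                started.append(op)
        if not started:
            raise ValueError("no operation can start; check units and dependencies")
        for name, kind, preds in started:
            finish[name] = step + 1
        pending = [op for op in pending if op not in started]
        step += 1
    return step

# Assumed flowgraph: two additions feeding one multiplication.
primary = [("a1", "add", []), ("a2", "add", []), ("m", "mul", ["a1", "a2"])]
duplicate = [("a1d", "add", []), ("a2d", "add", []), ("md", "mul", ["a1d", "a2d"])]

t_original = schedule_length(primary, {"add": 1, "mul": 1})
t_same_hw = schedule_length(primary + duplicate, {"add": 1, "mul": 1})
t_extra_adder = schedule_length(primary + duplicate, {"add": 2, "mul": 1})
print(t_original, t_same_hw, t_extra_adder)
```

Under these assumptions the schedule lengths come out as 3, 5, and 3, which is consistent with the T values quoted for the three cases.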

Slide 16

Activity Monitor

Watchdog unit monitors events occurring in, and activities performed by, the function unit (e.g., event frequency and relative timing).

[Figure: function unit connected to an activity monitor]

Observed behavior is compared against expected behavior.
The type of monitoring is highly application-dependent.
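One simple form of activity monitoring checks the spacing between successive events. The sketch below assumes that expected behavior is expressed as a minimum and maximum gap between events; the class name `ActivityMonitor` and its parameters are hypothetical, and a real monitor would be tailored to the application.

```python
class ActivityMonitor:
    """Flags events that arrive too soon or too late after the previous one."""

    def __init__(self, min_gap, max_gap):
        self.min_gap = min_gap
        self.max_gap = max_gap
        self.last = None              # timestamp of the previous event, if any

    def observe(self, timestamp):
        """Record an event; return True if its timing violates expectations."""
        if self.last is None:
            self.last = timestamp
            return False
        gap = timestamp - self.last
        self.last = timestamp
        return not (self.min_gap <= gap <= self.max_gap)

monitor = ActivityMonitor(min_gap=2, max_gap=5)
for t in (0, 3, 4, 20):
    print(t, monitor.observe(t))
```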

Slide 17

Design with Parity Codes and Parity Prediction

Operands and results are parity-encoded.
Parity is not preserved over arithmetic and logic operations.

[Figure: ALU with parity prediction. Labels: parity-encoded inputs (k-bit data paths), ordinary ALU, parity predictor, parity generator, parity-encoded output, error signal]

Parity prediction is an alternative to duplication. Compared to duplication:
  Parity prediction often involves less overhead in time and space
  The protection offered by parity prediction is not as comprehensive
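For addition, one way to predict the result parity uses the identity s_i = x_i XOR y_i XOR c_i, so the parity of the sum equals the parity of x, XOR the parity of y, XOR the parity of the carries into each position. The sketch below assumes the predictor derives the carries with its own logic, separate from the ALU; the function names and the injected faults are hypothetical.

```python
def parity(v):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1

def carries(x, y, cin, k):
    """Carry into each of the k bit positions, packed into one word."""
    c, word = cin, 0
    for i in range(k):
        word |= c << i
        xi, yi = (x >> i) & 1, (y >> i) & 1
        c = (xi & yi) | (xi & c) | (yi & c)
    return word

def checked_add(x, px, y, py, k, alu=None):
    """Add parity-encoded operands; return (sum, predicted parity, error flag)."""
    mask = (1 << k) - 1
    alu = alu or (lambda a, b: (a + b) & mask)
    s = alu(x, y)
    predicted = px ^ py ^ parity(carries(x, y, 0, k))
    error = parity(s) != predicted    # generated parity vs. predicted parity
    return s, predicted, error

print(checked_add(5, parity(5), 3, parity(3), 4))
# Hypothetical faulty ALUs: one flips a single result bit, one flips two bits.
print(checked_add(5, parity(5), 3, parity(3), 4, alu=lambda a, b: (a + b) ^ 1))
print(checked_add(5, parity(5), 3, parity(3), 4, alu=lambda a, b: (a + b) ^ 3))
```

The single-bit fault is caught, while the double-bit fault leaves the parity unchanged and goes undetected, which illustrates why the protection is less comprehensive than duplication.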

Slide 19

Coding of Control Signals

Encode the control signals using a separable code (e.g., Berger code).
Either check in every cycle, or form a signature over multiple cycles.

In a microprogrammed control unit, store the microinstruction address and compare against MicroPC contents to detect sequencing errors.

[Figure: microprogrammed control unit. Labels: op (from instruction register), dispatch table 1, dispatch table 2, multiplexer inputs 0-3, sequence control, MicroPC, Incr, address, microprogram memory or PLA, data, microinstruction register, check bits, control signals to data path]
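A Berger code appends a check symbol equal to the number of 0s in the information bits, which makes any unidirectional error (only 1-to-0 or only 0-to-1 changes) detectable. The sketch below applies it to a hypothetical 8-bit control word; the function names are assumptions for illustration.

```python
def berger_check(word, k):
    """Berger check symbol: the number of 0s among the k information bits."""
    return k - bin(word & ((1 << k) - 1)).count("1")

def encode(word, k):
    """Return the (control word, check symbol) pair that would be stored."""
    return word, berger_check(word, k)

def is_valid(word, check, k):
    """Per-cycle check: recompute the symbol and compare with the stored one."""
    return berger_check(word, k) == check

control, check = encode(0b10110000, 8)
print(check)                                 # five 0s in the control word
print(is_valid(control, check, 8))
print(is_valid(control & ~0b00110000, check, 8))   # two bits dropped from 1 to 0
```

Because the code is separable, the control signals themselves are used unchanged and only the check symbol needs extra storage.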

Slide 20

Control-Flow Watchdog

Watchdog unit monitors the instructions executed and their addresses (for example, by snooping on the bus).

[Figure: instruction sequencer observed by the control-flow watchdog]

The watchdog unit may have certain info about program behavior:
  Control flow graph (valid branches and procedure calls)
  Signatures of branch-free intervals (consecutive instructions)
  Valid memory addresses and required access privileges

In an application-specific system, watchdog info is preloaded in it.
For a general-purpose (GP) system, the compiler can insert special watchdog directives.

Overheads of control-flow checking:
  Wider memory due to the need for tag bits to distinguish word types
  Additional memory to store signatures and other watchdog info
  Stolen processor/bus cycles by the watchdog unit
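The control-flow-graph check can be sketched as a table lookup on observed addresses. The sketch assumes the watchdog is preloaded with the set of valid successors for each block address; the class name, the addresses, and the graph are hypothetical, and signature checking of branch-free intervals is not modeled.

```python
class ControlFlowWatchdog:
    """Flags transitions that are not edges of the preloaded control flow graph."""

    def __init__(self, valid_successors):
        self.valid = valid_successors  # block address -> set of allowed next addresses
        self.prev = None

    def observe(self, address):
        """Record the next block address; return True on an invalid transition."""
        ok = self.prev is None or address in self.valid.get(self.prev, set())
        self.prev = address
        return not ok

# Hypothetical control flow graph with three blocks.
cfg = {0x100: {0x104, 0x200}, 0x104: {0x108}, 0x200: {0x100}}
watchdog = ControlFlowWatchdog(cfg)
for addr in (0x100, 0x200, 0x100, 0x108):
    print(hex(addr), watchdog.observe(addr))
```

The last transition, from 0x100 to 0x108, skips a block and is reported as a control-flow error.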