Understanding Fault-Tolerant Computing & Software Dependability - Prof. B. Parhami, Exams of Electrical and Electronics Engineering

This presentation covers software reliability and redundancy within fault-tolerant computing, including the differences between software and hardware. It discusses the causes of software unreliability, software aging, and software reliability models. It also introduces software verification and validation methods and strategies for software flaw tolerance: masking redundancy (N-version programming), standby redundancy (recovery blocks), and self-checking design.

Uploaded on 08/31/2009

Nov. 2007 Software Reliability and Redundancy Slide 1
Fault-Tolerant Computing
Software Design Methods

About This Presentation

Edition    Released     Revised      Revised
First      Nov. 2006    Nov. 2007

This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami, Professor of Electrical and Computer Engineering at University of California, Santa Barbara. The material contained herein can be used freely in classroom teaching or any other educational setting. Unauthorized uses are prohibited. © Behrooz Parhami

“We are neither hardware nor software; we are your parents.”

“I haven’t the slightest idea who he is. He came bundled with the software.”

“Well, what’s a piece of software without a bug or two?”

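Multilevel Model of Dependable Computing

[Figure: state diagram of seven system states, grouped by impairment class]

Level          State             Impairment class
(none)         Ideal             Unimpaired
Component      Defective         Low-Level Impaired
Logic          Faulty            Low-Level Impaired
Information    Erroneous         Mid-Level Impaired
System         Malfunctioning    Mid-Level Impaired
Service        Degraded          High-Level Impaired
Result         Failed            High-Level Impaired

Legend (transitions between states): Entry, Deviation, Remedy, Tolerance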

Software Development Life Cycle

• Project initiation
• Needs
• Requirements
• Specifications
• Prototype design
• Prototype test (evaluation by both the developer and customer)
• Revision of specs
• Final design
• Coding (implementation or programming)
• Unit test (separate testing of each major unit, or module)
• Integration test (test modules within pretested control structure)
• System test
• Acceptance test (customer or third-party conformance-to-specs test)
• Field deployment
• Field maintenance
• System redesign (new contract for changes and additional features)
• Software discard (obsolete software is discarded, perhaps replaced)

Software flaws may arise at several points within these life-cycle phases.

What Does Software Reliability Mean?

Major structural and logical problems are removed very early in the process of software testing.

What remains after extensive verification and validation is a collection of tiny flaws that surface under rare conditions or particular combinations of circumstances, thus giving software failure a statistical nature.

Software usually contains one or more flaws per thousand lines of code, with < 1 flaw considered good (Linux has been estimated to have 0.1).

If there are $f$ flaws in a software component, the hazard rate, that is, the rate of failure occurrence per hour, is $kf$, with $k$ being the constant of proportionality, which is determined experimentally (e.g., $k = 0.0001$).

Software reliability: $R(t) = e^{-kft}$

The only way to improve software reliability is to reduce the number of residual flaws through more rigorous verification and/or testing.
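The constant-hazard formula above can be evaluated directly. The following is a minimal sketch, assuming the slide's example value k = 0.0001; the function names and the flaw counts in the loop are illustrative choices, not part of the original slides.

```python
import math

def hazard_rate(k: float, flaws: int) -> float:
    """Failures per hour, assumed proportional to the number of residual flaws."""
    return k * flaws

def reliability(t_hours: float, k: float, flaws: int) -> float:
    """R(t) = exp(-k * f * t) for a constant hazard rate k * f."""
    return math.exp(-hazard_rate(k, flaws) * t_hours)

K = 0.0001  # example constant of proportionality from the slide

# Illustrative flaw counts (assumed): fewer residual flaws give higher R(t).
for flaws in (100, 10, 1):
    print(flaws, round(reliability(100.0, K, flaws), 4))
```

Because the flaw count sits in the exponent, each tenfold reduction in residual flaws raises the 100-hour reliability noticeably, which is the point of the slide's closing remark.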

Software Error Models

Software flaw/bug ⇒ Operational error ⇒ Software-induced failure

“Software failure” is used informally to denote any software-related problem.

[Figure: number of flaws vs. time, from start of testing to software release, when flaws are removed without generating new ones. The initial flaws divide into removed flaws and residual flaws. Rate of flaw removal decreases with time.]

[Figure: number of flaws vs. time, from start of testing to software release, when flaw removal also adds flaws. Curves show initial, removed, added, and residual flaws. New flaws introduced are proportional to removal rate.]

Software Reliability Models and Parameters

For simplicity, we focus on the case of no new flaw generation.

[Figure: number of flaws vs. testing time, from start of testing to software release; the initial flaws divide into removed flaws and residual flaws.]

Assume linearly decreasing flaw removal rate ($F$ = residual flaws, $\tau$ = testing time, in months):

$dF(\tau)/d\tau = -(a - b\tau)$

$F(\tau) = F_0 - a\tau\,(1 - b\tau/(2a))$

Example: $F(\tau) = 130 - 30\tau\,(1 - \tau/16)$

Hazard function: $z(\tau) = k\,(F_0 - a\tau\,(1 - b\tau/(2a)))$

In our example, let $k = 0.000132$:

$R(t) = \exp(-0.000132\,(130 - 30\tau\,(1 - \tau/16))\,t)$

Assume testing for $\tau = 8$ months:

$R(t) = e^{-0.00132\,t}$

[Table: testing time $\tau$ versus MTTF (hr)]
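The example numbers can be checked with a short script. This is a sketch under stated assumptions: b = 3.75 is inferred from matching τ/16 to bτ/(2a) with a = 30, MTTF is taken as 1/z(τ) for the constant post-release hazard, and the function names are hypothetical.

```python
def residual_flaws(tau, f0=130.0, a=30.0, b=3.75):
    """F(tau) = F0 - a*tau*(1 - b*tau/(2a)), valid while the removal rate a - b*tau >= 0."""
    if tau < 0 or a - b * tau < 0:
        raise ValueError("tau is outside the range where the removal rate is non-negative")
    return f0 - a * tau * (1.0 - b * tau / (2.0 * a))

def mttf_hours(tau, k=0.000132):
    """MTTF = 1 / z(tau), with constant post-release hazard z(tau) = k * F(tau)."""
    return 1.0 / (k * residual_flaws(tau))

# Residual flaws and MTTF after 0 to 8 months of testing.
for tau in (0, 2, 4, 6, 8):
    print(tau, residual_flaws(tau), round(mttf_hours(tau), 1))
```

At τ = 8 the residual flaw count is 10, so z = 0.00132 per hour, which agrees with the exponent in the slide's final reliability expression.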

More on Software Reliability Models

Linearly decreasing flaw removal rate isn’t the only option in modeling.

Constant flaw removal rate has also been considered, but it does not lead to a very realistic model.

Exponentially decreasing flaw removal rate is more realistic than linearly decreasing, since flaw removal rate never really becomes 0.

How does one go about estimating the model constants?
• Use handbook: public ones, or compiled from in-house data
• Match moments (mean, 2nd moment, ...) to flaw removal data
• Least-squares estimation, particularly with multiple data sets
• Maximum-likelihood estimation (a statistical method)
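As one illustration of the least-squares option, the sketch below fits the constants a and b of a linearly decreasing removal rate, rate = a - bτ, using the closed-form ordinary least-squares solution. The function name is hypothetical, and the data points are generated from the earlier example model (a = 30, b = 3.75) purely for illustration; they are not measured flaw-removal data.

```python
def fit_linear_removal_rate(taus, rates):
    """Ordinary least-squares fit of rate = a - b * tau; returns (a, b)."""
    n = len(taus)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two (tau, rate) pairs")
    mean_t = sum(taus) / n
    mean_r = sum(rates) / n
    sxx = sum((t - mean_t) ** 2 for t in taus)
    if sxx == 0:
        raise ValueError("testing times must not all be equal")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(taus, rates))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_t
    return intercept, -slope  # a is the intercept, b is the negated slope

# Illustrative (synthetic) monthly removal rates from the example model.
taus = [0, 1, 2, 3, 4, 5, 6, 7]
rates = [30 - 3.75 * t for t in taus]
a, b = fit_linear_removal_rate(taus, rates)
print(a, b)
```

With real, noisy removal counts the fit would only approximate the constants, and an exponential model would need a nonlinear fit or a log transform instead.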

Software Verification and Validation

Verification: “Are we building the system right?” (meets specifications)

Validation: “Are we building the right system?” (meets requirements)

Both verification and validation use testing as well as formal methods.

Software testing
• Exhaustive testing impossible
• Test with many typical inputs
• Identify and test fringe cases
Example: overlap of rectangles

Formal methods
• Program correctness proof
• Formal specification
• Model checking
Examples: safety/security-critical
• Smart cards [Requet 2000]
• Cryptography device [Kirby 1999]
• Railway interlocking system [Hlavaty 2001]
• Automated lab analysis test equipment [Bicarregui 1997]
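The rectangle-overlap example lends itself to a small test sketch that separates typical inputs from fringe cases. The conventions here are assumptions, not taken from the slides: rectangles are axis-aligned with positive width and height, and rectangles that merely touch along an edge or at a corner do not count as overlapping.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x1: float  # left
    y1: float  # bottom
    x2: float  # right, assumed > x1
    y2: float  # top, assumed > y1

def overlaps(a: Rect, b: Rect) -> bool:
    """True if two non-degenerate axis-aligned rectangles share positive area."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2

# Typical inputs
assert overlaps(Rect(0, 0, 4, 4), Rect(2, 2, 6, 6))
assert not overlaps(Rect(0, 0, 1, 1), Rect(5, 5, 6, 6))

# Fringe cases
assert not overlaps(Rect(0, 0, 2, 2), Rect(2, 0, 4, 2))  # shared edge only
assert not overlaps(Rect(0, 0, 2, 2), Rect(2, 2, 4, 4))  # shared corner only
assert overlaps(Rect(0, 0, 10, 10), Rect(3, 3, 4, 4))    # one inside the other
assert overlaps(Rect(0, 3, 10, 4), Rect(3, 0, 4, 10))    # cross shape, no corner inside
assert overlaps(Rect(0, 0, 2, 2), Rect(0, 0, 2, 2))      # identical rectangles
```

The cross-shaped case is the kind of fringe input that a corner-containment implementation would get wrong, which is why typical inputs alone are not enough.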

Software Flaw Tolerance

Given that a complex piece of software will contain bugs, can we use redundancy to reduce the probability of software-induced failures?

“Flaw tolerance” is a better term; “fault tolerance” has been overused.

Flaw avoidance strategies include (structured) design methodologies, software reuse, and formal methods.

The ideas of masking redundancy, standby redundancy, and self-checking design have been shown to be applicable to software, leading to various types of fault-tolerant software:
• Masking redundancy: N-version programming
• Standby redundancy: the recovery-block scheme
• Self-checking design: N-self-checking programming

Sources: Software Fault Tolerance, ed. by M.R. Lyu, Wiley, 2005 (on-line book at http://www.cse.cuhk.edu.hk/~lyu/book/sft/index.html). Also, “Software Fault Tolerance: A Tutorial,” 2000 (NASA report, available on-line).
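The recovery-block scheme (standby redundancy) can be sketched as a primary routine, one or more alternates, and an acceptance test. This is an illustrative sketch, not an implementation from the cited sources: all names are hypothetical, the routines are pure functions so no state rollback is shown, and the integer-square-root primary is flawed on purpose.

```python
class AllAlternatesFailed(Exception):
    """Raised when no routine produces a result that passes the acceptance test."""

def recovery_block(routines, acceptance_test, x):
    """Try the primary routine first, then each standby alternate in order."""
    for routine in routines:
        try:
            result = routine(x)
        except Exception:
            continue  # a crash counts as a failed attempt; move to the next alternate
        if acceptance_test(x, result):
            return result
    raise AllAlternatesFailed("no routine passed the acceptance test")

def isqrt_primary(n):
    return int(n ** 0.5) + 1  # flawed on purpose: off by one

def isqrt_alternate(n):
    r = 0
    while (r + 1) * (r + 1) <= n:  # slow but simple linear search
        r += 1
    return r

def isqrt_acceptable(n, r):
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)

print(recovery_block([isqrt_primary, isqrt_alternate], isqrt_acceptable, 17))  # prints 4
```

The scheme is only as good as its acceptance test: a weak test lets a wrong primary result through, while a stateful system would also need to restore a checkpoint before running each alternate.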

N-Version Programming: The Idea

Independently develop $N$ different programs (known as “versions”) from the same initial specification.

The greater the diversity in the $N$ versions, the less likely that they will have flaws that produce correlated errors.

Diversity in:
• Programming teams (personnel and structure)
• Software architecture
• Algorithms used
• Programming languages
• Verification tools and methods
• Data (input re-expression and output adjustment)

[Figure: the input feeds Version 1, Version 2, and Version 3 in parallel; their results go to a voter (also called adjudicator, decider, or data fuser), which produces the output.]
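The version-plus-voter structure can be sketched as follows. This is a minimal illustration that assumes exact-match majority voting over hashable outputs; the three absolute-value versions are hypothetical, with one flawed on purpose. Real adjudicators often need inexact voting, for example when outputs are floating-point values.

```python
from collections import Counter

class NoMajority(Exception):
    """Raised when no output is produced by more than half of the versions."""

def n_version_execute(versions, x):
    """Run every version on the same input and return the strict-majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            continue  # a version that crashes simply casts no vote
    if outputs:
        value, votes = Counter(outputs).most_common(1)[0]
        if 2 * votes > len(versions):
            return value
    raise NoMajority(f"no output shared by a majority of {len(versions)} versions")

def abs_v1(x):
    return abs(x)

def abs_v2(x):
    return x if x >= 0 else -x

def abs_v3(x):
    return x  # flawed on purpose: wrong for negative inputs

print(n_version_execute([abs_v1, abs_v2, abs_v3], -3))  # prints 3
```

The flawed version is outvoted here only because its error is not shared; two versions with the same flaw would win the vote, which is why diversity matters.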

N-Version Programming: Reliability Modeling

Fault-tree model: the version shown here is fairly simple, but the power of the method comes in handy when combined hardware/software modeling is attempted.

Probabilities of coincident flaws are estimated from experimental failure data.

Source: Dugan & Lyu, 1994 and 1995
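To give a feel for how such a fault tree is evaluated, here is a deliberately simplified sketch; it is not the Dugan and Lyu model. It assumes a perfect voter and a top event that occurs if a common-mode (coincident) flaw defeats all versions, OR if at least two of three versions fail independently. The parameter names and example probabilities are hypothetical.

```python
def three_version_failure(p_version: float, p_common: float = 0.0) -> float:
    """Failure probability of a 3-version system with majority voting.

    Assumed fault tree: top event = common-mode failure (p_common)
    OR at least two of three independent version failures (p_version each).
    """
    for p in (p_version, p_common):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_two_or_more = 3 * p_version**2 * (1 - p_version) + p_version**3
    # OR gate over two independent events: 1 - P(neither occurs)
    return 1 - (1 - p_common) * (1 - p_two_or_more)

print(three_version_failure(0.1))        # independent failures only
print(three_version_failure(0.1, 0.01))  # with an assumed common-mode term
```

Even a small common-mode probability can dominate the result, which is why the coincident-flaw probabilities have to come from experimental failure data rather than from an independence assumption.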

N-Version Programming: Applications

Back-to-back testing: multiple versions can help in the testing process.

Some experiments in N-version programming (Source: P. Bishop, 1995)

B777 flight computer: 3 diverse processors running diverse software

Airbus A320/330/340 flight control: 4 dissimilar hardware/software modules drive two independent sets of actuators
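Back-to-back testing can be sketched as running all versions on the same generated inputs and recording every disagreement. The harness and the three "largest element" versions below are hypothetical, with one version flawed on purpose; the input generator and trial count are arbitrary choices for illustration.

```python
import random

def back_to_back_test(versions, make_input, trials=1000, seed=0):
    """Run all versions on the same inputs; return (input, outputs) pairs that disagree."""
    rng = random.Random(seed)
    discrepancies = []
    for _ in range(trials):
        x = make_input(rng)
        outputs = [version(x) for version in versions]
        if any(out != outputs[0] for out in outputs[1:]):
            discrepancies.append((x, outputs))
    return discrepancies

def max_v1(xs):
    return max(xs)

def max_v2(xs):
    return sorted(xs)[-1]

def max_v3(xs):
    best = 0  # flawed on purpose: wrong when every element is negative
    for x in xs:
        if x > best:
            best = x
    return best

def small_int_list(rng):
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 3))]

found = back_to_back_test([max_v1, max_v2, max_v3], small_int_list)
print(len(found), found[:3])
```

A discrepancy shows that some version is wrong without saying which one, and a flaw shared by all versions produces no discrepancy at all, so this complements rather than replaces other testing.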