Computer System Analysis: Module 2 - Reliability Analysis with Combinatorial Models

A set of slides from a university course on computer system analysis, specifically module 2, which focuses on combinatorial modeling methods for reliability analysis. The slides cover topics such as reliability definitions, failure rates, system reliability, reliability formalisms, and the reliability modeling process.

Typology: Study notes

Pre 2010

Uploaded on 02/24/2010

ECE/CS 541: Computer System Analysis, Instructor William H. Sanders. ©2006 William H. Sanders. All rights reserved. Do not duplicate without
permission of the author.
Module 2: Combinatorial Modeling Methods


Introduction to Combinatorial Methods

  • Combinatorial validation methods are the simplest kind of analytical/numerical techniques and can be used for reliability and availability modeling under certain assumptions.

  • The assumptions are that component failures are independent and, for availability, that repairs are independent.

  • When these assumptions hold, simple formulas for reliability and availability exist.


Reliability

  • One key to building highly available systems is the use of reliable components and systems.

  • Reliability: The reliability of a system at time $t$, written $R(t)$, is the probability that the system operates properly throughout the interval $[0, t]$.

  • Probability theory and combinatorics can be directly applied to reliability models.

  • Let $X$ be a random variable representing the time to failure of a component. The reliability of the component at time $t$ is given by

$$R_X(t) = P[X > t] = 1 - P[X \le t] = 1 - F_X(t).$$

  • Similarly, we can define unreliability at time $t$ by

$$U_X(t) = P[X \le t] = F_X(t).$$
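As a minimal sketch of these two definitions, the following Python snippet evaluates $R_X(t)$ and $U_X(t)$ under the assumption that the time to failure is exponentially distributed. The exponential distribution, the function names, and the rate value are illustrative assumptions, not part of the slides.

```python
import math


def unreliability(t: float, rate: float) -> float:
    """U_X(t) = F_X(t) = P[X <= t], assuming an exponential failure time."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-rate * t)


def reliability(t: float, rate: float) -> float:
    """R_X(t) = P[X > t] = 1 - F_X(t)."""
    return 1.0 - unreliability(t, rate)


if __name__ == "__main__":
    rate = 1e-3  # hypothetical failure rate (failures per hour)
    for t in (0.0, 100.0, 1000.0):
        print(f"t={t:7.1f}  R={reliability(t, rate):.4f}  U={unreliability(t, rate):.4f}")
```

Any other failure distribution could be substituted by replacing the body of `unreliability` with its CDF; the relation $R_X(t) = 1 - F_X(t)$ stays the same.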


Failure Rate

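What is the rate at which a component fails at time $t$? This is the probability, per unit time, that a component that has not yet failed fails in the interval $(t, t + \Delta t)$, as $\Delta t \to 0$.

Note that we are not looking at $P[X \in (t, t + \Delta t)]$. Rather, we are seeking $P[X \in (t, t + \Delta t) \mid X > t]$.

$$
\begin{aligned}
P[X \in (t, t + \Delta t) \mid X > t]
&= \frac{P[X \in (t, t + \Delta t),\ X > t]}{P[X > t]} \\
&= \frac{P[X \in (t, t + \Delta t)]}{1 - F_X(t)} \\
&\approx \frac{f_X(t)\,\Delta t}{1 - F_X(t)} \\
&= r_X(t)\,\Delta t
\end{aligned}
$$

Hence

$$r_X(t) = \frac{f_X(t)}{1 - F_X(t)}.$$

$r_X(t)$ is called the failure rate or hazard rate. (This derivation is a heuristic argument.)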
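The limit argument can be checked numerically. The sketch below assumes a Weibull failure distribution (an illustrative choice, not taken from the slides) and compares the closed form $f_X(t) / (1 - F_X(t))$ with the conditional probability of failing in a short interval divided by its length. All function and parameter names are hypothetical.

```python
import math


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """F_X(t) for a Weibull failure time (assumed distribution)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_pdf(t: float, shape: float, scale: float) -> float:
    """f_X(t), the derivative of the Weibull CDF."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))


def hazard_rate(t: float, shape: float, scale: float) -> float:
    """r_X(t) = f_X(t) / (1 - F_X(t))."""
    return weibull_pdf(t, shape, scale) / (1.0 - weibull_cdf(t, shape, scale))


def conditional_failure_ratio(t: float, dt: float, shape: float, scale: float) -> float:
    """P[X in (t, t+dt) | X > t] / dt, which approaches r_X(t) as dt -> 0."""
    survive = 1.0 - weibull_cdf(t, shape, scale)
    fail_in_interval = weibull_cdf(t + dt, shape, scale) - weibull_cdf(t, shape, scale)
    return fail_in_interval / survive / dt


if __name__ == "__main__":
    shape, scale, t = 2.0, 10.0, 5.0  # hypothetical parameters
    print("closed form:", hazard_rate(t, shape, scale))
    for dt in (1.0, 0.1, 0.001):
        print(f"dt={dt}:", conditional_failure_ratio(t, dt, shape, scale))
```

With shape 1 the Weibull reduces to the exponential distribution, and the hazard rate is the constant $1/\text{scale}$.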


System Reliability

While $F_X$ can give the reliability of a component, how do you compute the reliability of a system?

System failure can occur when one, all, or some of the components fail. If one makes the independent failure assumption, system failure can be computed quite simply. The independent failure assumption states that all component failures of a system are independent, i.e., the failure of one component does not cause another component to be more or less likely to fail.

Given this assumption, one can determine:

  1. Minimum failure time of a set of components
  2. Maximum failure time of a set of components
  3. Probability that k of N components have failed at a particular time t.


Maximum of n Independent Failure Times

Let $X_1, \ldots, X_n$ be independent component failure times. Suppose the system fails at time $S$ if all the components fail. Thus, $S = \max\{X_1, \ldots, X_n\}$.

What is $F_S(t)$?

$$
\begin{aligned}
F_S(t) &= P[S \le t] \\
&= P[X_1 \le t \text{ AND } X_2 \le t \text{ AND } \ldots \text{ AND } X_n \le t] \\
&= P[X_1 \le t]\, P[X_2 \le t] \cdots P[X_n \le t] && \text{by independence} \\
&= F_{X_1}(t)\, F_{X_2}(t) \cdots F_{X_n}(t) && \text{by definition} \\
&= \prod_{i=1}^{n} F_{X_i}(t)
\end{aligned}
$$
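The product formula can be compared against a simulation. The sketch below assumes exponentially distributed component failure times with hypothetical rates; `parallel_failure_cdf` applies the formula, and `simulate_max_cdf` estimates $P[\max_i X_i \le t]$ by sampling. Names, rates, and the trial count are illustrative assumptions.

```python
import math
import random
from typing import Sequence


def parallel_failure_cdf(component_cdfs: Sequence[float]) -> float:
    """F_S(t) = prod_i F_{X_i}(t), with every F_{X_i} evaluated at the same t."""
    return math.prod(component_cdfs)


def simulate_max_cdf(rates: Sequence[float], t: float,
                     trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P[max_i X_i <= t] for independent exponential X_i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        system_failure_time = max(rng.expovariate(rate) for rate in rates)
        if system_failure_time <= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rates = [0.5, 1.0, 2.0]  # hypothetical component failure rates
    t = 1.0
    exact = parallel_failure_cdf([1.0 - math.exp(-r * t) for r in rates])
    print("formula:   ", exact)
    print("simulation:", simulate_max_cdf(rates, t))
```

The two numbers should agree up to sampling noise, which shrinks as `trials` grows.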


Minimum cont.

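Here the system fails as soon as any one component fails, so $S = \min\{X_1, \ldots, X_n\}$.

$$
\begin{aligned}
F_S(t) &= P[X_1 \le t \text{ OR } X_2 \le t \text{ OR } \ldots \text{ OR } X_n \le t] \\
&= 1 - P[X_1 > t \text{ AND } X_2 > t \text{ AND } \ldots \text{ AND } X_n > t] && \text{by trick (complement)} \\
&= 1 - P[X_1 > t]\, P[X_2 > t] \cdots P[X_n > t] && \text{by independence} \\
&= 1 - (1 - P[X_1 \le t])(1 - P[X_2 \le t]) \cdots (1 - P[X_n \le t]) && \text{by LOTP} \\
&= 1 - \prod_{i=1}^{n} \left(1 - F_{X_i}(t)\right)
\end{aligned}
$$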
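A short sketch of the minimum formula follows. As a sanity check it uses independent exponential components (an assumption made only for this example), for which the minimum is again exponential with rate equal to the sum of the component rates. The function name and rates are hypothetical.

```python
import math
from typing import Sequence


def series_failure_cdf(component_cdfs: Sequence[float]) -> float:
    """F_S(t) = 1 - prod_i (1 - F_{X_i}(t)), with every F_{X_i} evaluated at the same t."""
    all_survive = math.prod(1.0 - f for f in component_cdfs)
    return 1.0 - all_survive


if __name__ == "__main__":
    rates = [0.5, 1.0, 2.0]  # hypothetical exponential failure rates
    t = 0.3
    from_formula = series_failure_cdf([1.0 - math.exp(-r * t) for r in rates])
    closed_form = 1.0 - math.exp(-sum(rates) * t)  # min of exponentials is exponential
    print(from_formula, closed_form)
```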


k of N

Let $X_1, \ldots, X_N$ be component failure times that have identical distributions (i.e., $F_{X_1}(t) = F_{X_2}(t) = \ldots$). The system has failed by time $S$ if $k$ or more of the $N$ components have failed by $S$.

$$
\begin{aligned}
F_S(t) &= P[\text{at least } k \text{ components failed by time } t] \\
&= P[\text{exactly } k \text{ failed OR exactly } k+1 \text{ failed OR } \ldots \text{ OR exactly } N \text{ failed}] \\
&= P[\text{exactly } k \text{ failed}] + P[\text{exactly } k+1 \text{ failed}] + \ldots + P[\text{exactly } N \text{ failed}]
\end{aligned}
$$

What is $P[\text{exactly } k \text{ failed}]$?

$$
\begin{aligned}
P[\text{exactly } k \text{ failed}] &= P[k \text{ failed and } (N - k) \text{ have not}] \\
&= \binom{N}{k} F_X(t)^k \left(1 - F_X(t)\right)^{N-k}
\end{aligned}
$$

where $F_X(t)$ is the failure distribution of each component.

Thus, by independence and the axioms of probability,

$$F_S(t) = \sum_{i=k}^{N} \binom{N}{i} F_X(t)^i \left(1 - F_X(t)\right)^{N-i}.$$
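The binomial sum translates directly into code. This is a minimal sketch; the function name and the example probability are hypothetical, and `f` stands for the common value $F_X(t)$ at the time of interest.

```python
from math import comb


def k_of_n_failure_cdf(k: int, n: int, f: float) -> float:
    """F_S(t) = sum_{i=k}^{N} C(N, i) f^i (1 - f)^(N - i), where f = F_X(t)."""
    return sum(comb(n, i) * f**i * (1.0 - f) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    f = 0.1  # hypothetical component failure probability at time t
    print("2 of 3:", k_of_n_failure_cdf(2, 3, f))
    print("3 of 3 (maximum case):", k_of_n_failure_cdf(3, 3, f))
    print("1 of 3 (minimum case):", k_of_n_failure_cdf(1, 3, f))
```

Setting $k = N$ recovers the maximum case $F_X(t)^N$, and $k = 1$ recovers the minimum case $1 - (1 - F_X(t))^N$.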


Component Building Blocks

Complex systems can be analyzed hierarchically.

Example: A computer fails if both power supplies fail or both memories fail or the CPU fails.

The system problem is one of a minimum: the system fails when the first of three subsystems fails. The proper formulation is:

  • Power supply subsystem is a maximum: both must fail
  • Memory subsystem is a maximum: both must fail

$$F_S(t) = 1 - \left(1 - F_{P_1}(t)\, F_{P_2}(t)\right)\left(1 - F_{M_1}(t)\, F_{M_2}(t)\right)\left(1 - F_C(t)\right)$$

In this expression, $1 - F_{P_1}(t)\, F_{P_2}(t)$ is the probability that at least 1 power source is up at $t$, and the product of the three factors is the probability that all 3 subsystems are up at $t$.
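The hierarchical composition for this computer can be sketched as follows. The function name and the numeric failure probabilities are hypothetical; each argument is the corresponding component CDF evaluated at the same time $t$.

```python
def computer_failure_cdf(f_p1: float, f_p2: float,
                         f_m1: float, f_m2: float, f_c: float) -> float:
    """F_S(t) for the example: two power supplies, two memories, one CPU."""
    power_up = 1.0 - f_p1 * f_p2    # maximum: both supplies must fail
    memory_up = 1.0 - f_m1 * f_m2   # maximum: both memories must fail
    cpu_up = 1.0 - f_c              # single component
    # Minimum over the three subsystems: the system is up only if all are up.
    return 1.0 - power_up * memory_up * cpu_up


if __name__ == "__main__":
    # Hypothetical failure probabilities at some fixed time t.
    print(computer_failure_cdf(0.1, 0.1, 0.1, 0.1, 0.1))
```

With these values the non-redundant CPU dominates the result, which is the usual motivation for looking at each subsystem separately before combining them.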


Summary

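A system comprises $N$ components, where the component failure times are given by the random variables $X_1, \ldots, X_N$. The system fails at time $S$ with distribution $F_S$ if:

  • Condition: all components fail. Distribution:

$$F_S(t) = \prod_{i=1}^{N} F_{X_i}(t)$$

  • Condition: one component fails. Distribution:

$$F_S(t) = 1 - \prod_{i=1}^{N} \left(1 - F_{X_i}(t)\right)$$

  • Condition: k components fail, identical distributions. Distribution:

$$F_S(t) = \sum_{i=k}^{N} \binom{N}{i} F_X(t)^i \left(1 - F_X(t)\right)^{N-i}$$

  • Condition: k components fail, general case. Distribution:

$$F_S(t) = \sum_{g \in G_k} \left(\prod_{X \in g} F_X(t)\right) \left(\prod_{X \notin g} \left(1 - F_X(t)\right)\right)$$

where $G_k$ is the set of all subsets of $\{X_1, \ldots, X_N\}$ of size at least $k$.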
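The general case can be sketched by enumerating the subsets in $G_k$ directly. This is an illustration only: the enumeration grows as $2^N$, so it is practical for small $N$. The function name and the probabilities are hypothetical, and $G_k$ is taken to contain every subset with at least $k$ components.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def k_of_n_general_failure_cdf(k: int, cdfs: Sequence[float]) -> float:
    """Sum over every set g of failed components with |g| >= k of
    prod_{X in g} F_X(t) * prod_{X not in g} (1 - F_X(t))."""
    n = len(cdfs)
    total = 0.0
    for size in range(k, n + 1):
        for failed in combinations(range(n), size):
            failed_set = set(failed)
            total += prod(
                cdfs[i] if i in failed_set else 1.0 - cdfs[i] for i in range(n)
            )
    return total


if __name__ == "__main__":
    # Hypothetical, non-identical component failure probabilities at time t.
    print(k_of_n_general_failure_cdf(2, [0.1, 0.2, 0.3]))
```

When all entries of `cdfs` are equal, the result matches the binomial formula for identical distributions.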


Reliability Block Diagrams

  • Blocks represent components.
  • A system failure occurs if there is no path from source to sink.

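Series: System fails if any component fails.

[Diagram: source, then three component blocks connected one after another, then sink]

Parallel: System fails if all components fail.

[Diagram: three component blocks side by side, each connecting source to sink]

k of N: System fails if at least k of N components fail.

[Diagram: three component blocks between source and sink, labeled "2 of 3"]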
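The path-based definition of failure can be illustrated by brute force: enumerate every up/down combination of the components, check whether a source-to-sink path survives, and add up the probabilities of the combinations with no path. The sketch below assumes independent components and covers series and parallel arrangements only (a k-of-N block is not a simple path structure). The edge-list encoding and all names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (from_node, to_node, component_name)


def has_path(edges: List[Edge], up: Dict[str, bool],
             source: str = "source", sink: str = "sink") -> bool:
    """Depth-first search that only crosses blocks whose component is up."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for start, end, component in edges:
            if start == node and up[component] and end not in seen:
                seen.add(end)
                stack.append(end)
    return False


def rbd_failure_probability(edges: List[Edge], fail_prob: Dict[str, float]) -> float:
    """Sum the probability of every component state with no source-to-sink path."""
    components = sorted(fail_prob)
    total = 0.0
    for states in product([True, False], repeat=len(components)):
        up = dict(zip(components, states))
        p = 1.0
        for c in components:
            p *= (1.0 - fail_prob[c]) if up[c] else fail_prob[c]
        if not has_path(edges, up):
            total += p
    return total


if __name__ == "__main__":
    probs = {"C1": 0.1, "C2": 0.1, "C3": 0.1}  # hypothetical values of F(t)
    series = [("source", "n1", "C1"), ("n1", "n2", "C2"), ("n2", "sink", "C3")]
    parallel = [("source", "sink", c) for c in probs]
    print("series:  ", rbd_failure_probability(series, probs))
    print("parallel:", rbd_failure_probability(parallel, probs))
```

The series result agrees with the minimum formula and the parallel result with the maximum formula, which is why the closed forms are preferred over enumeration in practice.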


Example

A NASA satellite architecture under study is designed for high reliability. The major computer system components include the CPU system, the high-speed network for data collection and transmission, and the low-speed network for engineering and control. The satellite fails if any of the major systems fail.

There are 3 computers, and the computer system fails if 2 or more of the computers fail. The failure distribution of a computer is given by $F_C$.

The high-speed network is redundant (2 networks), and the high-speed network system fails if both networks fail. The distribution of a high-speed network failure is given by $F_H$.

The low-speed network is arranged similarly, with a failure distribution of $F_L$.


RBD Example

[Diagram: source, then a "2 of 3" block of three computers, then two HSN blocks in parallel, then two LSN blocks in parallel, then sink]

$$F_S(t) = 1 - \left(1 - \sum_{i=2}^{3} \binom{3}{i} F_C(t)^i \left(1 - F_C(t)\right)^{3-i}\right) \left(1 - F_H(t)^2\right) \left(1 - F_L(t)^2\right)$$

The product of the three factors is the probability that all three systems survive to $t$. The last factor, $1 - F_L(t)^2$, is the probability that the low-speed network survives to $t$.
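The satellite formula can be evaluated numerically as sketched below. The function name and the input probabilities are hypothetical; each argument is the corresponding component distribution ($F_C$, $F_H$, $F_L$) evaluated at the same time $t$.

```python
from math import comb


def satellite_failure_cdf(f_c: float, f_h: float, f_l: float) -> float:
    """F_S(t) for the satellite: 2-of-3 computers, dual HSN, dual LSN, all in series."""
    computers_fail = sum(
        comb(3, i) * f_c**i * (1.0 - f_c) ** (3 - i) for i in range(2, 4)
    )
    hsn_fail = f_h**2  # both high-speed networks must fail
    lsn_fail = f_l**2  # both low-speed networks must fail
    all_survive = (1.0 - computers_fail) * (1.0 - hsn_fail) * (1.0 - lsn_fail)
    return 1.0 - all_survive


if __name__ == "__main__":
    # Hypothetical component failure probabilities at some time t.
    print(satellite_failure_cdf(0.1, 0.1, 0.1))
```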