





























A set of slides from a university course on computer system analysis, specifically module 2, which focuses on combinatorial modeling methods for reliability analysis. The slides cover topics such as reliability definitions, failure rates, system reliability, reliability formalisms, and the reliability modeling process.






























ECE/CS 541: Computer System Analysis, Instructor William H. Sanders. ©2006 William H. Sanders. All rights reserved. Do not duplicate without permission of the author. Module 2, Slide 1
techniques and can be used for reliability and availability modeling under
certain assumptions.
repairs are independent.
exist.
and systems.
system operation is proper throughout the interval [0, t ].
models.
reliability of the component at time t is given by
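$$R_X(t) = P[X > t] = 1 - P[X \le t] = 1 - F_X(t),$$
where X is the failure time of the component and $F_X$ is its distribution function. Correspondingly, the probability that the component has failed by time t is
$$P[X \le t] = F_X(t).$$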
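As a small numerical illustration of the definition R_X(t) = 1 - F_X(t), the sketch below assumes an exponentially distributed failure time. The exponential model, the helper names, and the rate value are assumptions made for this example only; the slides do not fix a particular distribution here.

```python
import math

def exponential_cdf(t: float, rate: float) -> float:
    """F_X(t) = 1 - exp(-rate * t) for t >= 0 (assumed lifetime model)."""
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0

def reliability(cdf, t: float) -> float:
    """R_X(t) = P[X > t] = 1 - F_X(t)."""
    return 1.0 - cdf(t)

rate = 0.001  # hypothetical failure rate, in failures per hour
r_1000 = reliability(lambda t: exponential_cdf(t, rate), 1000.0)
print(round(r_1000, 4))  # exp(-1), about 0.3679
```

Any other distribution function can be passed to `reliability` in place of the exponential one.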
What is the rate at which a component fails at time t? It is the probability, per unit time, that a
component that has not yet failed fails in the interval (t, t + Δt), as Δt → 0.
Note that we are not looking at P[X ∈ (t, t + Δt)]. Rather, we are seeking
P[X ∈ (t, t + Δt) | X > t].
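$$P[X \in (t, t + \Delta t) \mid X > t] = \frac{P[X \in (t, t + \Delta t),\ X > t]}{P[X > t]} = \frac{P[X \in (t, t + \Delta t)]}{1 - F_X(t)} = \frac{f_X(t)\,\Delta t}{1 - F_X(t)} = r_X(t)\,\Delta t$$
$r_X(t) = f_X(t) / (1 - F_X(t))$ is called the failure rate or hazard rate. (This is a heuristic derivation.)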
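The heuristic relation between the conditional failure probability and f_X(t) / (1 - F_X(t)) can be checked numerically. The sketch below assumes a Weibull failure time purely for illustration; the distribution, the parameter values, and all function names are hypothetical choices, not part of the slides.

```python
import math

def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """F_X(t) = 1 - exp(-(t / scale) ** shape) for t >= 0."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_pdf(t: float, shape: float, scale: float) -> float:
    """f_X(t), the derivative of weibull_cdf with respect to t."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))

def hazard_rate(pdf, cdf, t: float) -> float:
    """r_X(t) = f_X(t) / (1 - F_X(t))."""
    return pdf(t) / (1.0 - cdf(t))

def conditional_failure_rate(cdf, t: float, dt: float) -> float:
    """P[X in (t, t + dt) | X > t] / dt, which approaches r_X(t) as dt -> 0."""
    return (cdf(t + dt) - cdf(t)) / (1.0 - cdf(t)) / dt

shape, scale = 2.0, 1000.0  # hypothetical Weibull parameters
cdf = lambda t: weibull_cdf(t, shape, scale)
pdf = lambda t: weibull_pdf(t, shape, scale)

exact = hazard_rate(pdf, cdf, 500.0)
approx = conditional_failure_rate(cdf, 500.0, 1e-3)
print(exact, approx)
```

With shape 1 the Weibull reduces to the exponential distribution, and the hazard rate becomes the constant 1 / scale.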
While $F_X$ can give the reliability of a component, how do you compute the
reliability of a system?
System failure can occur when one, all, or some of the components fail. If one
makes the independent failure assumption, system failure can be computed quite
simply. The independent failure assumption states that all component failures of a
system are independent, i.e., the failure of one component does not cause another
component to be more or less likely to fail.
Given this assumption, one can determine:
Minimum failure time of a set of components
Maximum failure time of a set of components
Probability that k of N components have failed at a particular time t.
Maximum of n Independent Failure Times
Let $X_1, \ldots, X_n$ be independent component failure times. Suppose the system fails
at time S if all the components fail.
Thus, $S = \max\{X_1, \ldots, X_n\}$.
What is $F_S(t)$?
$$\begin{aligned}
F_S(t) &= P[S \le t] \\
&= P[X_1 \le t \text{ AND } X_2 \le t \text{ AND } \ldots \text{ AND } X_n \le t] \\
&= P[X_1 \le t]\, P[X_2 \le t] \cdots P[X_n \le t] && \text{by independence} \\
&= F_{X_1}(t)\, F_{X_2}(t) \cdots F_{X_n}(t) && \text{by definition} \\
&= \prod_{i=1}^{n} F_{X_i}(t)
\end{aligned}$$
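The product form for the maximum can be sanity-checked by simulation. The following sketch assumes exponential component lifetimes with made-up rates; these, the seed, and the function name `parallel_failure_cdf` are illustrative assumptions only.

```python
import math
import random

def parallel_failure_cdf(component_cdfs_at_t):
    """F_S(t) = prod_i F_{X_i}(t): the system has failed only if every component has failed."""
    return math.prod(component_cdfs_at_t)

rates = [0.001, 0.002, 0.004]  # hypothetical exponential failure rates
t = 500.0
analytic = parallel_failure_cdf([1.0 - math.exp(-r * t) for r in rates])

# Monte Carlo estimate of P[max(X_1, ..., X_n) <= t] with independent draws.
rng = random.Random(0)
trials = 100_000
hits = sum(max(rng.expovariate(r) for r in rates) <= t for _ in range(trials))
estimate = hits / trials

print(analytic, estimate)
```

The two numbers should agree up to sampling noise, which shrinks as `trials` grows.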
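If the system instead fails as soon as any one component fails, then $S = \min\{X_1, \ldots, X_n\}$ and
$$\begin{aligned}
F_S(t) &= P[X_1 \le t \text{ OR } X_2 \le t \text{ OR } \ldots \text{ OR } X_n \le t] \\
&= 1 - P[X_1 > t \text{ AND } X_2 > t \text{ AND } \ldots \text{ AND } X_n > t] && \text{by trick (complement of the event)} \\
&= 1 - P[X_1 > t]\, P[X_2 > t] \cdots P[X_n > t] && \text{by independence} \\
&= 1 - (1 - P[X_1 \le t])(1 - P[X_2 \le t]) \cdots (1 - P[X_n \le t]) && \text{by LOTP} \\
&= 1 - \prod_{i=1}^{n} \left(1 - F_{X_i}(t)\right)
\end{aligned}$$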
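The minimum case translates directly into code. As a check, the sketch below uses exponential lifetimes (an assumption for this example, with hypothetical rates): the minimum of independent exponentials is again exponential with the summed rate, so the result should equal 1 - exp(-(sum of rates) * t).

```python
import math

def series_failure_cdf(component_cdfs_at_t):
    """F_S(t) = 1 - prod_i (1 - F_{X_i}(t)): the system fails when the first component fails."""
    all_survive = math.prod(1.0 - f for f in component_cdfs_at_t)
    return 1.0 - all_survive

rates = [0.001, 0.002, 0.004]  # hypothetical exponential failure rates
t = 200.0
series_cdf = series_failure_cdf([1.0 - math.exp(-r * t) for r in rates])
closed_form = 1.0 - math.exp(-sum(rates) * t)

print(series_cdf, closed_form)
```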
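Let $X_1, \ldots, X_N$ be component failure times that have identical distributions (i.e.,
$F_{X_1}(t) = F_{X_2}(t) = \ldots$). The system has failed by time t if k or more of the N
components have failed by t.
$$\begin{aligned}
F_S(t) &= P[\text{at least } k \text{ components failed by time } t] \\
&= P[\text{exactly } k \text{ failed OR exactly } k+1 \text{ failed OR } \ldots \text{ OR exactly } N \text{ failed}] \\
&= P[\text{exactly } k \text{ failed}] + P[\text{exactly } k+1 \text{ failed}] + \ldots + P[\text{exactly } N \text{ failed}]
\end{aligned}$$
The last step holds because the events are mutually exclusive (axiom of probability).
What is P[exactly k failed]?
$$P[\text{exactly } k \text{ failed}] = P[k \text{ failed and } (N - k) \text{ have not}] = \binom{N}{k} F_X(t)^k \left(1 - F_X(t)\right)^{N-k}$$
where $F_X(t)$ is the failure distribution of each component and component failures are independent, as assumed throughout.
Thus,
$$F_S(t) = \sum_{i=k}^{N} \binom{N}{i} F_X(t)^i \left(1 - F_X(t)\right)^{N-i}$$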
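The binomial sum for the k of N case with identical distributions is short to compute. In the sketch below the function name and the value 0.1 for F_X(t) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import comb

def k_of_n_failure_cdf(f: float, k: int, n: int) -> float:
    """F_S(t) = sum_{i=k}^{n} C(n, i) f^i (1 - f)^(n - i), where f = F_X(t)."""
    return sum(comb(n, i) * f**i * (1.0 - f) ** (n - i) for i in range(k, n + 1))

f = 0.1  # hypothetical value of F_X(t) at some fixed t
two_of_three = k_of_n_failure_cdf(f, 2, 3)  # 3 * 0.01 * 0.9 + 0.001
print(two_of_three)
```

Setting k = 1 recovers the minimum (any component fails) and k = n recovers the maximum (all components fail).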
Complex systems can be analyzed hierarchically.
Example: A computer fails if both power supplies fail or both memories fail or the
CPU fails.
The system problem is one of a minimum: the system fails when the first of the three
subsystems fails. The proper formulation is
$$F_S(t) = 1 - \left(1 - F_{P_1}(t)\, F_{P_2}(t)\right) \left(1 - F_{M_1}(t)\, F_{M_2}(t)\right) \left(1 - F_C(t)\right)$$
Here $1 - F_{P_1}(t)\, F_{P_2}(t)$ is the probability that at least 1 power source is up at t, and the
product of the three factors is the probability that all 3 subsystems are up at t.
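The hierarchical computation for the computer example can be written out factor by factor. The sketch below assumes independent failures, as in the rest of the module; the function name and the numerical values of the component distributions at time t are invented for illustration.

```python
def computer_failure_cdf(f_p1: float, f_p2: float, f_m1: float, f_m2: float, f_c: float) -> float:
    """F_S(t) for: both power supplies fail OR both memories fail OR the CPU fails."""
    power_up = 1.0 - f_p1 * f_p2   # at least one power supply is up at t
    memory_up = 1.0 - f_m1 * f_m2  # at least one memory is up at t
    cpu_up = 1.0 - f_c             # the CPU is up at t
    return 1.0 - power_up * memory_up * cpu_up

# Hypothetical values of the component failure distributions at a fixed t.
f_s = computer_failure_cdf(f_p1=0.1, f_p2=0.1, f_m1=0.2, f_m2=0.2, f_c=0.05)
print(f_s)  # 1 - 0.99 * 0.96 * 0.95
```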
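A system comprises N components, where the component failure times are
given by the random variables $X_1, \ldots, X_N$. The system fails at time S with
distribution $F_S$ if:

Condition: all components fail
Distribution: $F_S(t) = \prod_{i=1}^{N} F_{X_i}(t)$

Condition: one component fails
Distribution: $F_S(t) = 1 - \prod_{i=1}^{N} \left(1 - F_{X_i}(t)\right)$

Condition: k components fail, identical distributions
Distribution: $F_S(t) = \sum_{i=k}^{N} \binom{N}{i} F_X(t)^i \left(1 - F_X(t)\right)^{N-i}$

Condition: k components fail, general case
Distribution: $F_S(t) = \sum_{g \in G_k} \left( \prod_{X \in g} F_X(t) \right) \left( \prod_{X \notin g} \left(1 - F_X(t)\right) \right)$

In the general case, $G_k$ is taken to be the set of all subsets of $\{X_1, \ldots, X_N\}$ containing at least k components, which is the reading consistent with the identical-distribution formula.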
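The general k of N case can be evaluated by enumerating the failed subsets directly. The sketch below reads G_k as all subsets with at least k components (an interpretation, chosen because it agrees with the identical-distribution formula) and uses made-up values for the component distributions at time t.

```python
from itertools import combinations
from math import prod

def k_of_n_general_failure_cdf(fs, k: int) -> float:
    """Sum over every subset g with at least k components of
    P[exactly the components in g have failed by t], given fs[i] = F_{X_i}(t)."""
    n = len(fs)
    total = 0.0
    for size in range(k, n + 1):
        for failed in combinations(range(n), size):
            failed_set = set(failed)
            p_failed = prod(fs[i] for i in failed_set)
            p_working = prod(1.0 - fs[i] for i in range(n) if i not in failed_set)
            total += p_failed * p_working
    return total

fs = [0.1, 0.2, 0.3]  # hypothetical F_{X_i}(t) values at a fixed t
general = k_of_n_general_failure_cdf(fs, 2)  # 0.014 + 0.024 + 0.054 + 0.006
print(general)
```

The enumeration visits a number of subsets that grows exponentially in N, so this direct form is only practical for small systems.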
Series:
System fails if any component fails.
Parallel:
System fails if all components fail.
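k of N:
System fails if at least k of N components fail.
[Figure: three block diagrams, each running from a source node to a sink node: the series chain C1, C2, ..., the parallel arrangement, and a "2 of 3" block for the k of N case.]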
A NASA satellite architecture under study is designed for high reliability. The
major computer system components include the CPU system, the high-speed
network for data collection and transmission, and the low-speed network for
engineering and control. The satellite fails if any of the major systems fail.
There are 3 computers, and the computer system fails if 2 or more of the computers
fail. The failure distribution of a computer is given by $F_C(t)$.
The high-speed network is redundant (2 networks), and the high-speed network system
fails if both networks fail. The failure distribution of a high-speed network is given
by $F_H(t)$.
The low-speed network is arranged similarly, with a failure distribution of $F_L(t)$.
[Figure: block diagram from source to sink with three "computer" blocks and a "2 of 3" block.]
$$F_S(t) = 1 - \left(1 - \sum_{i=2}^{3} \binom{3}{i} F_C(t)^i \left(1 - F_C(t)\right)^{3-i}\right) \left(1 - F_H(t)^2\right) \left(1 - F_L(t)^2\right)$$
The product of the three factors is the probability that all three systems survive to t. The last factor, $1 - F_L(t)^2$, is the probability that the low-speed network survives to t.
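The satellite example combines a 2 of 3 computer system with two duplicated networks in series. A minimal sketch of that computation follows, assuming independent failures; the function name and the values used for F_C(t), F_H(t), and F_L(t) are hypothetical.

```python
from math import comb

def satellite_failure_cdf(f_c: float, f_h: float, f_l: float) -> float:
    """F_S(t) for the satellite: 2-of-3 computers, duplicated high-speed
    network, and duplicated low-speed network, all three in series."""
    computers_fail = sum(comb(3, i) * f_c**i * (1.0 - f_c) ** (3 - i) for i in range(2, 4))
    high_speed_fails = f_h**2  # both high-speed networks have failed
    low_speed_fails = f_l**2   # both low-speed networks have failed
    all_survive = (1.0 - computers_fail) * (1.0 - high_speed_fails) * (1.0 - low_speed_fails)
    return 1.0 - all_survive

# Hypothetical component failure probabilities at a fixed t.
f_sat = satellite_failure_cdf(f_c=0.1, f_h=0.2, f_l=0.3)
print(f_sat)  # 1 - 0.972 * 0.96 * 0.91
```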