



Formula Sheet for Probability
Typology: Study Guides, Projects, Research
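$\Pr\{A\} = \Pr\{A|B_1\}\Pr\{B_1\} + \Pr\{A|B_2\}\Pr\{B_2\} + \dots + \Pr\{A|B_N\}\Pr\{B_N\}$
$\Pr\{B_i|A\} = \dfrac{\Pr\{A|B_i\}\Pr\{B_i\}}{\sum_{j=1}^{N} \Pr\{A|B_j\}\Pr\{B_j\}}$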
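As a numerical illustration of the total probability theorem and the Bayes' rule that follows from it, here is a minimal Python sketch. The function names and the two-machine defect numbers are assumptions made up for this example; they are not part of the sheet.

```python
def total_probability(likelihoods, priors):
    """Pr{A} = sum_i Pr{A|B_i} Pr{B_i} for a partition B_1..B_N."""
    return sum(l * p for l, p in zip(likelihoods, priors))


def bayes_posterior(likelihoods, priors):
    """Pr{B_i|A} for every i, obtained with Bayes' rule."""
    pr_a = total_probability(likelihoods, priors)
    return [l * p / pr_a for l, p in zip(likelihoods, priors)]


# Hypothetical numbers: two machines make 60% / 40% of the parts,
# with defect rates of 1% and 5% respectively.
priors = [0.6, 0.4]          # Pr{B_1}, Pr{B_2}
likelihoods = [0.01, 0.05]   # Pr{A|B_1}, Pr{A|B_2}, A = "part is defective"

pr_defect = total_probability(likelihoods, priors)
posterior = bayes_posterior(likelihoods, priors)
print(pr_defect, posterior)
```

The posterior list sums to 1 because the denominator in Bayes' rule is exactly the total probability of A.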
𝑥
𝑘
𝑥
𝑘
P1,P2,P3 same as F1,F2,F3 given below **
𝑋
𝑖+ 1
𝑋
𝑖
𝑋
𝑖+ 1
𝑋
𝑥
𝑥∈𝑆 𝑘
$\Pr\{X = k\} = \binom{n}{k} p^k (1-p)^{n-k}$
2
$\Pr\{Z = k\} = p(1-p)^{k-1}$
2
2
Pr
= Pr{𝑧 = 𝑗}
Counts total # of arrivals
$\Pr\{N = k\} = \dfrac{\lambda^k}{k!}\, e^{-\lambda}$
2
λ is the number of arrivals per unit time
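A quick way to sanity-check the Poisson pmf is to evaluate it numerically. The sketch below assumes a rate of 3 arrivals per unit time (an arbitrary choice) and truncates the infinite sum at 60 terms, which is more than enough for this rate.

```python
import math


def poisson_pmf(k, lam):
    """Pr{N = k} = lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 3.0  # assumed average number of arrivals per unit time
pmf = [poisson_pmf(k, lam) for k in range(60)]

total = sum(pmf)                               # should be very close to 1
mean = sum(k * p for k, p in enumerate(pmf))   # should be very close to lam
print(total, mean)
```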
𝑋
𝑋
∞
−∞
$f_X(x)$ could be > 1, but $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
F3: $\Pr\{a \le X \le b\} = \int_{a}^{b} f_X(x)\,dx$
𝑋
= Pr
𝑋
𝑡
𝑥= −∞
𝑋
𝑋
𝑋
𝑋
𝑋
F3: $\lim_{x \to -\infty} F_X(x) = 0$, $\lim_{x \to \infty} F_X(x) = 1$
𝐹 4 : Pr
𝑋
𝑎
𝑎
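For a very small value of ε:
$\Pr\{a - \tfrac{\epsilon}{2} \le X \le a + \tfrac{\epsilon}{2}\} = \int_{a-\epsilon/2}^{a+\epsilon/2} f_X(x)\,dx \approx \epsilon\, f_X(a)$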
𝐹 5 : Pr
𝑋
𝑋
Expected number of successes in n Bernoulli trials
If X , Y are independent random variables,
2
𝑋
𝑋
∞
−∞
𝑋
𝑋
∞
−∞
𝑋
2
𝑋
2
𝑋
∞
−∞
𝑋
2
$\mathrm{Var}\{X\} = E[X^2] - E^2[X]$  (2nd moment − (1st moment)$^2$)
𝑋
𝑋
also called the Rectangular distribution
2
Measure inter arrival time b/w events
2
2
Derived from the Poisson distribution:
$F_X(t) = 1 - e^{-\lambda t}, \quad f_X(t) = \lambda e^{-\lambda t}, \quad t \ge 0$
The exponential RV is a limiting case of the geometric RV.
Like the geometric RV, the exponential RV also possesses
the memoryless property.
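The memoryless property says that having already waited does not change the distribution of the remaining wait: Pr{X > s + t | X > s} = Pr{X > t}. The following sketch checks this numerically for both the exponential and the geometric RV. The parameter values are arbitrary assumptions, and the geometric RV is taken to count trials up to and including the first success.

```python
import math


def exp_survival(t, lam):
    """Pr{X > t} = exp(-lam * t) for an exponential RV with rate lam."""
    return math.exp(-lam * t)


def geom_survival(k, p):
    """Pr{Z > k} = (1 - p)**k for a geometric RV counting trials to first success."""
    return (1.0 - p) ** k


lam, s, t = 0.5, 2.0, 3.0   # assumed rate and waiting times
cond_exp = exp_survival(s + t, lam) / exp_survival(s, lam)    # Pr{X > s+t | X > s}

p, m, n = 0.2, 4, 6         # assumed success probability and trial counts
cond_geom = geom_survival(m + n, p) / geom_survival(m, p)     # Pr{Z > m+n | Z > m}

print(cond_exp, exp_survival(t, lam))
print(cond_geom, geom_survival(n, p))
```

In both cases the conditional survival probability matches the unconditional one, up to floating-point rounding.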
$f_X(x) = \dfrac{\alpha}{2}\, e^{-\alpha|x|}$
2
2
For systems with r different stages PDF and CDF:
$f_R(t) = \dfrac{\lambda^r t^{r-1}}{(r-1)!}\, e^{-\lambda t}$
𝑅
𝑘
−𝜆𝑡
𝑟
𝑘= 1
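Replacing r by α (non-integer):
$f_\Gamma(t) = \dfrac{\lambda^\alpha t^{\alpha-1}}{\Gamma(\alpha)}\, e^{-\lambda t}$
where: $\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1} e^{-x}\,dx$
α and λ are the shape and scale parameters (in this form λ acts as a rate, i.e. an inverse scale)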
2
2
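Properties:
$\int_{0}^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx = \dfrac{\Gamma(\alpha)}{\lambda^\alpha}$
At α = 1: $f_\Gamma(t) = \lambda e^{-\lambda t}$ (the exponential pdf)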
$f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
standard normal distribution:
2
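Gaussian Distribution & The Error Function (erf):
$\operatorname{erf}(x) = \Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$
(Here erf denotes the standard normal CDF Φ; the conventional error function is related to it by $\Phi(x) = \tfrac{1}{2}\left[1 + \operatorname{erf}(x/\sqrt{2})\right]$.)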
Properties of erf:
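For large n, the normal approximation of a binomial RV:
$\Pr\{X = k\} \approx \dfrac{1}{\sqrt{2\pi n p (1-p)}} \exp\left\{-\dfrac{(k - np)^2}{2np(1-p)}\right\}$
This approximation is also called the Laplace approximation.
It holds provided p is not too close to 0 or 1.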
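To see how close the normal (Laplace) approximation gets, the sketch below compares the exact binomial pmf with a Gaussian density of mean np and variance np(1−p) evaluated at k. The values n = 100 and p = 0.5 are assumptions picked for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Exact Pr{X = k} for a binomial RV."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def laplace_approx(k, n, p):
    """Gaussian density with mean n*p and variance n*p*(1-p), evaluated at k."""
    var = n * p * (1 - p)
    return math.exp(-(k - n * p) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


n, p = 100, 0.5   # assumed example values
for k in (40, 50, 60):
    print(k, binomial_pmf(k, n, p), laplace_approx(k, n, p))
```

The agreement is best near the mean and degrades in the tails or when p approaches 0 or 1.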
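Power function $Y = g(X) = X^n$:
$E\{X^n\} = \sum_i x_i^n\, p_X(x_i)$ (discrete RV)
$E\{X^n\} = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$ (continuous RV)
$E\{X^n\}$ is called the nth moment of X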
For two RVs X and Y, if $E\{X^n\} = E\{Y^n\}$ for all n, then X and Y
have the same distribution
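$E\{(X - m_x)^n\} = \sum_i (x_i - m_x)^n\, p_X(x_i)$ (discrete RV)
$E\{(X - m_x)^n\} = \int_{-\infty}^{\infty} (x - m_x)^n f_X(x)\,dx$ (continuous RV)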
Transform Domain Methods:
$\Phi_X(\omega) = E\{e^{j\omega X}\}$
Characteristic Function of a Continuous RV:
$\Phi_X(\omega) = \int_{-\infty}^{\infty} f_X(x)\, e^{j\omega x}\,dx$
The CF is the Fourier transform of the pdf of X (with sign
reversal)
Characteristic Function of a Discrete RV:
$\Phi_X(\omega) = \sum_k p_X(x_k)\, e^{j\omega x_k}$
If the discrete RV takes integer values then:
$\Phi_X(\omega) = \sum_k p_X(k)\, e^{j\omega k}$
The CF of an integer-valued discrete RV is a periodic function
of ω.
The PMF of a discrete RV can be derived from its
characteristic function as:
$p_X(k) = \dfrac{1}{2\pi} \int_{0}^{2\pi} \Phi_X(\omega)\, e^{-j\omega k}\,d\omega$
Characteristic functions have the useful property that
the moments of an RV X can be computed by
differentiating the function w.r.t. ω and evaluating it
at ω = 0:
$E\{X^n\} = \dfrac{1}{j^n} \left. \dfrac{d^n}{d\omega^n} \Phi_X(\omega) \right|_{\omega=0}$
So it is also called the Moment Generating
Function.
For a non-negative, integer-valued discrete RV N, define the probability generating function (PGF) as:
$G_N(z) = E\{z^N\} = \sum_{k=0}^{\infty} p_N(k)\, z^k$
The PGF is the z-transform of the PMF with sign
reversal.
The PGF can be used to generate the probabilities of
the integer RV as:
$p_N(n) = \dfrac{1}{n!} \left. \dfrac{d^n}{dz^n} G_N(z) \right|_{z=0}$
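The PGF can also be used to evaluate the moments of
the integer RV.
$\left. \dfrac{d}{dz} G_N(z) \right|_{z=1} = \left. \sum_{k=0}^{\infty} k\, p_N(k)\, z^{k-1} \right|_{z=1}$
Evaluating the 1st derivative at z = 1 gives the 1st moment:
$G_N'(1) = \sum_{k=0}^{\infty} k\, p_N(k) = E\{N\}$
Evaluating the 2nd derivative at z = 1 gives the 2nd factorial moment, from which the 2nd moment follows:
$\left. \dfrac{d^2}{dz^2} G_N(z) \right|_{z=1} = E\{N^2\} - E\{N\}$, so $E\{N^2\} = G_N''(1) + G_N'(1)$
We can also compute the variance:
$\mathrm{Var}\{N\} = E[N^2] - E^2[N] = G_N''(1) + G_N'(1) - \left(G_N'(1)\right)^2$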
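For an RV with finitely many values the PGF is a polynomial, so its derivatives at z = 1 can be computed term by term: G'(1) = E{N} and G''(1) = E{N(N−1)}. The sketch below uses a fair six-sided die as an assumed example; the helper name `pgf_derivative_at_one` is made up for this illustration.

```python
def pgf_derivative_at_one(pmf, order):
    """order-th derivative of G_N(z) = sum_k pmf[k] * z**k, evaluated at z = 1."""
    total = 0.0
    for k, pk in enumerate(pmf):
        coeff = 1
        for i in range(order):
            coeff *= (k - i)   # k (k-1) ... (k-order+1)
        total += coeff * pk
    return total


# Assumed example: a fair six-sided die, pmf[k] = Pr{N = k} for k = 0..6.
pmf = [0.0] + [1.0 / 6.0] * 6

g1 = pgf_derivative_at_one(pmf, 1)   # E{N}
g2 = pgf_derivative_at_one(pmf, 2)   # E{N(N-1)}
variance = g2 + g1 - g1 ** 2         # G''(1) + G'(1) - (G'(1))^2
print(g1, variance)
```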
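$\Pr\{X \ge t\} \le \dfrac{E\{X\}}{t}$ (Markov inequality, for non-negative X and t > 0)
$\Pr\{|X - \mu| \ge t\} \le \dfrac{\sigma^2}{t^2}$ (Chebyshev inequality)
The Markov inequality applies to every non-negative RV with a finite mean, and the Chebyshev inequality to every RV with a finite variance.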
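Both inequalities are bounds, not approximations, so the exact tail probability can be far below them. The sketch below compares the exact tails with the Markov bound E{X}/t and the Chebyshev bound σ²/t² for a fair six-sided die. The die and the thresholds are assumptions chosen for illustration.

```python
def tail_prob(pmf, predicate):
    """Sum the probabilities of all outcomes x for which predicate(x) is true."""
    return sum(p for x, p in pmf.items() if predicate(x))


# Assumed example: a fair six-sided die (non-negative, so Markov applies).
pmf = {x: 1.0 / 6.0 for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

t = 5
markov_exact = tail_prob(pmf, lambda x: x >= t)             # Pr{X >= 5}
markov_bound = mean / t

d = 2
cheb_exact = tail_prob(pmf, lambda x: abs(x - mean) >= d)   # Pr{|X - mu| >= 2}
cheb_bound = var / d ** 2

print(markov_exact, markov_bound)
print(cheb_exact, cheb_bound)
```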
The joint CDF of a pair of RVs X and Y is defined as:
$F_{X,Y}(x, y) = \Pr\{X \le x, Y \le y\} = \Pr\{\{X \le x\} \cap \{Y \le y\}\}$
Joint distributions are also called compound distributions
𝑋,𝑌
1
2
1
2 ,
𝑋,𝑌
1
1
𝑋,𝑌
1
1
𝑋,𝑌
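$\lim_{a, b \to \infty} F_{X,Y}(a, b) = 1$
$\lim_{a \text{ or } b \to -\infty} F_{X,Y}(a, b) = 0$
$\lim_{a \to \infty} F_{X,Y}(a, b) = F_Y(b)$
$\lim_{b \to \infty} F_{X,Y}(a, b) = F_X(a)$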
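F6: $\Pr\{a < X \le b \;\&\; c < Y \le d\} = F_{X,Y}(b, d) - F_{X,Y}(a, d) - F_{X,Y}(b, c) + F_{X,Y}(a, c)$
$f_{X,Y}(x, y) = \dfrac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y)$
$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\,dv\,du$
$\Pr\{a < X \le b,\; c < Y \le d\} = \int_{a}^{b} \int_{c}^{d} f_{X,Y}(x, y)\,dy\,dx$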
If a point is selected uniformly from an area, then X and Y have the joint pdf
$f_{X,Y}(x, y) = \dfrac{1}{\text{Area}}$ (inside the area, and 0 elsewhere)
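The joint pdf must satisfy the following property:
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$
$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$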
Two random variables X and Y are independent if:
$F_{X,Y}(x, y) = F_X(x)\, F_Y(y)$
$f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$
𝑋,𝑌
𝑋
𝑌
𝑌
𝑋,𝑌
𝑦
−∞
𝑋
𝑌
𝑌
𝑋,𝑌
𝑋
𝑋,𝑌
𝑌
𝑋
𝑍|𝑋
𝑌
𝑦=𝑔(𝑦)
− 1 , 𝑓 𝑍|𝑌
𝑋
𝑥=𝑔(𝑥)
− 1
For a discrete RV, the same condition can be stated as:
𝑋,𝑌
= Pr
𝑌
𝑋
∞
−∞
𝑌
𝑦
𝑌
𝑘
𝑘
The joint pmf of n discrete random variables is:
$p_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \Pr\{X_1 = x_1, X_2 = x_2, \dots, X_n = x_n\}$
Conditional pmfs are obtained as:
$p_{X_n}(x_n | x_1, \dots, x_{n-1}) = \dfrac{p_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_{n-1}, x_n)}{p_{X_1,X_2,\dots,X_{n-1}}(x_1, x_2, \dots, x_{n-1})}$
The conditional pdf obtained from joint pdfs is:
$f_{X_n}(x_n | x_1, \dots, x_{n-1}) = \dfrac{f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_{n-1}, x_n)}{f_{X_1,X_2,\dots,X_{n-1}}(x_1, x_2, \dots, x_{n-1})}$
Repeatedly applying this expression gives:
$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_{n-1}, x_n) = f_{X_n}(x_n | x_1, \dots, x_{n-1})\, f_{X_{n-1}}(x_{n-1} | x_1, \dots, x_{n-2}) \cdots f_{X_2}(x_2 | x_1)\, f_{X_1}(x_1)$
The marginal pmf of an RV is obtained by summing the joint pmf over
all values of the other RVs:
$p_{X_1}(x_1) = \sum_{x_2} \cdots \sum_{x_n} p_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n)$
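The joint CDF of n continuous RVs is:
$F_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f_{X_1,X_2,\dots,X_n}(u_1, u_2, \dots, u_n)\,du_n \cdots du_1$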
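Conversely, the joint pdf is then obtained as:
$f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \dfrac{\partial^n}{\partial x_1\,\partial x_2 \cdots \partial x_n} F_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n)$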
A single marginal pdf can be obtained as:
$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n)\,dx_n \cdots dx_2$
A marginal pdf for a sub-vector RV can be obtained as:
$f_{X_1,X_2,\dots,X_{n-1}}(x_1, x_2, \dots, x_{n-1}) = \int_{-\infty}^{\infty} f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_{n-1}, x_n)\,dx_n$
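$E\{g(X, Y)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\,dx\,dy$ (continuous RVs)
$E\{g(X, Y)\} = \sum_i \sum_n g(x_i, y_n)\, p_{X,Y}(x_i, y_n)$ (discrete RVs)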
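The jk-th joint moment of two RVs, X and Y, is given as:
$E\{X^j Y^k\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^j y^k f_{X,Y}(x, y)\,dx\,dy$ (continuous RVs)
$E\{X^j Y^k\} = \sum_i \sum_n x_i^j\, y_n^k\, p_{X,Y}(x_i, y_n)$ (discrete RVs)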
Setting j = 0 gives the moments of Y.
Similarly, k = 0 yields the moments of X.
The (j=1, k=1) moment, E{XY}, is generally called the
correlation of X and Y.
If E{XY} = 0, then X and Y are orthogonal; they are uncorrelated if COV{X, Y} = 0 (the two conditions coincide when X or Y has zero mean).
The jk-th central moment of two RVs, X and Y, is:
$E\{(X - E\{X\})^j\, (Y - E\{Y\})^k\}$
Setting j = 0, k = 2 gives the variance of Y.
Similarly, j = 2, k = 0 gives the variance of X.
The (j=1, k=1) central moment is generally called the
covariance of X and Y.
$COV\{X, Y\} = E\{(X - E\{X\})(Y - E\{Y\})\}$
$COV\{X, Y\} = E\{XY\} - E\{X\}\,E\{Y\}$
$\rho_{X,Y} = \dfrac{COV\{X, Y\}}{\sigma_X \sigma_Y}, \quad -1 \le \rho_{X,Y} \le 1$
The correlation coefficient is a normalized measure that
quantifies the amount of dependence between two RVs.
The correlation coefficient is a measure of the degree to which
a linear relationship exists between two RVs.
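The two covariance expressions and the correlation coefficient can be checked on a small joint pmf. The table below is an assumed toy example (two binary RVs that tend to agree), and `expect` is a helper name introduced only for this sketch.

```python
import math

# Assumed joint pmf of (X, Y): keys are (x, y) pairs, values are probabilities.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expect(g):
    """E{g(X, Y)} = sum over all (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)

cov_def = expect(lambda x, y: (x - ex) * (y - ey))   # E{(X-E{X})(Y-E{Y})}
cov_alt = expect(lambda x, y: x * y) - ex * ey       # E{XY} - E{X}E{Y}

sx = math.sqrt(expect(lambda x, y: (x - ex) ** 2))
sy = math.sqrt(expect(lambda x, y: (y - ey) ** 2))
rho = cov_def / (sx * sy)                            # correlation coefficient
print(cov_def, cov_alt, rho)
```

Both covariance forms agree, and dividing by the two standard deviations rescales the result into the range [−1, 1].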
1
2
𝑛
1
2
𝑛
If all $X_i$'s are independent:
1
2
𝑛
1
2
𝑛
𝑋 1
1
2
𝑁
𝑁/ 2
1 / 2
−
1
2
(𝑥̅ −𝑚̅ )
𝑇
∑
− 1
(𝑥̅ −𝑚̅ )
1
2
𝑁
1
2
12
1 𝑁
21
𝑁 1
2
2
2 𝑁
𝑁 2
𝑁
2
1
2
1
2
12
21
2
2
𝑋,𝑌
𝑋
𝑌
12
𝑋 1
, 𝑋 2
1 , 2
1
2
1 , 2
2 , 1
1
2
1 , 2
1
2
1 , 2
2
1
2
2
− 1
2
1
1
2
2
1
2
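$f_{X,Y}(x, y) = \dfrac{\exp\left\{\dfrac{-1}{2(1 - \rho_{X,Y}^2)} \left[\left(\dfrac{x - m_x}{\sigma_X}\right)^2 - 2\rho_{X,Y} \left(\dfrac{x - m_x}{\sigma_X}\right)\left(\dfrac{y - m_y}{\sigma_Y}\right) + \left(\dfrac{y - m_y}{\sigma_Y}\right)^2\right]\right\}}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{X,Y}^2}}$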
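A direct way to check a bivariate Gaussian pdf implementation is to set the correlation coefficient to 0, where the joint pdf must factor into the product of the two marginal Gaussian pdfs. The sketch below does that. The argument names and test point are assumptions made for this example.

```python
import math


def bivariate_normal_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate Gaussian pdf with means mx, my, std devs sx, sy, correlation rho."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    q = zx * zx - 2.0 * rho * zx * zy + zy * zy
    norm = 2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / (2.0 * (1.0 - rho * rho))) / norm


def normal_pdf(x, m, s):
    """Univariate Gaussian pdf with mean m and std dev s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


# With rho = 0 the joint pdf should equal the product of the marginals.
joint = bivariate_normal_pdf(0.3, -1.2, 0.0, 1.0, 2.0, 0.5, 0.0)
product = normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, 1.0, 0.5)
print(joint, product)
```

For jointly Gaussian RVs, zero correlation therefore implies independence, which is not true for RVs in general.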
If we set the bracketed term involving x and y in the exponent of the above
expression to a constant K, we obtain the equation
for an ellipse:
$f_{X,Y}(x, y) = \dfrac{\exp\left\{\dfrac{-K}{2(1 - \rho_{X,Y}^2)}\right\}}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{X,Y}^2}}$
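RVs $X_1, X_2, \dots, X_n$ are jointly normal if their pdf has
the following form:
$f_{\bar{X}}(\bar{x}) = f_{X_1,X_2,\dots,X_n}(x_1, x_2, \dots, x_n) = \dfrac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\left\{-\dfrac{1}{2} (\bar{x} - \bar{m})^T \Sigma^{-1} (\bar{x} - \bar{m})\right\}$
Where: $\bar{x} = [x_1, x_2, \dots, x_n]^T$
And $\bar{m} = [m_1, m_2, \dots, m_n]^T$, $\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix}$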
𝑉,𝑊
𝑋,𝑌
1
2
𝑉,𝑊
𝑋,𝑌
1
2
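Where: $|J(x, y)| = \det \begin{bmatrix} \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \\ \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix}$, $|J(v, w)| = \det \begin{bmatrix} \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w} \\ \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w} \end{bmatrix}$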
𝑛→∞
If we run a large number of Bernoulli trials (n), then the
probability that the proportion of successes in the n trials
differs from p by more than any fixed amount becomes arbitrarily small.
$\lim_{n \to \infty} \Pr\{|\bar{X} - \mu| > \delta\} = 0$
If you conduct a large number of trials (n), then the
probability that the sample mean ($\bar{X}$) deviates from the
true mean (μ) by more than a small value (δ) approaches 0.
$\Pr\{\lim_{n \to \infty} \bar{X} = \mu\} = 1$
As the number of trials (n) grows without bound, the
sample mean ($\bar{X}$) converges to the
true mean (μ) with probability 1.
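The laws of large numbers can be observed in a small simulation: as the number of Bernoulli trials grows, the proportion of successes settles near p. This is only an illustrative sketch. The success probability 0.3, the trial counts, and the fixed seed are arbitrary assumptions.

```python
import random


def sample_mean_of_bernoulli(n, p, rng):
    """Proportion of successes in n Bernoulli(p) trials drawn from rng."""
    return sum(1 for _ in range(n) if rng.random() < p) / n


rng = random.Random(0)   # fixed seed so that the run is repeatable
p = 0.3                  # assumed success probability
for n in (10, 1000, 100000):
    print(n, sample_mean_of_bernoulli(n, p, rng))
```

The printed proportions typically move closer to 0.3 as n increases, although any single run can fluctuate.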
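The Sample Mean Version
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$
The Sample Sum Version
$\dfrac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1)$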
$E\{X_i\} = \mu_i$ and a finite variance $\mathrm{Var}\{X_i\} = \sigma_i^2$
$\dfrac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$
Since the random walk process is an IID Sum RP:
𝐷 𝑛
𝐷 𝑛
𝑆
𝐷
1
2
= min(𝑛 1
2
The Wiener Process is the sum of a very large number of IID
RVs.
Therefore, according to the central limit theorem, the
Wiener Process has a Gaussian PDF:
$f_{X(t)}(x) = \dfrac{1}{\sqrt{2\pi\alpha t}}\, e^{-\frac{x^2}{2\alpha t}}$
Any RP having a Gaussian PDF is called a Gaussian
Random Process.
The Wiener Process has independent and stationary
increments.
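$f_{X(t_1),\dots,X(t_k)}(x_1, \dots, x_k) = f_{X(t_1)}(x_1)\, f_{X(t_2 - t_1)}(x_2 - x_1) \cdots f_{X(t_k - t_{k-1})}(x_k - x_{k-1})$
$= \dfrac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{x_1^2}{\alpha t_1} + \dfrac{(x_2 - x_1)^2}{\alpha (t_2 - t_1)} + \cdots + \dfrac{(x_k - x_{k-1})^2}{\alpha (t_k - t_{k-1})}\right]\right\}}{\sqrt{(2\pi\alpha)^k\, t_1 (t_2 - t_1) \cdots (t_k - t_{k-1})}}$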
Since the Wiener Process is a zero-mean IID Sum RP, its
covariance is:
$C_X(t_1, t_2) = R_X(t_1, t_2) = \alpha \min(t_1, t_2)$
A Poisson Random Process, N(t) , is the number of
occurrences or arrivals of an event A in the [0,t] time
interval.
Like the Poisson RV, the Poisson RP assumes that:
→The average number of arrivals per unit time (e.g., per
second), 𝜆 , is known
→ The arrivals are independent of each other.
If we observe λ arrivals per second on average, then we should expect λt arrivals (on
average) in the [0, t] time interval (i.e. in t seconds).
Thus the Poisson RP has a Poisson pmf with parameter λt:
$\Pr\{N(t) = k\} = \dfrac{(\lambda t)^k}{k!}\, e^{-\lambda t}$
Recall that the Poisson distribution is derived from the
Binomial distribution by dividing the [0,1] interval into
very (infinitely) small sub-intervals.
Then an arrival either takes place in a sub-interval or it
does not
→ Which can be treated as a Bernoulli random variable
The Poisson RP is then counting the number of successes
(arrivals).
Thus the Poisson RP is the continuous-time counterpart
of the Binomial RP.
Like the Binomial RP, the Poisson RP also has
independent and stationary increments.
$\Pr\{N(t_1) = k_1, N(t_2) = k_2\} = \Pr\{N(t_1) = k_1\}\, \Pr\{N(t_2 - t_1) = k_2 - k_1\}$
Since the Poisson RP is an IID Sum RP, its covariance is
given by:
$C_N(t_1, t_2) = \lambda \min(t_1, t_2)$
Recall that the inter-arrival times of Poisson arrivals are
exponential random variables.
Therefore, the sum of the inter-arrival times of the
Poisson RP is a sum of independent exponential
random variables.
We know that the sum of independent, identically distributed exponential
random variables has an Erlang PDF.
Therefore, the PDF of the sum of inter-arrival times
$S_n = T_1 + T_2 + \dots + T_n$
is given as:
$f_{S_n}(y) = \dfrac{\lambda^n y^{n-1}}{(n-1)!}\, e^{-\lambda y}, \quad y \ge 0$
A (time) homogeneous Markov Chain is one in which the
state transition probabilities are independent of time, i.e.
Homogeneous
$p_{ij} = \Pr\{X_{k+1} = j | X_k = i\} = \Pr\{X_{k+n+1} = j | X_{k+n} = i\}$
Non-homogeneous
$\Pr\{X_{k+1} = j | X_k = i\} \ne \Pr\{X_{k+n+1} = j | X_{k+n} = i\}$
An absorbing state of a Markov chain is a state which
cannot be transitioned out of.
If $p_{ii} = 1$ (and $p_{ij} = 0$ for $i \ne j$), then i is an absorbing state.
If every state in a Markov chain can reach an absorbing
state, the Markov chain is called an absorbing Markov
chain.
A state 𝑖 of a Markov chain is a recurrent state if,
𝑖𝑖
𝑛
∞
𝑛= 1
In words, if you start from a recurrent state you are
guaranteed to revisit it eventually.
A state of a Markov chain is a transient state if,
𝑖𝑖
𝑛
∞
𝑛= 1
In words, a transient state is the opposite of a
recurrent state: starting from it, there is a nonzero probability of never returning.
Given the state transition probability matrix (P) and an
arbitrary starting pmf at time t, is it possible to
compute the pmf of states at any arbitrary time t + n?
Given the starting pmf at time t = 0 (written as a row vector):
$\mathbf{p}^{(n)} = \mathbf{p}^{(0)} P^n$
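The pmf of the states after n steps can be computed by repeatedly multiplying the starting pmf (as a row vector) by the transition matrix. The sketch below assumes a small two-state chain whose numbers are invented for illustration. Row i of `P` holds the transition probabilities out of state i.

```python
def vec_mat(v, P):
    """Row vector times matrix: result[j] = sum_i v[i] * P[i][j]."""
    return [sum(v[i] * P[i][j] for i in range(len(v))) for j in range(len(P[0]))]


def state_pmf(p0, P, n):
    """Apply the one-step update n times, i.e. compute p(0) P^n."""
    p = list(p0)
    for _ in range(n):
        p = vec_mat(p, P)
    return p


# Assumed two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
p0 = [1.0, 0.0]   # start in state 0 with probability 1

print(state_pmf(p0, P, 1))
print(state_pmf(p0, P, 2))
```

Each result is again a pmf, so its entries stay non-negative and sum to 1.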
A state 𝑖 has period 𝑘 if any return to state 𝑖 must occur
in multiples of 𝑘 time steps. Formally, the period of a
state is defined as:
$k = \gcd\{n : \Pr\{X_n = i | X_0 = i\} > 0\}$
Even though a state has period 𝑘 , it may not be possible
to reach the state in 𝑘 steps.
If 𝑘 = 1 , then the state is said to be aperiodic: returns
to state i can occur at irregular times.
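The period of a state can be estimated numerically by taking the gcd of all step counts n at which a return to the state has positive probability. The sketch below only inspects the first `max_steps` powers of the transition matrix, so it is a heuristic under that assumption rather than a proof. Both example chains are made up for illustration.

```python
from math import gcd


def mat_mul(A, B):
    """Plain matrix product of two square lists of lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def period(P, i, max_steps=50):
    """gcd of all n <= max_steps with Pr{X_n = i | X_0 = i} > 0 (0 if none found)."""
    g = 0
    Pn = [row[:] for row in P]          # Pn holds P^n, starting with n = 1
    for n in range(1, max_steps + 1):
        if Pn[i][i] > 0:
            g = gcd(g, n)               # gcd(0, n) == n, so the first hit sets g
        Pn = mat_mul(Pn, P)
    return g


# Deterministic 3-cycle 0 -> 1 -> 2 -> 0: every state has period 3.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Adding a self-loop at state 0 makes the chain aperiodic.
lazy = [[0.5, 0.5, 0], [0, 0, 1], [1, 0, 0]]

print(period(cycle, 0), period(lazy, 0), period(lazy, 1))
```

In the second chain, state 1 can return after 3 steps or after 4 steps, so its period is gcd(3, 4, ...) = 1 even though it cannot return in a single step.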
A Markov chain is aperiodic if all its states are aperiodic.
A state i is said to be ergodic if it is aperiodic and
positive recurrent.
State i is ergodic if it is recurrent, has a period of 1 and
it has finite mean recurrence time.
If all states in an irreducible Markov chain are ergodic,
then the Markov chain is ergodic.
A Markov Chain is said to be irreducible if its state
space is a single communicating class.
If the Markov chain is a time-homogeneous Markov
chain, so that the process is described by a single, time-independent
matrix with entries $p_{ij}$, then the vector π is called a
stationary distribution (or invariant measure) if its
entries are non-negative and sum to 1 and if it satisfies:
$\pi_j = \sum_i \pi_i\, p_{ij}$, where $\pi = [\pi_1, \pi_2, \dots, \pi_N]^T$
Or in matrix-vector form:
$\pi^T = \pi^T P$
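One simple way to approximate a stationary distribution is to apply the one-step update repeatedly until the pmf stops changing. This power-iteration sketch assumes the chain is ergodic, so that the iteration converges. The two-state matrix and the iteration count are illustrative assumptions.

```python
def step(pi, P):
    """One application of pi_j <- sum_i pi_i * p_ij."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P[0]))]


def stationary(P, iterations=1000):
    """Approximate the stationary distribution by repeated one-step updates."""
    n = len(P)
    pi = [1.0 / n] * n          # start from the uniform pmf
    for _ in range(iterations):
        pi = step(pi, P)
    return pi


# Assumed two-state ergodic chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print(pi, step(pi, P))
```

For this matrix the balance condition 0.1·π₁ = 0.5·π₂ gives π = [5/6, 1/6], which the iteration reproduces, and applying one more step leaves the vector unchanged.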
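$p_{ij}^{(2)} = \sum_{k=1}^{S} p_{ik}^{(1)}\, p_{kj}^{(1)}$
In matrix form:
$P^{(2)} = P \cdot P = P^2$
Can we generalize this expression for n steps, to
compute $p_{ij}^{(n)}$?
$p_{ij}^{(n)} = \sum_{k=1}^{S} p_{ik}^{(n-m)}\, p_{kj}^{(m)}$
Or, in a more friendly form:
$p_{ij}^{(n+m)} = \sum_{k=1}^{S} p_{ik}^{(n)}\, p_{kj}^{(m)}$
And in matrix form:
$P^{(n+m)} = P^{(n)} P^{(m)} = P^n P^m = P^{n+m}$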
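The multi-step relation (the Chapman-Kolmogorov equation) states that the (n+m)-step transition matrix equals the product of the n-step and m-step matrices. The sketch below verifies this numerically for an assumed three-state chain with n = 2 and m = 3. The matrix entries are invented for illustration.

```python
def mat_mul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(P, n):
    """P multiplied by itself n times (identity matrix for n = 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Assumed three-state chain; each row sums to 1.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]

lhs = mat_pow(P, 5)                            # (n+m)-step matrix, n = 2, m = 3
rhs = mat_mul(mat_pow(P, 2), mat_pow(P, 3))    # n-step matrix times m-step matrix
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

The difference is zero up to floating-point rounding, and every row of the multi-step matrix still sums to 1.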