








A review of basic probability theory, including definitions, independence, probability distributions, random variables, and conditional probability. It also covers the law of total probability and variance, with examples. This is a handout for the course 6.1600 at the Massachusetts Institute of Technology in Fall 2022, taught by Henry Corrigan-Gibbs, Yael Kalai, and Nickolai Zeldovich.









Foundations of Computer Security September 7, 2022 Massachusetts Institute of Technology 6.1600 Fall 2022 Henry Corrigan-Gibbs, Yael Kalai, Nickolai Zeldovich Handout 2
Let Ω be the set of all possible outcomes of a (discrete) random experiment. We call Ω the sample space of the experiment. For example, suppose our random experiment consists of flipping a fair coin n times independently. Then we can represent Ω as
$$\Omega = \{(a_1, \dots, a_n) : a_i \in \{0, 1\}\}$$
where we encode heads as 1 and tails as 0. A probability distribution over $\Omega$ is a function $p : \Omega \to \mathbb{R}_{\ge 0}$ such that

$$\sum_{x \in \Omega} p(x) = 1.$$

An event is any set $A \subseteq \Omega$, and the probability of this event is

$$\Pr_p[A] = \sum_{x \in A} p(x).$$

We will often just write $\Pr$ instead of $\Pr_p$ when the distribution $p$ is clear from context. Two events $A, B \subseteq \Omega$ are called independent if $\Pr[A \cap B] = \Pr[A] \Pr[B]$. In words, we can define the probability of an event in a uniform distribution as

$$\Pr[\text{event happens}] = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}}$$
In our example, the event that the first flip is heads is represented as the set
$$A_{1,1} = \{(1, a_2, \dots, a_n) : a_i \in \{0, 1\}\}$$

and similarly the event that the first flip is tails is

$$A_{1,0} = \{(0, a_2, \dots, a_n) : a_i \in \{0, 1\}\}$$

We can similarly define the events $A_{i,1}$ for the $i$-th flip to be heads, and $A_{i,0}$ for tails. Since the coin flips are independent, and since the coin is fair, we have that

$$\begin{aligned}
p((a_1, \dots, a_n)) &= \Pr[A_{1,a_1} \cap \dots \cap A_{n,a_n}] \\
&= \Pr[A_{1,a_1}] \cdots \Pr[A_{n,a_n}] \\
&= \frac{1}{2^n}
\end{aligned}$$
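The coin-flip example can be checked by brute-force enumeration. The following is a minimal sketch, assuming n = 3 flips so that the whole sample space is small; the helper names `pr` and `flip_event` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

n = 3  # number of flips (an assumption; small so we can enumerate everything)

# Sample space: all n-tuples of bits (1 = heads, 0 = tails).
omega = list(product([0, 1], repeat=n))

# Uniform distribution: every outcome has probability 1 / 2^n.
p = {x: Fraction(1, 2**n) for x in omega}

def pr(event):
    """Probability of an event, given as a set of outcomes."""
    return sum(p[x] for x in event)

def flip_event(i, value):
    """The event A_{i,value}: the i-th flip (1-indexed) equals value."""
    return {x for x in omega if x[i - 1] == value}

a11 = flip_event(1, 1)  # first flip is heads
a21 = flip_event(2, 1)  # second flip is heads

print(pr(set(omega)))                      # total probability of the sample space
print(pr(a11))                             # Pr[A_{1,1}]
print(pr(a11 & a21) == pr(a11) * pr(a21))  # product rule for independent events
```

Exact fractions are used instead of floats so that the independence check is an exact equality rather than an approximate one.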
A (real-valued) random variable is a function X : Ω → R. In our example, the number of heads is a random variable represented by the function
$$X((a_1, \dots, a_n)) = \sum_{i=1}^{n} a_i$$
Two discrete real-valued random variables X, Y are called independent if
$$\Pr[X = x, Y = y] = \Pr[X = x] \Pr[Y = y]$$

for any $x, y \in \mathbb{R}$. The random variables $X_1, \dots, X_n$ are called (jointly) independent if

$$\Pr[X_1 = x_1, \dots, X_n = x_n] = \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n]$$

for any $x_1, \dots, x_n$. Note that the variables $X_1, \dots, X_n$ can be pairwise independent without being jointly independent! In our example, letting $X_i$ be the random variable that is 1 if the $i$-th coin landed heads and 0 otherwise (i.e., $X_i((a_1, \dots, a_n)) = a_i$), the variables $X_1, \dots, X_n$ are jointly independent.
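To see how pairwise independence can hold without joint independence, one standard illustration (not taken from the handout) uses two fair coin flips together with their XOR. The sketch below checks the product rule exhaustively; the variable names and the `independent` helper are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# Two fair, independent coin flips; each of the 4 outcomes has probability 1/4.
omega = list(product([0, 1], repeat=2))
prob = Fraction(1, len(omega))

# Three random variables on this sample space (illustrative choice).
variables = {
    "X1": lambda w: w[0],
    "X2": lambda w: w[1],
    "X3": lambda w: w[0] ^ w[1],  # XOR of the two flips
}

def pr(condition):
    """Probability that condition(outcome) holds under the uniform distribution."""
    return sum(prob for w in omega if condition(w))

def independent(names):
    """Check the product rule for every assignment of values to the named variables."""
    for values in product([0, 1], repeat=len(names)):
        joint = pr(lambda w: all(variables[v](w) == x for v, x in zip(names, values)))
        marginals = Fraction(1)
        for v, x in zip(names, values):
            marginals *= pr(lambda w, v=v, x=x: variables[v](w) == x)
        if joint != marginals:
            return False
    return True

pairwise = all(independent(pair) for pair in [("X1", "X2"), ("X1", "X3"), ("X2", "X3")])
jointly = independent(("X1", "X2", "X3"))
print(pairwise, jointly)
```

Every pair passes the product rule, but the triple fails: for instance, X1 = 0, X2 = 0, X3 = 1 has probability 0, while the product of the marginals is 1/8.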
The law of total probability states that if we have events $A_1, A_2, \dots, A_n$ which partition the sample space (i.e., $\Omega$ is a disjoint union of these events), and $B$ is any event, then

$$\Pr[B] = \sum_{i=1}^{n} \Pr[B \cap A_i].$$

The law of total probability is also valid if we have a countably infinite partition into events $A_1, A_2, \dots, A_n, \dots$, in which case

$$\Pr[B] = \sum_{i=1}^{\infty} \Pr[B \cap A_i].$$
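As a small sanity check of the finite version, the sketch below partitions the outcomes of three fair coin flips by their number of heads and takes B to be the event that the first flip is heads. Both the partition and the event B are illustrative assumptions, not part of the handout.

```python
from fractions import Fraction
from itertools import product

n = 3
omega = list(product([0, 1], repeat=n))
p = Fraction(1, 2**n)  # uniform probability of each outcome

def pr(event):
    """Probability of a set of outcomes under the uniform distribution."""
    return p * len(event)

# Partition of the sample space by the number of heads: A_k = {outcomes with k heads}.
partition = [{w for w in omega if sum(w) == k} for k in range(n + 1)]

# B: the first flip is heads.
b = {w for w in omega if w[0] == 1}

# Pr[B ∩ A_k] for each block of the partition.
pieces = [pr(b & a_k) for a_k in partition]
print(pieces)
print(sum(pieces) == pr(b))
```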
Conditioning on something means assuming with certainty that this thing will happen. Formally, the probability of event A conditioned on event B is defined as
$$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}$$
or, in words, the probability that both events happen, divided by the probability that $B$ happens. The intuition is that we focus only on the part of our sample space $\Omega$ on which $B$ happens.
For a discrete real-valued random variable $X$ taking possible values $x_1, \dots, x_n$, the expectation is defined as

$$E[X] = \sum_{i=1}^{n} \Pr[X = x_i] \, x_i$$
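The definition can be applied directly to the number of heads in the coin-flip example. This is a minimal sketch, assuming n = 3 flips; it first builds the distribution of X and then sums value times probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3  # number of flips (an assumption for this example)
omega = list(product([0, 1], repeat=n))
p = Fraction(1, 2**n)

# Distribution of X = number of heads: Pr[X = k] for each attainable k.
counts = Counter(sum(w) for w in omega)
dist = {k: p * c for k, c in counts.items()}

# E[X] = sum over values x_i of Pr[X = x_i] * x_i.
expected_heads = sum(prob * k for k, prob in dist.items())
print(dist)
print(expected_heads)
```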
Linearity of expectation. Given random variables $X_1, \dots, X_n$ and $X = \sum_{i=1}^{n} X_i$, we have

$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$$
In words, the expected value of the sum of random variables is equal to the sum of the expected values. A very important takeaway from this result is that it holds even if the random variables are not independent. This will be used frequently when we have to find the expected value of a sum of random variables when they might not be independent.
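The claim that independence is not needed can be checked on a tiny example. In the sketch below, the second variable is deliberately built from the first (it is 1 only when both flips are heads), so the two are dependent; the specific variables are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=2))  # two fair coin flips
p = Fraction(1, len(omega))

def expectation(f):
    """E[f] under the uniform distribution on omega."""
    return sum(p * f(w) for w in omega)

x1 = lambda w: w[0]         # 1 if the first flip is heads
x2 = lambda w: w[0] * w[1]  # 1 if both flips are heads (depends on x1)

lhs = expectation(lambda w: x1(w) + x2(w))  # E[X1 + X2]
rhs = expectation(x1) + expectation(x2)     # E[X1] + E[X2]
print(lhs, rhs)
```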
Multiplicativity of expectation under independence. Another cool property of expectation is that the expectation of a product of independent variables is the product of individual expectations:

$$E[XY] = E[X] \, E[Y].$$
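To see this, it is easiest to start manipulating the right side. Suppose $X$ can take values in $S$ and $Y$ can take values in $T$, and let $W = \{xy : x \in S, y \in T\}$. Then we have

$$\begin{aligned}
E[X] \, E[Y] &= \sum_{x \in S} \sum_{y \in T} \Pr[X = x] \Pr[Y = y] \, xy \\
&= \sum_{x \in S} \sum_{y \in T} \Pr[X = x, Y = y] \, xy \\
&= \sum_{a \in W} \sum_{(x,y) \in S \times T : xy = a} \Pr[X = x, Y = y] \, a \\
&= \sum_{a \in W} \Pr[XY = a] \, a = E[XY].
\end{aligned}$$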
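A quick numerical check shows both that the identity holds for independent variables and that it can fail without independence. The sketch below uses two fair coin flips; taking Y = X as the dependent case is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=2))  # two fair, independent coin flips
p = Fraction(1, len(omega))

def expectation(f):
    """E[f] under the uniform distribution on omega."""
    return sum(p * f(w) for w in omega)

x = lambda w: w[0]  # first flip
y = lambda w: w[1]  # second flip, independent of x

# Independent case: E[XY] equals E[X] E[Y].
e_xy = expectation(lambda w: x(w) * y(w))
e_x_e_y = expectation(x) * expectation(y)

# Dependent case: taking Y = X breaks the identity.
e_xx = expectation(lambda w: x(w) * x(w))
e_x_e_x = expectation(x) * expectation(x)

print(e_xy, e_x_e_y)  # the two values agree
print(e_xx, e_x_e_x)  # the two values differ
```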
For a discrete real-valued random variable $X$, the variance is defined as

$$\operatorname{Var}[X] = E\left[(X - E[X])^2\right]$$

Intuitively, the variance captures how far the random variable is from its expectation in a squared, expected sense. Note that this can be alternatively expressed as

$$\operatorname{Var}[X] = E[X^2] - E[X]^2$$
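Linearity of variance under pairwise independence. An important property of the variance is that it is additive when the summands are pairwise independent random variables. That is, if $X_1, \dots, X_n$ are pairwise independent random variables, we have

$$\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \operatorname{Var}[X_i]$$

To see this, note that

$$\begin{aligned}
\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] &= E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] - \left(\sum_{i=1}^{n} E[X_i]\right)^2 \\
&= \sum_{i=1}^{n} E[X_i^2] + 2 \sum_{i<j} E[X_i X_j] - \sum_{i=1}^{n} E[X_i]^2 - 2 \sum_{i<j} E[X_i] \, E[X_j] \\
&= \sum_{i=1}^{n} E[X_i^2] - \sum_{i=1}^{n} E[X_i]^2 \\
&= \sum_{i=1}^{n} \operatorname{Var}[X_i]
\end{aligned}$$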
where we used the fact that E [XY ] = E [X] E [Y ] for independent X, Y.
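Additivity of variance can be tested on variables that are pairwise but not jointly independent. The sketch below assumes two fair coin flips plus their XOR (an illustrative choice, not from the handout), and also includes a dependent case in which additivity fails.

```python
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=2))  # two fair coin flips
p = Fraction(1, len(omega))

def expectation(f):
    """E[f] under the uniform distribution on omega."""
    return sum(p * f(w) for w in omega)

def variance(f):
    """Var[f] = E[f^2] - E[f]^2."""
    return expectation(lambda w: f(w) ** 2) - expectation(f) ** 2

x1 = lambda w: w[0]
x2 = lambda w: w[1]
x3 = lambda w: w[0] ^ w[1]  # pairwise independent of x1 and of x2, but not jointly

var_of_sum = variance(lambda w: x1(w) + x2(w) + x3(w))
sum_of_vars = variance(x1) + variance(x2) + variance(x3)

# Without independence, additivity can fail: compare Var[X1 + X1] with 2 Var[X1].
var_doubled = variance(lambda w: x1(w) + x1(w))

print(var_of_sum, sum_of_vars)
print(var_doubled, 2 * variance(x1))
```

Pairwise independence is enough here because the derivation only ever uses the product rule on pairs of variables.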
$X = \sum_{i=1}^{n} X_i$. By linearity of expectation,

$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \Pr[X_i = 1] = \sum_{i=1}^{n} \frac{1}{n}$$
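and hence

$$\begin{aligned}
E[T] &= \sum_{t=1}^{\infty} \Pr[T = t] \, t = \sum_{t=1}^{\infty} (1-p)^{t-1} p \, t \\
&= p \sum_{t=1}^{\infty} (1-p)^{t-1} t \\
&= p \left( \sum_{t=1}^{\infty} (1-p)^{t-1} + \sum_{t=2}^{\infty} (1-p)^{t-1} + \dots \right) \\
&= p \left( \frac{1}{p} + \frac{1-p}{p} + \frac{(1-p)^2}{p} + \dots \right) \\
&= 1 + (1-p) + (1-p)^2 + \dots = \frac{1}{p}
\end{aligned}$$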
So, we get a very neat result: if each independent trial is a Bernoulli random variable that equals 1 with probability $p$, the expected number of trials until the first 1 is $1/p$.
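The 1/p result can be checked numerically in two ways: by truncating the series, and by simulating repeated trials. This is a rough sketch; the value p = 0.25, the truncation length, the seed, and the sample count are all arbitrary assumptions, and the function names are hypothetical.

```python
import random

def expected_trials_series(p, terms=1000):
    """Truncated version of the series sum_{t>=1} (1-p)^(t-1) * p * t."""
    return sum((1 - p) ** (t - 1) * p * t for t in range(1, terms + 1))

def trials_until_success(p, rng):
    """Run Bernoulli(p) trials until the first 1; return how many were needed."""
    t = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        t += 1
    return t

p = 0.25
rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [trials_until_success(p, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)

print(expected_trials_series(p))  # should be close to 1/p
print(empirical_mean)             # should also be close to 1/p, up to sampling noise
```

For much smaller p the truncation would need more terms, since the tail of the series decays more slowly.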
Applying this to our case, the expected number of dollars will be 165. This calculation can be simplified using the following identity which holds whenever $T$ ranges over the natural numbers:

$$E[T] = \sum_{t=0}^{\infty} \Pr[T > t]$$
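The tail-sum identity can be verified exactly on a small distribution. The sketch below assumes T is the result of a fair six-sided die, an example chosen here for illustration rather than taken from the handout, and compares the definition of expectation with the sum of tail probabilities.

```python
from fractions import Fraction

# T = result of a fair six-sided die, so T ranges over {1, ..., 6}.
pmf = {t: Fraction(1, 6) for t in range(1, 7)}

# Definition: E[T] = sum_t Pr[T = t] * t.
direct = sum(prob * t for t, prob in pmf.items())

def tail(t):
    """Pr[T > t]."""
    return sum(prob for value, prob in pmf.items() if value > t)

# Identity: E[T] = sum_{t >= 0} Pr[T > t]; terms with t >= 6 are zero here.
via_tails = sum(tail(t) for t in range(0, 6))

print(direct, via_tails)
```

For the geometric case above, Pr[T > t] = (1 − p)^t, so the identity reduces the computation of E[T] to a single geometric series.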