Probability Review for Computer Security, Lecture notes of Probability and Statistics

A review of basic probability theory, including definitions, independence, probability distributions, random variables, and conditional probability. It also covers the law of total probability and variance, with examples. A handout for the course 6.1600 at the Massachusetts Institute of Technology in Fall 2022, taught by Henry Corrigan-Gibbs, Yael Kalai, and Nickolai Zeldovich.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

Foundations of Computer Security September 7, 2022
Massachusetts Institute of Technology 6.1600 Fall 2022
Henry Corrigan-Gibbs, Yael Kalai, Nickolai Zeldovich Handout 2
Probability Review


1 Basic theory

1.1 Basic definitions and independence

Let $\Omega$ be the set of all possible outcomes of a (discrete) random experiment. We call $\Omega$ the sample space of the experiment. For example, suppose our random experiment consists of flipping a fair coin $n$ times independently. Then we can represent $\Omega$ as

\[ \Omega = \{(a_1, \ldots, a_n) : a_i \in \{0, 1\}\} \]

where we encode heads as 1 and tails as 0.

A probability distribution over $\Omega$ is a function $p : \Omega \to \mathbb{R}_{\ge 0}$ such that $\sum_{x \in \Omega} p(x) = 1$. An event is any set $A \subseteq \Omega$, and the probability of this event is $\Pr_p[A] = \sum_{x \in A} p(x)$. We will often just write $\Pr$ instead of $\Pr_p$ when the distribution $p$ is clear from context. Two events $A, B \subseteq \Omega$ are called independent if $\Pr[A \cap B] = \Pr[A] \Pr[B]$.

In words, we can define the probability of an event in a uniform distribution as

\[ \Pr[\text{event happens}] = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}} \]

In our example, the event that the first flip is heads is represented as the set

\[ A_{1,1} = \{(1, a_2, \ldots, a_n) : a_i \in \{0, 1\}\} \]

and similarly the event that the first flip is tails is

\[ A_{1,0} = \{(0, a_2, \ldots, a_n) : a_i \in \{0, 1\}\} \]

We can similarly define the events $A_{i,1}$ for the $i$-th flip to be heads, and $A_{i,0}$ for tails. Since the coin flips are independent, and since the coin is fair, we have that

\[
\begin{aligned}
p((a_1, \ldots, a_n)) &= \Pr[A_{1,a_1} \cap \cdots \cap A_{n,a_n}] \\
&= \Pr[A_{1,a_1}] \cdots \Pr[A_{n,a_n}] \\
&= \frac{1}{2^n}.
\end{aligned}
\]

A (real-valued) random variable is a function $X : \Omega \to \mathbb{R}$. In our example, the number of heads is a random variable represented by the function

\[ X((a_1, \ldots, a_n)) = \sum_{i=1}^{n} a_i \]

Two discrete real-valued random variables $X, Y$ are called independent if

\[ \Pr[X = x, Y = y] = \Pr[X = x] \Pr[Y = y] \]

for any $x, y \in \mathbb{R}$. The random variables $X_1, \ldots, X_n$ are called (jointly) independent if

\[ \Pr[X_1 = x_1, \ldots, X_n = x_n] = \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n] \]

for any $x_1, \ldots, x_n$. Note that the variables $X_1, \ldots, X_n$ can be pairwise independent without being jointly independent! In our example, letting $X_i$ be the random variable that is 1 if the $i$-th coin landed heads and 0 otherwise (i.e., $X_i((a_1, \ldots, a_n)) = a_i$), the variables $X_1, \ldots, X_n$ are jointly independent.
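The remark that pairwise independence does not imply joint independence can be checked on a small case. The sketch below uses a standard textbook example that is not taken from the handout: two fair bits and their XOR. The names `OMEGA`, `VARS`, `pr`, `pairwise_independent` and `jointly_independent` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair, independent bits (a1, a2); each outcome has probability 1/4.
OMEGA = list(product([0, 1], repeat=2))
P = Fraction(1, len(OMEGA))

# Three random variables on OMEGA. The third is the XOR of the first two
# (an assumed example, chosen because it is pairwise but not jointly independent).
VARS = [lambda w: w[0], lambda w: w[1], lambda w: w[0] ^ w[1]]


def pr(pred):
    """Probability of the event {w in OMEGA : pred(w)} under the uniform distribution."""
    return sum(P for w in OMEGA if pred(w))


def pairwise_independent():
    """Check Pr[Xi = x, Xj = y] = Pr[Xi = x] Pr[Xj = y] for every pair i < j."""
    for i in range(3):
        for j in range(i + 1, 3):
            for x, y in product([0, 1], repeat=2):
                joint = pr(lambda w: VARS[i](w) == x and VARS[j](w) == y)
                if joint != pr(lambda w: VARS[i](w) == x) * pr(lambda w: VARS[j](w) == y):
                    return False
    return True


def jointly_independent():
    """Check Pr[X1 = x1, X2 = x2, X3 = x3] = Pr[X1 = x1] Pr[X2 = x2] Pr[X3 = x3]."""
    for xs in product([0, 1], repeat=3):
        joint = pr(lambda w: all(VARS[i](w) == xs[i] for i in range(3)))
        prod = Fraction(1)
        for i in range(3):
            prod *= pr(lambda w, i=i: VARS[i](w) == xs[i])
        if joint != prod:
            return False
    return True


print(pairwise_independent(), jointly_independent())
```

Joint independence fails because, for instance, the outcome X1 = 0, X2 = 0, X3 = 1 has probability 0 while the product of the three marginals is 1/8.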

1.2 Law of total probability

The law of total probability states that if we have events $A_1, A_2, \ldots, A_n$ which partition the sample space (i.e., $\Omega$ is a disjoint union of these events), and $B$ is any event, then

\[ \Pr[B] = \sum_{i=1}^{n} \Pr[B \cap A_i]. \]

The law of total probability is also valid if we have a countably infinite partition into events $A_1, A_2, \ldots, A_n, \ldots$, in which case

\[ \Pr[B] = \sum_{i=1}^{\infty} \Pr[B \cap A_i]. \]
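As a small sanity check of the finite form, the sketch below enumerates the coin-flip sample space from Section 1.1. The choice of partition (events "exactly k heads") and of B (first flip is heads) is an assumption made for illustration, as is the function name `total_probability_check`.

```python
from fractions import Fraction
from itertools import product


def total_probability_check(n=4):
    """Return Pr[B] and the list of Pr[B ∩ A_k] for k = 0..n.

    B   = "the first flip is heads"
    A_k = "exactly k of the n flips are heads" (these events partition the sample space)
    """
    omega = list(product([0, 1], repeat=n))
    p = Fraction(1, len(omega))  # uniform distribution over 2^n outcomes

    def in_b(w):
        return w[0] == 1

    pr_b = sum(p for w in omega if in_b(w))
    pieces = [sum(p for w in omega if in_b(w) and sum(w) == k) for k in range(n + 1)]
    return pr_b, pieces


pr_b, pieces = total_probability_check(4)
print(pr_b, pieces, sum(pieces))
```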

1.3 Conditional probability

Conditioning on something means assuming with certainty that this thing will happen. Formally, the probability of event A conditioned on event B is defined as

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} \]

or, in words, the probability that both events happen, divided by the probability that B happens. The intuition is that we focus only on the part of our sample space Ω on which

1.5 Expectation

For a discrete real-valued random variable $X$ taking possible values $x_1, \ldots, x_n$, the expectation is defined as

\[ E[X] = \sum_{i=1}^{n} \Pr[X = x_i] \, x_i \]

Linearity of Expectation. Given random variables $X_1, \ldots, X_n$ and $X = \sum_{i=1}^{n} X_i$, we have

\[ E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] \]

In words, the expected value of the sum of random variables is equal to the sum of the expected values. A very important takeaway from this result is that it holds even if the random variables are not independent. This will be used frequently when we have to find the expected value of a sum of random variables when they might not be independent.

Multiplicativity of expectation under independence. Another cool property of expectation is that the expectation of a product of independent variables is the product of individual expectations:

\[ E[XY] = E[X] \, E[Y]. \]

To see this, it is easiest to start manipulating the right side. Suppose $X$ can take values in $S$ and $Y$ can take values in $T$, and let $W = \{xy : x \in S, y \in T\}$. Then we have

\[
\begin{aligned}
E[X] \, E[Y] &= \sum_{x \in S} \sum_{y \in T} \Pr[X = x] \Pr[Y = y] \, xy \\
&= \sum_{x \in S} \sum_{y \in T} \Pr[X = x, Y = y] \, xy \\
&= \sum_{a \in W} \sum_{(x,y) \in S \times T : xy = a} \Pr[X = x, Y = y] \, a \\
&= \sum_{a \in W} \Pr[XY = a] \, a = E[XY].
\end{aligned}
\]

1.6 Variance

For a discrete real-valued random variable $X$, the variance is defined as

\[ \mathrm{Var}[X] = E\left[(X - E[X])^2\right] \]

Intuitively, the variance captures how far the random variable is from its expectation in a squared, expected sense. Note that this can be alternatively expressed as

\[
\begin{aligned}
E\left[(X - E[X])^2\right] &= E\left[X^2 - 2X E[X] + E[X]^2\right] \\
&= E[X^2] - 2 E[X E[X]] + E[X]^2 \\
&= E[X^2] - 2 E[X]^2 + E[X]^2 \\
&= E[X^2] - E[X]^2.
\end{aligned}
\]

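Linearity of variance under pairwise independence. An important property of the variance is that it is additive when the summands are pairwise independent random variables. That is, if $X_1, \ldots, X_n$ are pairwise independent random variables, we have

\[ \mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i] \]

To see this, note that

\[
\begin{aligned}
\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] &= E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] - \left(\sum_{i=1}^{n} E[X_i]\right)^2 \\
&= \sum_{i=1}^{n} E[X_i^2] + 2 \sum_{i<j} E[X_i X_j] - \sum_{i=1}^{n} E[X_i]^2 - 2 \sum_{i<j} E[X_i] \, E[X_j] \\
&= \sum_{i=1}^{n} E[X_i^2] - \sum_{i=1}^{n} E[X_i]^2 \\
&= \sum_{i=1}^{n} \mathrm{Var}[X_i]
\end{aligned}
\]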

where we used the fact that E [XY ] = E [X] E [Y ] for independent X, Y.
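To see that pairwise independence is enough, and that dependence can break additivity, here is a minimal sketch on a four-outcome sample space. Both families of variables are assumed examples rather than ones from the handout: `xor_vars` is pairwise independent (two fair bits and their XOR), while `copy_vars` repeats the first bit and is therefore dependent.

```python
from fractions import Fraction
from itertools import product

# Two fair, independent bits; every outcome has probability 1/4.
OMEGA = list(product([0, 1], repeat=2))


def expect(f):
    """E[f] under the uniform distribution on OMEGA."""
    return sum(Fraction(f(w)) for w in OMEGA) / len(OMEGA)


def var(f):
    """Var[f] = E[f^2] - E[f]^2."""
    return expect(lambda w: f(w) ** 2) - expect(f) ** 2


# Pairwise independent (but not jointly independent) family.
xor_vars = [lambda w: w[0], lambda w: w[1], lambda w: w[0] ^ w[1]]
# Dependent family: the third variable is a copy of the first.
copy_vars = [lambda w: w[0], lambda w: w[1], lambda w: w[0]]


def var_of_sum(vs):
    return var(lambda w: sum(v(w) for v in vs))


def sum_of_vars(vs):
    return sum(var(v) for v in vs)


print(var_of_sum(xor_vars), sum_of_vars(xor_vars))    # equal
print(var_of_sum(copy_vars), sum_of_vars(copy_vars))  # differ
```

In the dependent family the cross terms E[X1 X3] - E[X1] E[X3] no longer cancel, which is exactly the step of the derivation that uses independence.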

1.7 Examples

  1. Suppose we pick a uniformly random permutation of $n$ elements. What is the expected number of fixed points in it?
  Solution. Let $X_i = 1$ if the $i$-th element is a fixed point and $X_i = 0$ otherwise. The total number of fixed points is $X = \sum_{i=1}^{n} X_i$. By linearity of expectation,

\[ E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \Pr[X_i = 1] = \sum_{i=1}^{n} \frac{1}{n} = 1. \]
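For small $n$ the fixed-point computation can be confirmed by brute force. The following sketch enumerates all permutations and averages the number of fixed points exactly; the function name `expected_fixed_points` is illustrative.

```python
from fractions import Fraction
from itertools import permutations


def expected_fixed_points(n):
    """Exact expected number of fixed points of a uniformly random permutation of n elements."""
    perms = list(permutations(range(n)))
    # A fixed point of p is an index i with p[i] == i.
    total = sum(sum(1 for i, v in enumerate(p) if i == v) for p in perms)
    return Fraction(total, len(perms))


for n in range(1, 7):
    print(n, expected_fixed_points(n))
```

Note that the indicators $X_i$ are not independent here (for example, if $n - 1$ elements are fixed then so is the last one), yet linearity of expectation still applies.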

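and hence

\[
\begin{aligned}
E[T] &= \sum_{t=1}^{\infty} \Pr[T = t] \, t = \sum_{t=1}^{\infty} (1-p)^{t-1} p \, t \\
&= p \sum_{t=1}^{\infty} (1-p)^{t-1} t \\
&= p \left( \sum_{t=1}^{\infty} (1-p)^{t-1} + \sum_{t=2}^{\infty} (1-p)^{t-1} + \cdots \right) \\
&= p \left( \frac{1}{p} + \frac{1-p}{p} + \frac{(1-p)^2}{p} + \cdots \right) \\
&= 1 + (1-p) + (1-p)^2 + \cdots = \frac{1}{p}
\end{aligned}
\]

So, we get a very neat result: if a Bernoulli random variable equals 1 with probability $p$, the expected number of independent trials until it first equals 1 is $\frac{1}{p}$.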
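The closed form $1/p$ can be checked numerically by truncating the series. This is only a sketch: the cutoff `terms` is an arbitrary choice, and the result is a floating-point approximation rather than an exact value.

```python
def geometric_mean_partial(p, terms=10_000):
    """Partial sum of sum_{t>=1} t * (1-p)^(t-1) * p, truncated after `terms` terms."""
    return sum(t * (1 - p) ** (t - 1) * p for t in range(1, terms + 1))


for p in (0.5, 1 / 6):
    print(p, geometric_mean_partial(p), 1 / p)
```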

Applying this to our case, the expected number of dollars will be 165.

This calculation can be simplified using the following identity, which holds whenever $T$ ranges over the natural numbers:

\[ E[T] = \sum_{t=0}^{\infty} \Pr[T > t] \]
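As a worked instance of this identity, take $T$ to be the number of independent trials until the first success, with success probability $p$. Then $T > t$ holds exactly when the first $t$ trials all fail, so the expectation reduces to a geometric series:

```latex
\[
\Pr[T > t] = (1-p)^t
\quad\Longrightarrow\quad
E[T] = \sum_{t=0}^{\infty} \Pr[T > t] = \sum_{t=0}^{\infty} (1-p)^t = \frac{1}{1-(1-p)} = \frac{1}{p}.
\]
```

This agrees with the value $1/p$ obtained from the direct summation.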

  1. Barr flips a fair coin $n$ times, and so does Derrick. Show that the probability that they get the same number of heads is $\binom{2n}{n} / 4^n$. Use your argument to verify the identity

\[ \sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n} \]

  Solution. Let our probability space be $\Omega = \{(a_1, \ldots, a_n, b_1, \ldots, b_n) : a_i \in \{0, 1\}, b_i \in \{0, 1\}\}$, where $a_i = 1$ if the $i$-th flip of Barr was heads and 0 otherwise, and $b_i = 1$ if the $i$-th flip of Derrick was tails, and 0 otherwise. Note that we encode heads and tails in opposite ways for Barr and Derrick. Then note that the event that they flipped the same number of heads is

\[
\begin{aligned}
A &= \left\{ (a_1, \ldots, a_n, b_1, \ldots, b_n) : \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} (1 - b_i) \right\} \\
&= \left\{ (a_1, \ldots, a_n, b_1, \ldots, b_n) : \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i = n \right\}
\end{aligned}
\]

This is the set of 0/1 sequences of length $2n$ with exactly $n$ ones. There are $\binom{2n}{n}$ of them, each with probability $1/2^{2n}$, which immediately tells us that $\Pr[A] = \binom{2n}{n} / 2^{2n}$ as wanted.

Now, note that we could have computed the same probability with a different probability space: namely, the one where we encode heads and tails in the same way. Here $\Omega = \{(a_1, \ldots, a_n, b_1, \ldots, b_n) : a_i \in \{0, 1\}, b_i \in \{0, 1\}\}$, where $a_i = 1$ if the $i$-th flip of Barr was heads and 0 otherwise, and $b_i = 1$ if the $i$-th flip of Derrick was heads, and 0 otherwise. Now we have

\[ A = \left\{ (a_1, \ldots, a_n, b_1, \ldots, b_n) : \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i \right\} \]

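We can calculate the probability by considering all the different possible numbers of heads that the two players can have (we're using the law of total probability here):

\[
\begin{aligned}
\Pr[A] &= \sum_{k=0}^{n} \Pr\left[ A \cap \left\{ \sum_{i=1}^{n} a_i = k \right\} \right] \\
&= \sum_{k=0}^{n} \Pr\left[ \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i = k \right] \\
&= \sum_{k=0}^{n} \Pr\left[ \sum_{i=1}^{n} a_i = k \right] \Pr\left[ \sum_{i=1}^{n} b_i = k \right] \\
&= \sum_{k=0}^{n} \frac{\binom{n}{k}}{2^n} \cdot \frac{\binom{n}{k}}{2^n} \\
&= \frac{\sum_{k=0}^{n} \binom{n}{k}^2}{4^n}
\end{aligned}
\]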

Comparing the two expressions, we get the desired identity.
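Both the probability and the resulting identity are easy to confirm for small $n$. The sketch below enumerates all $2^{2n}$ outcomes using the second encoding (1 means heads for both players); the function name `same_heads_probability` is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def same_heads_probability(n):
    """Exact probability that two players, each flipping n fair coins, get equally many heads."""
    # An outcome is (a_1..a_n, b_1..b_n); the first n entries are Barr's flips.
    hits = sum(1 for w in product([0, 1], repeat=2 * n) if sum(w[:n]) == sum(w[n:]))
    return Fraction(hits, 4 ** n)


for n in range(1, 6):
    lhs = sum(comb(n, k) ** 2 for k in range(n + 1))
    print(n, same_heads_probability(n), lhs, comb(2 * n, n))
```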

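Proof. Since $(X - \mu)^2$ is a nonnegative random variable, by Markov's inequality we get

\[ \Pr[(X - \mu)^2 \ge k^2] \le \frac{E[(X - \mu)^2]}{k^2} \]

Since $(X - \mu)^2 \ge k^2$ holds exactly when $|X - \mu| \ge k$ (for $k > 0$), and $E[(X - \mu)^2] = \sigma^2$, this is equivalent to

\[ \Pr[|X - \mu| \ge k] \le \frac{\sigma^2}{k^2} \]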

2.3 Chernoff Bounds.

Suppose $X_1, \ldots, X_n$ are independent random variables taking values in $\{0, 1\}$. Let $X$ denote their sum and let $\mu = E[X]$ denote the sum's expected value. Then for any $\beta > 0$,

  • $\Pr[X > (1 + \beta)\mu] < e^{-\beta^2 \mu / 3}$, for $0 < \beta < 1$
  • $\Pr[X > (1 + \beta)\mu] < e^{-\beta \mu / 3}$, for $\beta > 1$
  • $\Pr[X < (1 - \beta)\mu] < e^{-\beta^2 \mu / 2}$, for $0 < \beta < 1$

This allows us to get an even tighter bound because we can use the fact that the random variables exhibit full mutual independence. Note that this is a stronger assumption than pairwise independence! There are groups of random variables which are all pairwise independent but which are not mutually independent.

2.4 Examples

  1. Let's say that we flip a biased coin that lands heads with probability $\frac{1}{3}$ a total of $n$ times. Use Chernoff bounds to determine a value of $n$ such that the probability of getting more than half of the flips heads is less than $\frac{1}{1000}$.
  Solution. Let $X_i$ be a random variable that is 1 if the $i$-th flip landed heads and 0 otherwise. If we denote $X = \sum_{i=1}^{n} X_i$, we want to find a value of $n$ such that

\[ \Pr\left[X > \frac{n}{2}\right] < \frac{1}{1000}. \]

Note that $\mu = E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{3} = \frac{n}{3}$. Applying Chernoff bounds from the previous section with $\beta = \frac{1}{2}$ we get

\[ \Pr\left[X > \frac{3}{2}\mu\right] < e^{-(1/2)^2 \mu / 3} \iff \Pr\left[X > \frac{n}{2}\right] < e^{-n/36} \]

So for $e^{-n/36} < 1/1000 \iff n > 36 \ln 1000 \approx 248.7$ we have the required bound; in particular, $n = 250$ suffices.
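To get a feel for how loose the bound is, the sketch below compares the Chernoff estimate $e^{-n/36}$ with the exact binomial tail, computed with exact rational arithmetic. The function names are illustrative, and the comparison only covers this particular example (heads probability $1/3$, threshold $n/2$).

```python
from fractions import Fraction
from math import comb, exp


def exact_tail_more_than_half(n, p=Fraction(1, 3)):
    """Exact Pr[X > n/2] for X ~ Binomial(n, p)."""
    # X > n/2 means X >= floor(n/2) + 1 for integer-valued X.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def chernoff_bound(n):
    """The bound e^{-n/36} obtained above with beta = 1/2 and mu = n/3."""
    return exp(-n / 36)


for n in (248, 250):
    print(n, chernoff_bound(n), float(exact_tail_more_than_half(n)))
```

The Chernoff bound first drops below $1/1000$ at $n = 249$, while the exact tail at $n = 250$ is far smaller. Chernoff bounds give a sufficient value of $n$, not the smallest one.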

  1. Bar the bear decides he wants to manage beehives in his old age. He's just received $k$ bees that he wants to allocate to his $n$ beehives. Since Bar is old, he often loses count when trying to allocate the bees to beehives. He decides to just allocate the bees randomly to his hives. That is, for each bee, he chooses a beehive uniformly at random. Help Bar prove that his strategy yields an approximately uniform distribution of bees with high probability.

  (a) Let $X_i$ be the number of bees in the $i$-th beehive. Compute $E[X_i]$.
  Solution. Let $Y_{ji}$ be 1 if the $j$-th bee is allocated to the $i$-th beehive, and 0 otherwise. We have $E[Y_{ji}] = \Pr[j\text{-th bee is put into } i\text{-th beehive}] = 1/n$. Then $X_i = \sum_{j=1}^{k} Y_{ji}$, so $E[X_i] = \sum_{j=1}^{k} E[Y_{ji}] = \sum_{j=1}^{k} 1/n = k/n$.

  (b) Show that $X_i$ and $X_j$ are not independent.
  Solution. For $i \ne j$, we see that $\Pr[X_i = k \cap X_j = k] = 0$, since both hives cannot contain all $k$ bees. However, $\Pr[X_i = k] \Pr[X_j = k] = (1/n)^{2k}$. Thus, $X_i$ and $X_j$ are not independent.

  (c) Let $M = \max(X_1, X_2, \ldots, X_n)$. Show $\Pr[M \ge 2k/n] \le n e^{-k/(3n)}$.
  Solution. The idea is to use Chernoff bounds to show that $\Pr[X_i \ge 2k/n]$ is small and then use the union bound to bound the probability that any of the $X_i$ variables is at least $2k/n$. Recall that $X_i = \sum_{j=1}^{k} Y_{ji}$. We have $\Pr[X_i \ge (1 + \delta) E[X_i]] \le e^{-\delta^2 E[X_i] / 3}$ by Chernoff. Taking $\delta = 1$ and $E[X_i] = k/n$, we get $\Pr[X_i \ge 2k/n] \le e^{-k/(3n)}$, and by the union bound

\[ \Pr[M \ge 2k/n] \le \sum_{i=1}^{n} \Pr[X_i \ge 2k/n] \le \sum_{i=1}^{n} e^{-k/(3n)} = n e^{-k/(3n)}. \]
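The bound in part (c) can be compared against an exact computation, since each $X_i$ is a sum of $k$ independent indicators with success probability $1/n$, i.e. binomially distributed. The sketch below evaluates the union bound $n \cdot \Pr[X_i \ge 2k/n]$ exactly and sets it against $n e^{-k/(3n)}$. The function names and the sample parameters are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, exp


def exact_hive_tail(k, n):
    """Exact Pr[X_i >= 2k/n] where X_i ~ Binomial(k, 1/n) counts the bees in hive i."""
    p = Fraction(1, n)
    threshold = (2 * k + n - 1) // n  # smallest integer >= 2k/n
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(threshold, k + 1))


def union_bound(k, n):
    """Exact value of sum_i Pr[X_i >= 2k/n]; all n hives have the same distribution."""
    return n * exact_hive_tail(k, n)


def chernoff_union_bound(k, n):
    """The bound n * e^{-k/(3n)} from part (c)."""
    return n * exp(-k / (3 * n))


for k, n in ((20, 4), (300, 10)):
    print(k, n, float(union_bound(k, n)), chernoff_union_bound(k, n))
```

The bound is only informative when $k$ is large relative to $n$: for 20 bees and 4 hives it is about 0.76, while for 300 bees and 10 hives it is below $10^{-3}$.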

which shows that the random variables are indeed independent. This means that

\[
\begin{aligned}
\mathrm{Var}[T_n] &= \mathrm{Var}[T_1 + (T_2 - T_1) + \cdots + (T_n - T_{n-1})] \\
&= \mathrm{Var}[T_1] + \mathrm{Var}[T_2 - T_1] + \cdots + \mathrm{Var}[T_n - T_{n-1}]
\end{aligned}
\]

Now we're faced with the general task of computing the variance of the random variable $T$ which is the first time that a Bernoulli random variable $X$ with $\Pr[X = 1] = p$ becomes 1. We have

\[ \Pr[T = t] = (1-p)^{t-1} p \]

and as we saw earlier, $E[T] = \frac{1}{p}$. It remains to compute

\[
\begin{aligned}
E[T^2] &= \sum_{t=1}^{\infty} \Pr[T = t] \, t^2 \\
&= \sum_{t=1}^{\infty} (1-p)^{t-1} p \, t^2 \\
&= p \sum_{t=1}^{\infty} (1-p)^{t-1} t^2
\end{aligned}
\]

We could compute this sum by decomposing it into simpler sums in a clever way. But here's a useful (and more principled) trick for computing sums like this: consider the function $f(x) = \frac{1}{1-x}$ for $|x| < 1$. Then we have the power series expansion

\[ \frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{n=0}^{\infty} x^n \]

Differentiating both sides, we have

\[ \frac{1}{(1-x)^2} = 1 + 2x + 3x^2 + \cdots = \sum_{t=0}^{\infty} (t+1) x^t \]

and differentiating again,

\[ \frac{2}{(1-x)^3} = 2 + 6x + 12x^2 + \cdots = \sum_{t=0}^{\infty} (t+1)(t+2) x^t \]

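Using this, we have

\[
\begin{aligned}
\sum_{t=1}^{\infty} (1-p)^{t-1} t^2 &= \sum_{t=1}^{\infty} (1-p)^{t-1} t(t+1) - \sum_{t=1}^{\infty} (1-p)^{t-1} t \\
&= \sum_{t=0}^{\infty} (1-p)^{t} (t+1)(t+2) - \sum_{t=0}^{\infty} (1-p)^{t} (t+1) \\
&= \frac{2}{p^3} - \frac{1}{p^2}
\end{aligned}
\]

and so

\[
\begin{aligned}
\mathrm{Var}[T] &= E[T^2] - E[T]^2 \\
&= \frac{2}{p^2} - \frac{1}{p} - \frac{1}{p^2} \\
&= \frac{1-p}{p^2}
\end{aligned}
\]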
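The formula $\mathrm{Var}[T] = (1-p)/p^2$ can be checked numerically by truncating the two series for $E[T]$ and $E[T^2]$. This is a floating-point sketch with an arbitrary cutoff `terms`, not an exact computation.

```python
def geometric_moments(p, terms=10_000):
    """Approximate (E[T], Var[T]) for Pr[T = t] = (1-p)^(t-1) * p by truncating the series."""

    def pmf(t):
        return (1 - p) ** (t - 1) * p

    mean = sum(t * pmf(t) for t in range(1, terms + 1))
    second = sum(t * t * pmf(t) for t in range(1, terms + 1))
    return mean, second - mean**2


for p in (0.5, 0.25):
    mean, variance = geometric_moments(p)
    print(p, mean, variance, (1 - p) / p**2)
```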

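which implies that

\[
\begin{aligned}
\mathrm{Var}[T_n] &= \sum_{k=0}^{n-1} \frac{1 - \frac{n-k}{n}}{\left(\frac{n-k}{n}\right)^2} \\
&= \sum_{k=0}^{n-1} \frac{nk}{(n-k)^2} \\
&\le n^2 \sum_{l=1}^{\infty} \frac{1}{l^2} \\
&\le 2n^2.
\end{aligned}
\]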

Thus, by Chebyshev,

\[ \Pr[|T_n - E[T_n]| \ge cn] \le \frac{\mathrm{Var}[T_n]}{c^2 n^2} \le \frac{2}{c^2} \]
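The variance bound can be checked exactly for small $n$. The setup of this example is missing from the surviving text, so the sketch below rests on an assumption: that each increment $T_{k+1} - T_k$ (with $T_0 = 0$) is geometric with success probability $(n-k)/n$ for $k = 0, \ldots, n-1$, as in the usual coupon-collector argument. That assumption is consistent with the summand $nk/(n-k)^2$ above. The function name `coupon_variance` is illustrative.

```python
from fractions import Fraction


def coupon_variance(n):
    """Exact Var[T_n] as a sum of geometric variances (1-p)/p^2.

    Assumption (not stated in the surviving text): the k-th increment is
    geometric with p = (n - k) / n, for k = 0, ..., n - 1.
    """
    total = Fraction(0)
    for k in range(n):
        p = Fraction(n - k, n)
        total += (1 - p) / p**2
    return total


for n in (1, 2, 10, 100):
    print(n, float(coupon_variance(n)), 2 * n * n)
```

With $\mathrm{Var}[T_n] \le 2n^2$, Chebyshev gives for example $\Pr[|T_n - E[T_n]| \ge 10n] \le 2/100$.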