Probability Review for Computer Security, Lecture notes of Probability and Statistics

Massachusetts Institute of Technology (MIT)Probability and Statistics

A review of basic probability theory, including definitions, independence, probability distributions, random variables, and conditional probability. It also covers the law of total probability and variance, with examples. a handout for the course 6.1600 at Massachusetts Institute of Technology in Fall 2022, taught by Henry Corrigan-Gibbs, Yael Kalai, and Nickolai Zeldovich.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023


Foundations of Computer Security September 7, 2022

Massachusetts Institute of Technology 6.1600 Fall 2022

Henry Corrigan-Gibbs, Yael Kalai, Nickolai Zeldovich Handout 2

Probability Review

1 Basic theory

1.1 Basic definitions and independence

Let Ω be the set of all possible outcomes of a (discrete) random experiment. We call Ω the sample space of the experiment. For example, suppose our random experiment consists of flipping a fair coin n times independently. Then we can represent Ω as

$$\Omega = \{(a_1, \ldots, a_n) : a_i \in \{0, 1\}\}$$

where we encode heads as 1 and tails as 0.

A probability distribution over Ω is a function $p : \Omega \to \mathbb{R}_{\geq 0}$ such that $\sum_{x \in \Omega} p(x) = 1$. An event is any set A ⊆ Ω, and the probability of this event is $\Pr_p[A] = \sum_{x \in A} p(x)$. We will often just write Pr instead of $\Pr_p$ when the distribution p is clear from context. Two events A, B ⊆ Ω are called independent if Pr[A ∩ B] = Pr[A] Pr[B].

In words, we can define the probability of an event in a uniform distribution as

$$\Pr[\text{event happens}] = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}}$$

In our example, the event that the first flip is heads is represented as the set

$$A_{1,1} = \{(1, a_2, \ldots, a_n) : a_i \in \{0, 1\}\}$$

and similarly the event that the first flip is tails is

$$A_{1,0} = \{(0, a_2, \ldots, a_n) : a_i \in \{0, 1\}\}$$

We can similarly define the events $A_{i,1}$ for the i-th flip to be heads, and $A_{i,0}$ for tails. Since the coin flips are independent, and since the coin is fair, we have that

$$p((a_1, \ldots, a_n)) = \Pr[A_{1,a_1} \cap \ldots \cap A_{n,a_n}] = \Pr[A_{1,a_1}] \cdots \Pr[A_{n,a_n}] = \frac{1}{2^n}.$$
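The definitions above can be checked by brute force on a small sample space. The following is a minimal sketch, assuming n = 3 flips; the helper names `pr` and `flip_is` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

n = 3  # assumed small number of flips, for illustration only
omega = list(product([0, 1], repeat=n))        # sample space of bit tuples
p = {x: Fraction(1, 2**n) for x in omega}      # uniform distribution on omega

def pr(event):
    """Probability of an event, given as a set of outcomes."""
    return sum(p[x] for x in event)

def flip_is(i, value):
    """The event A_{i,value}: the i-th flip (1-indexed) equals value."""
    return {x for x in omega if x[i - 1] == value}

A11 = flip_is(1, 1)  # first flip is heads
A21 = flip_is(2, 1)  # second flip is heads

# Independence check from the definition: Pr[A ∩ B] == Pr[A] * Pr[B].
independent = pr(A11 & A21) == pr(A11) * pr(A21)
```

Exact fractions are used instead of floats so that the equality in the independence definition can be tested directly.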


A (real-valued) random variable is a function X : Ω → R. In our example, the number of heads is a random variable represented by the function

$$X((a_1, \ldots, a_n)) = \sum_{i=1}^{n} a_i$$

Two discrete real-valued random variables X, Y are called independent if

Pr [X = x, Y = y] = Pr [X = x] Pr [Y = y]

for any x, y ∈ R. The random variables $X_1, \ldots, X_n$ are called (jointly) independent if

$$\Pr[X_1 = x_1, \ldots, X_n = x_n] = \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n]$$

for any $x_1, \ldots, x_n$. Note that the variables $X_1, \ldots, X_n$ can be pairwise independent without being jointly independent! In our example, letting $X_i$ be the random variable that is 1 if the i-th coin landed heads and 0 otherwise (i.e., $X_i((a_1, \ldots, a_n)) = a_i$), the variables $X_1, \ldots, X_n$ are jointly independent.
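To make the gap between pairwise and joint independence concrete, here is a small sketch using a standard example that is not taken from the handout: two fair bits and their XOR. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X1, X2 are fair independent bits and X3 = X1 XOR X2.
omega = list(product([0, 1], repeat=2))  # uniform over the two underlying bits
variables = [lambda w: w[0], lambda w: w[1], lambda w: w[0] ^ w[1]]

def pr(pred):
    """Probability that the predicate holds under the uniform distribution."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

def pairwise_independent():
    for i in range(3):
        for j in range(i + 1, 3):
            for x, y in product([0, 1], repeat=2):
                joint = pr(lambda w: variables[i](w) == x and variables[j](w) == y)
                if joint != pr(lambda w: variables[i](w) == x) * pr(lambda w: variables[j](w) == y):
                    return False
    return True

def jointly_independent():
    for xs in product([0, 1], repeat=3):
        joint = pr(lambda w: all(v(w) == x for v, x in zip(variables, xs)))
        expected = Fraction(1)
        for v, x in zip(variables, xs):
            expected *= pr(lambda w: v(w) == x)
        if joint != expected:
            return False
    return True
```

Every pair of these variables factorizes, but the triple does not: for instance, the outcome (0, 0, 1) has probability 0 while the product of the marginals is 1/8.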

1.2 Law of total probability

The law of total probability states that if we have events $A_1, A_2, \ldots, A_n$ which partition the sample space (i.e., Ω is a disjoint union of these events), and B is any event, then

$$\Pr[B] = \sum_{i=1}^{n} \Pr[B \cap A_i].$$

The law of total probability is also valid if we have a countably infinite partition into events $A_1, A_2, \ldots, A_n, \ldots$, in which case

$$\Pr[B] = \sum_{i=1}^{\infty} \Pr[B \cap A_i].$$

1.3 Conditional probability

Conditioning on something means assuming with certainty that this thing will happen. Formally, the probability of event A conditioned on event B is defined as

$$\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}$$

or, in words, the probability that both events happen, divided by the probability that B happens. The intuition is that we focus only on the part of our sample space Ω on which

1.5 Expectation

For a discrete real-valued random variable X taking possible values $x_1, \ldots, x_n$, the expectation is defined as

$$E[X] = \sum_{i=1}^{n} \Pr[X = x_i] \, x_i$$

Linearity of Expectation. Given random variables $X_1, \ldots, X_n$ and $X = \sum_{i=1}^{n} X_i$, we have

$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$$

In words, the expected value of the sum of random variables is equal to the sum of the expected values. A very important takeaway from this result is that it holds even if the random variables are not independent. This will be used frequently when we have to find the expected value of a sum of random variables when they might not be independent.

Multiplicativity of expectation under independence Another cool property of expectation is that the expectation of a product of independent variables is the product of individual expectations:

E [XY ] = E [X] E [Y ].

To see this, it is easiest to start manipulating the right side. Suppose X can take values in S and Y can take values in T, and let W = {xy : x ∈ S, y ∈ T}. Then we have

$$\begin{aligned}
E[X] \, E[Y] &= \sum_{x \in S} \sum_{y \in T} \Pr[X = x] \Pr[Y = y] \, xy \\
&= \sum_{x \in S} \sum_{y \in T} \Pr[X = x, Y = y] \, xy \\
&= \sum_{a \in W} \sum_{(x, y) \in S \times T : xy = a} \Pr[X = x, Y = y] \, a \\
&= \sum_{a \in W} \Pr[XY = a] \, a = E[XY].
\end{aligned}$$
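As a quick sanity check of this product rule, the sketch below computes both sides exactly for two independent fair dice, and then shows the rule failing for a dependent pair (Y = X). The dice are an assumed example, not part of the handout.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint distribution of two independent fair dice (illustrative assumption).
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

E_X = sum(Fraction(1, 6) * x for x in faces)
E_Y = sum(Fraction(1, 6) * y for y in faces)
E_XY = sum(prob * x * y for (x, y), prob in joint.items())

# Dependent case: take Y = X, so XY = X^2 and the product rule need not hold.
E_XX = sum(Fraction(1, 6) * x * x for x in faces)
```

Here `E_XY` equals `E_X * E_Y`, whereas `E_XX` differs from `E_X * E_X`, which illustrates why the independence assumption matters.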

1.6 Variance

For a discrete real-valued random variable X, the variance is defined as

$$\mathrm{Var}[X] = E\left[(X - E[X])^2\right]$$

Intuitively, the variance captures how far the random variable is from its expectation in a squared, expected sense. Note that this can be alternatively expressed as

$$\begin{aligned}
E\left[(X - E[X])^2\right] &= E\left[X^2 - 2 X E[X] + E[X]^2\right] \\
&= E\left[X^2\right] - 2 E[X E[X]] + E[X]^2 \\
&= E\left[X^2\right] - 2 E[X]^2 + E[X]^2 \\
&= E\left[X^2\right] - E[X]^2.
\end{aligned}$$

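Linearity of variance under pairwise independence. An important property of the variance is that it is additive when the summands are pairwise independent random variables. That is, if $X_1, \ldots, X_n$ are pairwise independent random variables, we have

$$\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$$

To see this, note that

$$\begin{aligned}
\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] &= E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] - \left(\sum_{i=1}^{n} E[X_i]\right)^2 \\
&= \sum_{i=1}^{n} E\left[X_i^2\right] + 2 \sum_{i<j} E[X_i X_j] - \sum_{i=1}^{n} E[X_i]^2 - 2 \sum_{i<j} E[X_i] \, E[X_j] \\
&= \sum_{i=1}^{n} E\left[X_i^2\right] - \sum_{i=1}^{n} E[X_i]^2 \\
&= \sum_{i=1}^{n} \mathrm{Var}[X_i]
\end{aligned}$$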

where we used the fact that E [XY ] = E [X] E [Y ] for independent X, Y.
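The additivity of variance only needs pairwise independence, which can be seen on a small example. The sketch below assumes three bits (two fair bits and their XOR), which are pairwise but not jointly independent; this example and the helper names are illustrative and not from the handout.

```python
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=2))  # two fair bits, uniform distribution
xs = [lambda w: w[0], lambda w: w[1], lambda w: w[0] ^ w[1]]  # X3 = X1 XOR X2

def expect(f):
    """Exact expectation of f under the uniform distribution on omega."""
    return sum(Fraction(f(w)) for w in omega) / len(omega)

def var(f):
    """Variance via E[X^2] - E[X]^2."""
    return expect(lambda w: f(w) ** 2) - expect(f) ** 2

var_of_sum = var(lambda w: sum(x(w) for x in xs))
sum_of_vars = sum(var(x) for x in xs)

# Dependent contrast: X1 + X1 has variance 4 * Var[X1], not 2 * Var[X1].
var_doubled = var(lambda w: 2 * w[0])
```

Both `var_of_sum` and `sum_of_vars` come out to 3/4, while the dependent sum X1 + X1 has variance 1 rather than 1/2.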

1.7 Examples

Suppose we pick a uniformly random permutation of n elements. What is the expected number of fixed points in it?

Solution. Let $X_i = 1$ if the i-th element is a fixed point and $X_i = 0$ otherwise. The total number of fixed points is $X = \sum_{i=1}^{n} X_i$. By linearity of expectation,

$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \Pr[X_i = 1] = \sum_{i=1}^{n} \frac{1}{n} = 1.$$
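The fixed-point computation can be confirmed by exhaustive enumeration for small n. This is a minimal sketch; the function name `expected_fixed_points` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def expected_fixed_points(n):
    """Average number of fixed points over all permutations of n elements."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, v in enumerate(perm) if i == v) for perm in perms)
    return Fraction(total, len(perms))
```

For every small n tried, the average is exactly 1, even though the indicator variables are not independent of each other.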

and hence

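$$\begin{aligned}
E[T] &= \sum_{t=1}^{\infty} \Pr[T = t] \, t = \sum_{t=1}^{\infty} (1 - p)^{t-1} p \, t \\
&= p \sum_{t=1}^{\infty} (1 - p)^{t-1} t \\
&= p \left( \sum_{t=1}^{\infty} (1 - p)^{t-1} + \sum_{t=2}^{\infty} (1 - p)^{t-1} + \ldots \right) \\
&= p \left( \frac{1}{p} + \frac{1 - p}{p} + \frac{(1 - p)^2}{p} + \ldots \right) \\
&= 1 + (1 - p) + (1 - p)^2 + \ldots = \frac{1}{p}
\end{aligned}$$

So, we get a very neat result: if each independent trial of a Bernoulli random variable equals 1 with probability p, the expected number of trials until it first equals 1 is $\frac{1}{p}$.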

Applying this to our case, the expected number of dollars will be 165. This calculation can be simplified using the following identity which holds whenever T ranges over the natural numbers:

$$E[T] = \sum_{t=0}^{\infty} \Pr[T > t]$$
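Both routes to E[T] can be compared numerically for the waiting time T with Pr[T = t] = (1 − p)^(t−1) p. The sketch below uses truncated sums and relies on the fact that Pr[T > t] = (1 − p)^t, since T > t means the first t trials all failed; the function name and the truncation length are assumptions made for illustration.

```python
def geometric_checks(p, terms=2_000):
    """Truncated sums for T with Pr[T = t] = (1 - p)^(t - 1) * p, t = 1, 2, ..."""
    q = 1 - p
    # Direct definition: sum over t of Pr[T = t] * t.
    direct = sum(t * q ** (t - 1) * p for t in range(1, terms + 1))
    # Tail-sum identity: sum over t >= 0 of Pr[T > t] = (1 - p)^t.
    tail = sum(q ** t for t in range(terms))
    return direct, tail

direct, tail = geometric_checks(0.2)
```

With p = 0.2, both values agree with 1/p = 5 up to floating-point error, and the tail sum is just a geometric series, which is why the identity simplifies the calculation.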

Barr flips a fair coin n times, and so does Derrick. Show that the probability that they get the same number of heads is $\binom{2n}{n} / 4^n$. Use your argument to verify the identity

$$\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}$$

Solution. Let our probability space be $\Omega = \{(a_1, \ldots, a_n, b_1, \ldots, b_n) : a_i \in \{0, 1\}, b_i \in \{0, 1\}\}$, where $a_i = 1$ if the i-th flip of Barr was heads and 0 otherwise, and $b_i = 1$ if the i-th flip of Derrick was tails, and 0 otherwise. Note that we encode heads and tails in opposite ways for Barr and Derrick. Then note that the event that they flipped the same number of heads is

$$\begin{aligned}
A &= \left\{ (a_1, \ldots, a_n, b_1, \ldots, b_n) : \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} (1 - b_i) \right\} \\
&= \left\{ (a_1, \ldots, a_n, b_1, \ldots, b_n) : \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i = n \right\}
\end{aligned}$$

In other words, A consists of the outcomes in which exactly n of the 2n coordinates equal 1. There are $\binom{2n}{n}$ such outcomes among $2^{2n}$ equally likely ones, which immediately tells us that $\Pr[A] = \binom{2n}{n} / 2^{2n}$, as wanted.

Now, note that we could have computed the same probability with a different probability space: namely, the one where we encode heads and tails in the same way. Here $\Omega = \{(a_1, \ldots, a_n, b_1, \ldots, b_n) : a_i \in \{0, 1\}, b_i \in \{0, 1\}\}$, where $a_i = 1$ if the i-th flip of Barr was heads and 0 otherwise, and $b_i = 1$ if the i-th flip of Derrick was heads, and 0 otherwise. Now we have

$$A = \left\{ (a_1, \ldots, a_n, b_1, \ldots, b_n) : \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i \right\}$$

We can calculate the probability by considering all the different possible numbers of heads that the two players can have (we’re using the law of total probability here):

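$$\begin{aligned}
\Pr[A] &= \sum_{k=0}^{n} \Pr\left[ A \cap \left\{ \sum_{i=1}^{n} a_i = k \right\} \right] \\
&= \sum_{k=0}^{n} \Pr\left[ \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i = k \right] \\
&= \sum_{k=0}^{n} \Pr\left[ \sum_{i=1}^{n} a_i = k \right] \Pr\left[ \sum_{i=1}^{n} b_i = k \right] \\
&= \sum_{k=0}^{n} \frac{\binom{n}{k}}{2^n} \cdot \frac{\binom{n}{k}}{2^n} \\
&= \frac{\sum_{k=0}^{n} \binom{n}{k}^2}{4^n}
\end{aligned}$$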

Comparing the two expressions, we get the desired identity.
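The resulting identity is easy to spot-check with exact integer arithmetic. The sketch below is only a numerical check for small n, not a proof; the function name is hypothetical.

```python
from math import comb

def same_heads_probability(n):
    """Pr[equal number of heads] from the law-of-total-probability sum,
    returned as an exact (numerator, denominator) pair."""
    numerator = sum(comb(n, k) ** 2 for k in range(n + 1))
    return numerator, 4 ** n
```

For each n, the numerator returned here matches `comb(2 * n, n)`, the count obtained from the first encoding.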

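Proof. Since $(X - \mu)^2$ is a nonnegative random variable, by Markov’s inequality we get

$$\Pr[(X - \mu)^2 \geq k^2] \leq \frac{E[(X - \mu)^2]}{k^2}$$

Since $(X - \mu)^2 \geq k^2$ holds exactly when $|X - \mu| \geq k$, and $E[(X - \mu)^2] = \sigma^2$, this is the same as

$$\Pr[|X - \mu| \geq k] \leq \frac{\sigma^2}{k^2}$$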

2.3 Chernoff Bounds.

Suppose $X_1, \ldots, X_n$ are independent random variables taking values in {0, 1}. Let X denote their sum and let μ = E[X] denote the sum’s expected value. Then for any β > 0,

$$\begin{aligned}
\Pr[X > (1 + \beta)\mu] &< e^{-\beta^2 \mu / 3}, && \text{for } 0 < \beta < 1 \\
\Pr[X > (1 + \beta)\mu] &< e^{-\beta \mu / 3}, && \text{for } \beta > 1 \\
\Pr[X < (1 - \beta)\mu] &< e^{-\beta^2 \mu / 2}, && \text{for } 0 < \beta < 1
\end{aligned}$$

This allows us to get an even tighter bound because we can use the fact that the random variables exhibit full mutual independence. Note that this is a stronger assumption than pairwise independence! There are groups of random variables which are all pairwise independent but which are not mutually independent.

2.4 Examples

Let’s say that we flip a biased coin that lands heads with probability $\frac{1}{3}$ a total of n times. Use Chernoff bounds to determine a value of n such that the probability of getting more than half of the flips heads is less than $\frac{1}{1000}$.

Solution. Let $X_i$ be a random variable that is 1 if the i-th flip landed heads and 0 otherwise. If we denote $X = \sum_{i=1}^{n} X_i$, we want to find an n such that

$$\Pr\left[X > \frac{n}{2}\right] < \frac{1}{1000}.$$

Note that

$$\mu = E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{3} = \frac{n}{3}.$$

Applying Chernoff bounds from the previous section with $\beta = \frac{1}{2}$ we get

$$\Pr\left[X > \left(1 + \tfrac{1}{2}\right)\mu\right] < e^{-(1/2)^2 \mu / 3} \iff \Pr\left[X > \frac{n}{2}\right] < e^{-n/36}$$

So for $e^{-n/36} < 1/1000 \iff n > 36 \ln 1000 \approx 248.7$ we have the required bound; for instance, n = 250 works.
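The threshold on n can be computed directly, and the Chernoff bound can be compared against the exact binomial tail. This is a rough sketch; the function names are hypothetical, and the exact tail is included only to show how loose the bound is.

```python
from fractions import Fraction
from math import ceil, comb, exp, log

def chernoff_bound(n):
    """The bound e^(-n/36) on Pr[X > n/2] derived with beta = 1/2."""
    return exp(-n / 36)

def exact_tail(n, p=Fraction(1, 3)):
    """Exact Pr[X > n/2] for X ~ Binomial(n, p), using rational arithmetic."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Smallest integer n with n > 36 * ln(1000).
n_min = ceil(36 * log(1000))
```

The Chernoff route gives a sufficient n, not necessarily the smallest one: the exact tail at `n_min` is far below 1/1000, so smaller values of n would also satisfy the requirement.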

Bar the bear decides he wants to manage beehives in his old age. He’s just received k bees that he wants to allocate to his n beehives. Since Bar is old, he often loses count when trying to allocate the bees to beehives. He decides to just allocate the bees randomly to his hives. That is, for each bee, he chooses a beehive uniformly at random. Help Bar prove that his strategy yields an approximately uniform distribution of bees with high probability.

(a) Let $X_i$ be the number of bees in the i-th beehive. Compute $E[X_i]$.

Solution. Let $Y_{ji}$ be 1 if the j-th bee is allocated to the i-th beehive, and 0 otherwise. We have $E[Y_{ji}] = \Pr[\text{j-th bee is put into i-th beehive}] = 1/n$. Then $X_i = \sum_{j=1}^{k} Y_{ji}$, so $E[X_i] = \sum_{j=1}^{k} E[Y_{ji}] = \sum_{j=1}^{k} 1/n = k/n$.

(b) Show that $X_i$ and $X_j$ are not independent.

Solution. For i ≠ j, we see that $\Pr[X_i = k \cap X_j = k] = 0$. However, $\Pr[X_i = k] \Pr[X_j = k] = (1/n)^{2k}$. Thus, $X_i$ and $X_j$ are not independent.

(c) Let $M = \max(X_1, X_2, \ldots, X_n)$. Show $\Pr[M \geq 2k/n] \leq n e^{-k/(3n)}$.

Solution. The idea is to use Chernoff bounds to show that $\Pr[X_i \geq 2k/n]$ is small and then use the union bound to bound the probability that any of the $X_i$ variables is at least $2k/n$. Recall that $X_i = \sum_{j=1}^{k} Y_{ji}$. We have $\Pr[X_i \geq (1 + \delta) E[X_i]] \leq e^{-\delta^2 E[X_i] / 3}$ by Chernoff. Thus, taking δ = 1, we get $\Pr[X_i \geq 2k/n] \leq e^{-k/(3n)}$, and by the union bound

$$\Pr[M \geq 2k/n] \leq \sum_{i=1}^{n} \Pr[X_i \geq 2k/n] \leq \sum_{i=1}^{n} e^{-k/(3n)} = n e^{-k/(3n)}.$$
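The bound in part (c) can be compared with an exact calculation. Since each bee independently lands in hive i with probability 1/n, the count X_i is Binomial(k, 1/n), so its tail can be summed exactly. The sketch below does this; the function names and the sample values of k and n are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, exp

def hive_tail(k, n):
    """Exact Pr[X_i >= 2k/n] for X_i ~ Binomial(k, 1/n)."""
    p = Fraction(1, n)
    threshold = -(-2 * k // n)  # ceil(2k/n) using integer arithmetic
    return sum(comb(k, j) * p**j * (1 - p)**(k - j)
               for j in range(threshold, k + 1))

def union_bound(k, n):
    """Union bound on Pr[M >= 2k/n]: n times the single-hive tail."""
    return n * hive_tail(k, n)

def chernoff_union_bound(k, n):
    """The bound n * e^(-k/(3n)) from part (c)."""
    return n * exp(-k / (3 * n))
```

For example, with k = 200 bees and n = 10 hives, `union_bound(200, 10)` is well below `chernoff_union_bound(200, 10)`, as expected, since the Chernoff estimate is an upper bound on each term.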

which shows that the random variables are indeed independent. This means that

$$\mathrm{Var}[T_n] = \mathrm{Var}[T_1 + (T_2 - T_1) + \ldots + (T_n - T_{n-1})] = \mathrm{Var}[T_1] + \mathrm{Var}[T_2 - T_1] + \ldots + \mathrm{Var}[T_n - T_{n-1}]$$

Now we’re faced with the general task of computing the variance of the random variable T which is the first time that a Bernoulli random variable X with Pr[X = 1] = p becomes 1. We have

$$\Pr[T = t] = (1 - p)^{t-1} p$$

and as we saw earlier, $E[T] = \frac{1}{p}$. It remains to compute

$$\begin{aligned}
E\left[T^2\right] &= \sum_{t=1}^{\infty} \Pr[T = t] \, t^2 \\
&= \sum_{t=1}^{\infty} (1 - p)^{t-1} p \, t^2 \\
&= p \sum_{t=1}^{\infty} (1 - p)^{t-1} t^2
\end{aligned}$$

We could compute this sum by decomposing it into simpler sums in a clever way. But here’s a useful (and more principled) trick for computing sums like this: consider the function $f(x) = \frac{1}{1 - x}$ for |x| < 1. Then we have the power series expansion

$$\frac{1}{1 - x} = 1 + x + x^2 + \ldots = \sum_{n=0}^{\infty} x^n$$

Differentiating both sides, we have

$$\frac{1}{(1 - x)^2} = 1 + 2x + 3x^2 + \ldots = \sum_{t=0}^{\infty} (t + 1) x^t$$

and differentiating again,

$$\frac{2}{(1 - x)^3} = 2 + 6x + 12x^2 + \ldots = \sum_{t=0}^{\infty} (t + 1)(t + 2) x^t$$

Using this, we have

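$$\begin{aligned}
\sum_{t=1}^{\infty} (1 - p)^{t-1} t^2 &= \sum_{t=1}^{\infty} (1 - p)^{t-1} t(t + 1) - \sum_{t=1}^{\infty} (1 - p)^{t-1} t \\
&= \sum_{t=0}^{\infty} (1 - p)^{t} (t + 1)(t + 2) - \sum_{t=0}^{\infty} (1 - p)^{t} (t + 1) \\
&= \frac{2}{p^3} - \frac{1}{p^2}
\end{aligned}$$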

and so

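$$\begin{aligned}
\mathrm{Var}[T] &= E\left[T^2\right] - E[T]^2 \\
&= p \left( \frac{2}{p^3} - \frac{1}{p^2} \right) - \frac{1}{p^2} \\
&= \frac{2}{p^2} - \frac{1}{p} - \frac{1}{p^2} \\
&= \frac{1 - p}{p^2}
\end{aligned}$$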
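The closed form for the variance can be checked against truncated series. The following is a minimal numerical sketch, with an assumed truncation length and a hypothetical function name.

```python
def geometric_moments(p, terms=5_000):
    """Truncated mean and variance of T with Pr[T = t] = (1 - p)^(t - 1) * p."""
    q = 1 - p
    mean = sum(t * q ** (t - 1) * p for t in range(1, terms + 1))
    second = sum(t * t * q ** (t - 1) * p for t in range(1, terms + 1))
    # Variance via E[T^2] - E[T]^2.
    return mean, second - mean ** 2

mean, variance = geometric_moments(0.25)
```

With p = 0.25 the truncated sums agree with 1/p = 4 and (1 − p)/p² = 12 up to floating-point error.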

which implies that

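$$\begin{aligned}
\mathrm{Var}[T_n] &= \sum_{k=0}^{n-1} \frac{1 - \frac{n-k}{n}}{\left(\frac{n-k}{n}\right)^2} \\
&= \sum_{k=0}^{n-1} \frac{nk}{(n - k)^2} \\
&\leq n^2 \sum_{l=1}^{\infty} \frac{1}{l^2} \\
&\leq 2n^2.
\end{aligned}$$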

Thus, by Chebyshev,

$$\Pr[|T_n - E[T_n]| \geq cn] \leq \frac{2}{c^2}$$
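The variance bound feeding into this Chebyshev estimate can be verified exactly for small n. The sketch below assumes, as the terms of the sum suggest, that the k-th waiting time has success probability p_k = (n − k)/n for k = 0, ..., n − 1; that indexing is an assumption of this illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def waiting_time_variance(n):
    """Sum of (1 - p_k) / p_k^2 with p_k = (n - k) / n for k = 0, ..., n - 1
    (assumed indexing), computed exactly."""
    total = Fraction(0)
    for k in range(n):
        p_k = Fraction(n - k, n)
        total += (1 - p_k) / p_k**2
    return total

def chebyshev_bound(n, c):
    """Var[T_n] / (c n)^2, the quantity Chebyshev's inequality yields."""
    return waiting_time_variance(n) / (c * n) ** 2
```

Because the variance is at most 2n², `chebyshev_bound(n, c)` never exceeds 2/c², matching the final inequality.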
