Randomized Algorithms Week 2: Tail Inequalities, Study notes of Algorithms and Programming


Typology: Study notes

2022/2023

Uploaded on 05/11/2023

alannis
600.664: Randomized Algorithms
Johns Hopkins University
Professor: Rao Kosaraju
Scribe: Your Name
Randomized Algorithms
Week 2: Tail Inequalities
Rao Kosaraju
In this section, we study three ways to estimate the tail probabilities of random variables.
Please note that the more information we know about the random variable, the better the
estimate we can derive about a given tail probability.
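2.1 Markov Inequality
Theorem 1. Markov Inequality
If $X$ is a non-negative valued random variable with an expectation of $\mu$, then for any $c > 0$,
$$P[X \ge c\mu] \le \frac{1}{c}.$$
Proof. By definition,
$$\begin{aligned}
\mu &= \sum_a a\,P[X = a] \\
&= \sum_{a < c\mu} a\,P[X = a] + \sum_{a \ge c\mu} a\,P[X = a] \\
&\ge 0 + \sum_{a \ge c\mu} c\mu\,P[X = a] \quad \text{as } X \text{ is non-negative valued} \\
&= c\mu \sum_{a \ge c\mu} P[X = a] \\
&= c\mu\,P[X \ge c\mu].
\end{aligned}$$
Hence, $P[X \ge c\mu] \le \frac{\mu}{c\mu} = \frac{1}{c}$.
An equivalent form for the theorem is $P(X \ge c) \le \frac{\mu}{c}$.
The knowledge of the standard deviation of the random variable $X$ would most often give a better bound.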
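As a quick illustration of Theorem 1, the following minimal sketch compares the exact tail of a small distribution with the Markov bound in its equivalent form $P(X \ge c) \le \mu / c$. The fair die and the helper names `markov_bound` and `exact_tail` are assumptions chosen for the example; they do not come from the notes.

```python
from fractions import Fraction

def markov_bound(mean, threshold):
    """Markov bound on P[X >= threshold] for non-negative X: mean / threshold, capped at 1."""
    return min(Fraction(1), Fraction(mean) / Fraction(threshold))

def exact_tail(pmf, threshold):
    """Exact P[X >= threshold] for a finite pmf given as {value: probability}."""
    return sum(p for a, p in pmf.items() if a >= threshold)

# Hypothetical example: one roll of a fair six-sided die, so mu = 7/2.
die = {a: Fraction(1, 6) for a in range(1, 7)}
mu = sum(a * p for a, p in die.items())

for c in range(1, 7):
    # Each line shows: threshold, exact tail probability, Markov bound.
    print(c, exact_tail(die, c), markov_bound(mu, c))
```

The bound is valid at every threshold but loose, which is the motivation for bringing in the standard deviation next.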
2.2 Chebychev Inequality
Theorem 2. Chebychev Inequality
Let $X$ be a random variable with expectation $\mu_X$ and standard deviation $\sigma_X$. Then, for any $c > 0$,
$$P[|X - \mu_X| \ge c\sigma_X] \le \frac{1}{c^2}.$$

Proof. Let random variable $Y = (X - \mu_X)^2$. Then, $E[Y] = E[(X - \mu_X)^2] = \sigma_X^2$ by definition of $\sigma_X$. Note that $Y$ is a non-negative valued random variable.

Now, $P[|X - \mu_X| \ge c\sigma_X] = P[(X - \mu_X)^2 \ge c^2\sigma_X^2] = P[Y \ge c^2\sigma_X^2]$.

Applying Markov Inequality to the random variable $Y$, $P[Y \ge c^2\sigma_X^2] = P[Y \ge c^2\mu_Y] \le \frac{1}{c^2}$.

Note that the random variable $X$ need not be non-negative valued for the Chebychev inequality to hold.
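To make the last remark concrete, here is a small sketch that checks Theorem 2 on a random variable taking negative values. The uniform distribution on $\{-2, \ldots, 2\}$ and the function names are assumptions made for illustration only.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (E[X], Var(X)) for a finite pmf given as {value: probability}."""
    mu = sum(a * p for a, p in pmf.items())
    var = sum((a - mu) ** 2 * p for a, p in pmf.items())
    return mu, var

def two_sided_tail(pmf, mu, t):
    """Exact P[|X - mu| >= t]."""
    return sum(p for a, p in pmf.items() if abs(a - mu) >= t)

def chebyshev_bound(var, t):
    """Chebychev with t = c * sigma: P[|X - mu| >= t] <= var / t^2, capped at 1."""
    return min(Fraction(1), Fraction(var) / Fraction(t) ** 2)

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}, so X is not non-negative.
pmf = {a: Fraction(1, 5) for a in range(-2, 3)}
mu, var = mean_and_variance(pmf)

print(two_sided_tail(pmf, mu, 2), chebyshev_bound(var, 2))
```

Writing the bound as $\sigma_X^2 / t^2$ with $t = c\sigma_X$ avoids taking a square root, so the comparison can stay in exact rational arithmetic.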

2.3 Chernoff Bounds

The tail estimates given by Theorem 1 and Theorem 2 work for random variables in general. However, if the random variable $X$ can be expressed as a sum of $n$ independent random variables each of which is $\{0,1\}$-valued, then we can obtain very tight bounds on the tail estimates. This is expressed in the following theorem and the bounds are commonly called Chernoff Bounds.

Theorem 3. Chernoff Bound for upper tail

Let $X$ be a random variable defined as $X = X_1 + X_2 + \cdots + X_n$ where each $X_i$, $1 \le i \le n$, is a $\{0,1\}$-valued random variable and all $X_i$’s are independent. Also, let $E[X] = \mu$ and $P[X_i = 1] = p_i$, $1 \le i \le n$. Then for any $\delta > 0$,

$$P[X \ge \mu(1 + \delta)] \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}.$$

Proof. While proving Chebychev inequality (Theorem 2), we made use of a second-order moment. Application of higher order moments would generally improve the bound on the tail inequality. We establish the tail estimates of the sums of independent random variables by utilizing the exponential function, which essentially captures a weighted sum of all the moments.

Let $Y = e^{tX}$, for an appropriate positive value of $t$ to be chosen later on. If we let $Y_i = e^{tX_i}$, the $Y_i$’s are also independent as the $X_i$’s are. Note that $Y = Y_1 Y_2 \cdots Y_n$. Note also that

$$E[Y_i] = E[e^{tX_i}] = p_i e^t + (1 - p_i)e^0 = 1 - p_i + p_i e^t \qquad (1)$$

$$E[Y] = E[Y_1 Y_2 \cdots Y_n] = \prod_{i=1}^{n} E[Y_i] = \prod_{i=1}^{n} (1 - p_i + p_i e^t) \qquad (2)$$

in which the second equality follows from the independence of $Y_i$’s.
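Equation (2) can be checked numerically on a small instance. The sketch below computes $E[e^{tX}]$ by brute-force enumeration of all $2^n$ outcomes and compares it with the product formula; the probabilities in `ps` and the value of `t` are arbitrary assumptions.

```python
import itertools
import math

def mgf_by_enumeration(ps, t):
    """E[e^{tX}] for X = sum of independent Bernoulli(p_i), summed over all 2^n outcomes."""
    total = 0.0
    for outcome in itertools.product((0, 1), repeat=len(ps)):
        prob = 1.0
        for x, p in zip(outcome, ps):
            prob *= p if x == 1 else 1.0 - p
        total += prob * math.exp(t * sum(outcome))
    return total

def mgf_by_product(ps, t):
    """Right-hand side of equation (2): product over i of (1 - p_i + p_i e^t)."""
    return math.prod(1.0 - p + p * math.exp(t) for p in ps)

ps = [0.1, 0.5, 0.25, 0.9]   # hypothetical success probabilities
t = 0.7                      # arbitrary positive t
lhs = mgf_by_enumeration(ps, t)
rhs = mgf_by_product(ps, t)
print(lhs, rhs)
```

The enumeration is exponential in $n$ and only serves as a check; the product form is what makes the expectation tractable in the proof.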

Observe that,

$$\mu = \sum_{i=1}^{n} p_i \qquad (3)$$


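That is, it suffices if we prove

$$f(\delta) = \delta - (1 + \delta)\ln(1 + \delta) + \frac{\delta^2}{4} \le 0.$$

Differentiating $f$ twice, we have

$$f'(\delta) = -\ln(1 + \delta) + \frac{\delta}{2}$$

$$f''(\delta) = -\frac{1}{1+\delta} + \frac{1}{2} = \frac{\delta - 1}{2(1+\delta)}$$

Note that $f''(\delta) \le 0$ for $0 < \delta \le 1$, with strict inequality for $\delta < 1$. Hence $f'(\delta)$ is monotonically decreasing as $\delta$ varies from 0 to 1.

Since $f'(0) = 0$, $f'(\delta) < 0$ for any $0 < \delta \le 1$. Hence $f(\delta)$ is monotonically decreasing as $\delta$ varies from 0 to 1. Since $f(0) = 0$, $f(\delta) \le 0$ for any $0 < \delta \le 1$.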

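For the interval $\delta > 1$, we now establish

$$\left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu} \le e^{-\mu \frac{\delta}{2} \ln \delta}.$$

Simplifying as above, it suffices to prove $g(\delta) = \delta - (1+\delta)\ln(1+\delta) + \frac{\delta \ln \delta}{2} \le 0$ when $\delta > 1$. Once again differentiating twice,

$$g'(\delta) = \frac{1}{2} - \ln(1 + \delta) + \frac{\ln \delta}{2}, \text{ and}$$

$$g''(\delta) = -\frac{1}{1+\delta} + \frac{1}{2\delta} = \frac{1 - \delta}{2\delta(1+\delta)}.$$

Note that $g''(\delta) < 0$ when $\delta > 1$. Hence, $g'(\delta)$ is monotonically decreasing for $\delta \ge 1$. Since $g'(1) = \frac{1}{2} - \ln 2$ is negative, $g'(\delta) < 0$ when $\delta \ge 1$. Consequently, $g(\delta)$ is monotonically decreasing as $\delta$ increases. Since $g(1) = 1 - 2\ln 2 < 0$, $g(\delta) < 0$ for any $\delta > 1$.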
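The two calculus arguments can be sanity-checked numerically. The sketch below evaluates $f$ on a grid over $(0, 1]$ and $g$ on a grid over $(1, 1001]$ and records the largest value seen; the grids are arbitrary assumptions, and a finite grid is only a check, not a substitute for the proof.

```python
import math

def f(d):
    """f(delta) = delta - (1 + delta) ln(1 + delta) + delta^2 / 4, used for 0 < delta <= 1."""
    return d - (1 + d) * math.log(1 + d) + d * d / 4

def g(d):
    """g(delta) = delta - (1 + delta) ln(1 + delta) + (delta ln delta) / 2, used for delta > 1."""
    return d - (1 + d) * math.log(1 + d) + d * math.log(d) / 2

f_grid = [k / 1000 for k in range(1, 1001)]        # points in (0, 1]
g_grid = [1 + k / 100 for k in range(1, 100001)]   # points in (1, 1001]

max_f = max(f(d) for d in f_grid)
max_g = max(g(d) for d in g_grid)
print(max_f, max_g)   # both maxima should be negative
```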

2.4 Application of Tail Inequalities

We now apply tail inequalities to two problems.


2.4.1 n Balls and n Bins

Consider throwing $n$ balls, independently and uniformly at random, into $n$ bins. We are interested in the probability that bin 1 contains at least 7 balls. Define $n$ $\{0,1\}$-valued random variables $X_i$, $1 \le i \le n$, by $X_i = 1$ if ball $i$ falls into bin 1 and 0 otherwise. By uniformity, $P[X_i = 1] = \frac{1}{n}$. Define the random variable $X = X_1 + X_2 + \cdots + X_n$. Thus $X$ denotes the number of balls that fall in bin 1. By the linearity of expectation,

$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = n \cdot \frac{1}{n} = 1.$$

Using the Markov inequality from Theorem 1 with $\mu = 1$ and $c = 7$, we get

$$P[X \ge 7] \le \frac{1}{7}. \qquad (7)$$

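For Chebychev inequality (Theorem 2), we first compute the standard deviation of $X$:

$$\mathrm{Var}(X_i) = E[X_i^2] - E[X_i]^2 = \frac{1}{n} - \frac{1}{n^2}, \text{ and} \qquad (8)$$

$$\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n\left(\frac{1}{n} - \frac{1}{n^2}\right) = 1 - \frac{1}{n}, \qquad (9)$$

where the first equality in (9) follows from the independence of the $X_i$’s. Hence,

$$\sigma_X = \sqrt{1 - \frac{1}{n}}. \qquad (10)$$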

Applying Chebychev inequality to the random variable $X$,

$$P[X \ge 7] = P[X - 1 \ge 6] \le P[|X - 1| \ge 6] = P\left[|X - 1| \ge \frac{6}{\sqrt{1 - \frac{1}{n}}}\,\sigma_X\right] \le \frac{1 - \frac{1}{n}}{36}. \qquad (11)$$

Using Chernoff bound from Theorem 4 with $\mu = 1$ and $\delta = 6$,

$$P[X \ge 7] = P[X \ge (1 + 6) \cdot 1] \le e^{-\frac{6 \ln 6}{2}}. \qquad (12)$$
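The three bounds (7), (11) and (12) can be put side by side with the exact tail of $X$, which is Binomial$(n, 1/n)$. The sketch below is illustrative only: the choice $n = 100$ and the function names are assumptions, and `chernoff` uses the $\delta > 1$ form $e^{-\mu \delta \ln \delta / 2}$ with $\mu = 1$.

```python
import math

def exact_tail(n, k):
    """Exact P[X >= k] for X ~ Binomial(n, 1/n), the number of balls in bin 1."""
    p = 1.0 / n
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def markov(k):
    """Bound (7): mu / k with mu = 1."""
    return 1.0 / k

def chebyshev(n, k):
    """Bound (11): Var(X) / (k - 1)^2 with Var(X) = 1 - 1/n."""
    return (1.0 - 1.0 / n) / (k - 1) ** 2

def chernoff(k):
    """Bound (12): exp(-delta * ln(delta) / 2) with delta = k - 1 > 1 and mu = 1."""
    delta = k - 1
    return math.exp(-delta * math.log(delta) / 2)

n, k = 100, 7   # hypothetical instance size; k = 7 as in the text
print(exact_tail(n, k), chernoff(k), chebyshev(n, k), markov(k))
```

Note that $e^{-(6 \ln 6)/2} = 6^{-3} = 1/216$, so for this threshold the Chernoff bound is roughly thirty times smaller than the Markov bound of $1/7$.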

Now comparing equations (7), (11) and (12) we can see that using the Chebychev inequality gives a better bound than that of the Markov inequality, and a much better bound is obtained using Chernoff bounds. As another example, let us consider the probability that bin 1 has at least $1 + 10 \ln n$ balls. Using the Markov inequality, we get $P[X \ge 1 + 10 \ln n] \le \frac{1}{1 + 10 \ln n}$. Using the Chebychev inequality, we get that $P[X \ge 1 + 10 \ln n] \le P[|X - 1| \ge 10 \ln n] \le \frac{1 - \frac{1}{n}}{100 \ln^2 n} \le \frac{1}{100 \ln^2 n}$, whereas, using Chernoff bounds, we


2.6 Special Case of Tail Inequalities

2.6.1 Independent and identical {− 1 , +1} valued random variables

Theorem 7. Let $X_i$, $1 \le i \le n$, be $n$ independent and identically distributed $\{-1, +1\}$ valued random variables such that $P[X_i = +1] = P[X_i = -1] = 1/2$. Let the random variable $X$ be defined by $X = \sum_{i=1}^{n} X_i$. Then, $P[X \ge \delta] = P[X \le -\delta] \le e^{-\delta^2 / 2n}$ for any $\delta > 0$.

Proof. By symmetry we have $P[X \ge \delta] = P[X \le -\delta]$. Here we prove a weaker form of the theorem, $P[X \ge \delta] \le e^{-\delta^2 / 6n}$, making use of Theorem 4; a direct derivation using the exponential function $e^{tX}$ results in the claimed bound. Observe that $E[X_i] = 0$ and $E[X] = 0$. We define random variables $Y_i = \frac{1 + X_i}{2}$, $1 \le i \le n$, and $Y = \sum_{i=1}^{n} Y_i$. Note that each $Y_i$ is $\{0,1\}$-valued, and $E[Y_i] = 1/2$. Hence $E[Y] = n/2$. Thus,

$$P[X \ge \delta] = P\left[Y \ge \frac{n}{2} + \frac{\delta}{2}\right] = P\left[Y \ge \frac{n}{2}\left(1 + \frac{\delta}{n}\right)\right] \le e^{-\frac{n}{2} \cdot \frac{\delta^2}{n^2} \cdot \frac{1}{3}} = e^{-\frac{\delta^2}{6n}}.$$

When $\delta > n$, there is no need to apply this result since we know that $X$ can never take a value greater than $n$. An alternative form of the above Theorem is $P[X \ge \delta n] \le e^{-\delta^2 n / 2}$.
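For small $n$ the tail of $X$ can be computed exactly and compared with both the bound of Theorem 7 and the weaker bound proved above. This is a sketch under the assumption $n = 100$; the function names are illustrative.

```python
import math

def exact_upper_tail(n, delta):
    """Exact P[X >= delta] for X = sum of n independent uniform +/-1 variables."""
    # X = 2H - n with H ~ Binomial(n, 1/2), so X >= delta iff H >= (n + delta) / 2.
    h_min = max(math.ceil((n + delta) / 2), 0)
    return sum(math.comb(n, h) for h in range(h_min, n + 1)) / 2**n

def theorem7_bound(n, delta):
    """The bound e^{-delta^2 / 2n} stated in Theorem 7."""
    return math.exp(-delta**2 / (2 * n))

def weaker_bound(n, delta):
    """The bound e^{-delta^2 / 6n} obtained through the 0/1 transformation."""
    return math.exp(-delta**2 / (6 * n))

n = 100   # hypothetical number of variables
for delta in (10, 20, 30, 40):
    print(delta, exact_upper_tail(n, delta), theorem7_bound(n, delta), weaker_bound(n, delta))
```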

3 Set Balancing Problem

In this section, we apply Chernoff bounds to another problem known as the Set Balancing Problem, which is defined as follows. Given an $n \times n$ $\{0,1\}$ matrix $A$, find a $\{-1,+1\}$ valued column vector $X$ such that the product $AX$ has the smallest maximum absolute entry, i.e. minimize $\|AX\|_\infty$.

Example 1. Let the matrix $A$ be as given below.

[The entries of the matrix $A$ and of the vectors $X$ and $(AX)^T$ are missing from the extracted text.]

Thus the maximum absolute entry is 2.

Our goal is to make every entry of $AX$ as close to 0 as possible. In a way, we are measuring the discrepancy using the maximum absolute value of the entries of $AX$. In general, it is not possible to make every entry of $AX$ equal to 0, for example when $A$ has a row with an odd number of 1’s. A brute force solution for choosing $X$ involves trying all possible $\{-1,+1\}$ column vectors of size $n$, which would take $\Omega(2^n)$ time. Instead, we develop a very simple randomized algorithm that guarantees an expected discrepancy of $O(\sqrt{n \ln n})$. In a subsequent chapter, we derandomize the algorithm and obtain a deterministic polynomial time algorithm with the same guarantee on the discrepancy. It is interesting to note that for this problem, Spencer [] proved that for any matrix $A$ there is a column vector $X$ such that the discrepancy is at most $6\sqrt{n}$. It is not known whether there is a polynomial time randomized algorithm that guarantees a discrepancy of $O(\sqrt{n})$.

The randomized algorithm works as follows. Let $X = [X_1\ X_2\ \cdots\ X_n]^T$. Choose each $X_i$ independently and u.a.r. with $P[X_i = +1] = P[X_i = -1] = 1/2$, $1 \le i \le n$. Before analyzing the performance guarantee of the randomized algorithm we state the classic Boole’s inequality.

Fact 1. Boole’s Inequality:

Let $E_1, E_2, \ldots, E_n$ be $n$ events. Then, $P[E_1 \cup E_2 \cup \cdots \cup E_n] \le P[E_1] + P[E_2] + \cdots + P[E_n]$.

Boole’s inequality has the following application. Let the events Ei be bad events. Then the union of all these bad events defines the event where at least one bad event occurs. To be able to prove that no bad event occurs with high probability, we can bound the probability that some bad event occurs by estimating the probability of each bad event (individually even though we are not given that all bad events are independent) and summing the probabilities.
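A tiny exact example shows that the inequality needs no independence. The three events below, defined on a single fair die roll, are assumptions made for illustration and overlap heavily.

```python
from fractions import Fraction

omega = range(1, 7)   # outcomes of one fair die roll
events = [
    {a for a in omega if a % 2 == 0},   # E1: the roll is even
    {a for a in omega if a >= 4},       # E2: the roll is at least 4
    {6},                                # E3: the roll is a six
]

def prob(event):
    """Probability of an event under the uniform distribution on six outcomes."""
    return Fraction(len(event), 6)

union_prob = prob(set().union(*events))   # probability that at least one event occurs
sum_prob = sum(prob(e) for e in events)   # right-hand side of Boole's inequality
print(union_prob, sum_prob)
```

The sum can exceed 1, as it does here, in which case the bound is vacuous; it becomes useful when each bad event is individually very unlikely.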

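Let the product $AX$ be $Y = [Y_1\ Y_2\ \cdots\ Y_n]^T$. Consider any $Y_i$; wlog let it be $Y_1$. By the definition of matrix multiplication, $Y_1 = A_{11}X_1 + A_{12}X_2 + \cdots + A_{1n}X_n$ where $A_{ij}$ denotes the element of $A$ at the $i$th row and $j$th column. Note that $E[X_i] = 0$ and by linearity of expectation, $E[Y_1] = 0$. Since $Y_1$ is a sum of $k \le n$ of the $X_j$’s (those with $A_{1j} = 1$), Theorem 7 applies with $k$ in place of $n$, and $e^{-\delta^2/2k} \le e^{-\delta^2/2n}$ (if $k = 0$ then $Y_1 = 0$ and the bound is trivial). For $\delta = 2\sqrt{n \ln n}$, we get

$$P[Y_1 \ge \delta] = P[Y_1 \le -\delta] \le e^{-\frac{4n \ln n}{2n}} = e^{-2 \ln n} = \frac{1}{n^2}.$$

Hence, $P[|Y_1| \ge \delta] = P[Y_1 \ge \delta] + P[Y_1 \le -\delta] \le \frac{2}{n^2}$.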

Let us interpret each event $|Y_i| \ge 2\sqrt{n \ln n}$, designated $E_i$, as a “bad event”. Thus using Boole’s inequality,

$$P[\text{for some } i,\ |Y_i| \ge 2\sqrt{n \ln n}] \le \sum_{i=1}^{n} P[|Y_i| \ge 2\sqrt{n \ln n}] \le n \cdot \frac{2}{n^2} = \frac{2}{n}.$$

Thus with probability at least $1 - \frac{2}{n}$, every entry in $Y$ has absolute value less than $2\sqrt{n \ln n}$.

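Hence with high probability, $\|AX\|_\infty < 2\sqrt{n \ln n}$. We can even upper bound $E(\|AX\|_\infty)$ by observing that when $\|AX\|_\infty$ is not less than $2\sqrt{n \ln n}$, it can have a value at most $n$. Since this happens with probability at most $\frac{2}{n}$, the expected value of the maximum absolute value is at most

$$\left(1 - \frac{2}{n}\right) 2\sqrt{n \ln n} + \frac{2}{n} \cdot n \le 2\sqrt{n \ln n} + 2 = O(\sqrt{n \ln n}).$$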
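The randomized algorithm itself is only a few lines. The following sketch draws the sign vector uniformly at random and reports $\|AX\|_\infty$ next to the threshold $2\sqrt{n \ln n}$; the random test matrix, the seed and the function names are assumptions made for the example.

```python
import math
import random

def discrepancy(A, X):
    """Return ||AX||_inf for a 0/1 matrix A (a list of rows) and a +/-1 vector X."""
    return max(abs(sum(a * x for a, x in zip(row, X))) for row in A)

def random_signs(n, rng):
    """Choose each X_i to be +1 or -1 independently with probability 1/2."""
    return [rng.choice((-1, 1)) for _ in range(n)]

rng = random.Random(0)   # fixed seed, only so the sketch is reproducible
n = 64
A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]   # hypothetical instance
X = random_signs(n, rng)

threshold = 2 * math.sqrt(n * math.log(n))
print(discrepancy(A, X), threshold)
```

Because the analysis only guarantees success with probability at least $1 - 2/n$, one simple design choice is to redraw $X$ whenever the observed discrepancy reaches the threshold.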

3.1 Analysis of Randomized Quicksort

In this section, we revisit the RandQuickSort algorithm. We will use the Chernoff bound formula to bound a tail probability of the execution time.

We can view the execution of the algorithm on any given input of n numbers as a binary tree of pivots. Every node has an associated set of elements and a pivot element. For


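In the path from the root to the $i$th smallest element, we define a $\{0,1\}$-valued r.v. for each node on the path as follows (reusing the index, we call these $X_1, X_2, \ldots$): $X_i$ takes the value 1 if the pivot of its node is in the middle half of that node’s elements. Each such pivot leaves at most $\frac{3}{4}$ of the node’s elements in either child. Let $\alpha$ be the number of $X_i$’s that equal 1. If the array at the $24 \ln n$th level has more than one element, then $\left(\frac{3}{4}\right)^{\alpha} n > 1$, and

$$\left(\tfrac{3}{4}\right)^{\alpha} n > 1 \;\Rightarrow\; \alpha \ln(3/4) + \ln n > 0 \;\Rightarrow\; \alpha < \frac{\ln n}{\ln(4/3)} \approx 3.48 \ln n.$$

Since $1/\ln(4/3) \approx 3.48$, a rigorous threshold is $\alpha < 4 \ln n$; the notes use $3 \ln n$, remarking that the exact value is not important, and the same argument goes through with slightly different constants.

Thus, if more than $3 \ln n$ of the $X_i$’s are 1, then the path of pivots from the root to the $i$th element is shorter than $24 \ln n$.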

Let $X = X_1 + X_2 + \cdots + X_{24 \ln n}$ (this is the number of times the $X_i$’s are 1). Note that $P(X_i = 1) = 1/2$, hence $E(X) = 12 \ln n$.

$$P(\text{the path of pivots from the root to the } i\text{th element is longer than } 24 \ln n) \le P(X \le 3 \ln n). \qquad (13)$$

Using the following version of the Chernoff bound for the lower tail, $P(X \le (1 - \delta)\mu) \le e^{-\mu\delta^2/2}$, with $\mu = 12 \ln n$ and $\delta = 3/4$, the right-hand side of (13) satisfies

$$P[X \le 3 \ln n] = P\left[X \le 12 \ln n \left(1 - \frac{3}{4}\right)\right] \le e^{-(12 \ln n)(3/4)^2 / 2} = \frac{1}{n^{27/8}} \le \frac{1}{n^3}.$$

Now, using Boole’s inequality, $P(\text{the path of pivots from the root to any one of the } n \text{ elements is longer than } 24 \ln n) \le n \cdot \frac{1}{n^3} = \frac{1}{n^2}$.
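The bound on path lengths can be observed empirically. The sketch below is one possible rendering of randomized quicksort on distinct keys that also returns the number of levels in its tree of pivots; the function name, input size and seed are assumptions, and the notes' own RandQuickSort may differ in detail.

```python
import math
import random

def rand_quicksort(items, rng, depth=1):
    """Sort distinct items; return (sorted list, number of levels in the tree of pivots)."""
    if len(items) <= 1:
        # A single element is a leaf at this depth; an empty list adds no level.
        return list(items), (depth if items else depth - 1)
    pivot = rng.choice(items)   # pivot chosen uniformly at random
    left, left_levels = rand_quicksort([x for x in items if x < pivot], rng, depth + 1)
    right, right_levels = rand_quicksort([x for x in items if x > pivot], rng, depth + 1)
    return left + [pivot] + right, max(left_levels, right_levels)

rng = random.Random(1)                 # fixed seed, only for reproducibility
n = 500
data = rng.sample(range(10 * n), n)    # n distinct hypothetical keys
result, levels = rand_quicksort(data, rng)
print(levels, 24 * math.log(n))        # observed longest path vs. the 24 ln n threshold
```

In practice the observed number of levels tends to sit well below $24 \ln n$, which is consistent with the constant in the analysis being chosen for convenience rather than tightness.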

3.2 Conclusion

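Previously, we have determined that the expected run-time of RandQuickSort is $\le 2n \ln n$. Now, we have established a high probability bound by sacrificing a constant multiplier in the run-time: with probability at least $1 - \frac{1}{n^2}$, no path of pivots is longer than $24 \ln n$, and since each level of the tree costs at most $n$ comparisons, the total is at most $24 n \ln n$.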