

















Johns Hopkins University Scribe: Your Name
In a randomized algorithm you can toss a fair coin as a step of computation. Alternatively, a bit taking the values 0 and 1 with equal probability can be chosen in a single step. More generally, in a single step an element out of n elements can be chosen with equal probabilities (uniformly at random). In such a setting, our goal can be to design an algorithm that minimizes run time.
Randomized algorithms are often advantageous for several reasons. They may be faster than deterministic algorithms. They may also be simpler. In addition, there are problems that can be solved with randomization for which we cannot design efficient deterministic algorithms directly. In some of those situations, we can design attractive deterministic algorithms by first designing randomized algorithms and then converting them into deterministic algorithms by applying standard derandomization techniques.
Deterministic Algorithm: Worst case running time, i.e. the number of steps the algorithm takes on the worst input of length n.
Randomized Algorithm:
As an example of a randomized algorithm, we now present the classic quicksort algorithm and derive its performance later on in the chapter. We are given a set S of n distinct elements and we want to sort them. Below is the randomized quicksort algorithm.
Algorithm RandQuickSort(S = {a_1, a_2, · · · , a_n})
    If |S| ≤ 1 then output S;
    else:
        Choose a pivot element a_i uniformly at random (u.a.r.) from S
        Split the set S into two subsets S_1 = {a_j | a_j < a_i} and S_2 = {a_j | a_j > a_i} by comparing each a_j with the chosen a_i
        Recurse on sets S_1 and S_2
        Output the sorted set S_1, then a_i, and then the sorted S_2.
end Algorithm
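The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the notes' own code: the name rand_quicksort, the optional rng parameter, and the decision to return a comparison count alongside the sorted list are assumptions made here so that the number of steps can be observed.

```python
import random

def rand_quicksort(items, rng=random):
    """Sort a list of distinct items; return (sorted_list, comparisons).

    `rng` is an assumed hook (any object with .choice) so runs can be seeded.
    """
    if len(items) <= 1:
        return list(items), 0
    pivot = rng.choice(items)                  # pivot chosen uniformly at random
    smaller = [x for x in items if x < pivot]  # the set S_1
    larger = [x for x in items if x > pivot]   # the set S_2
    comparisons = len(items) - 1               # pivot is compared with every other element
    sorted_small, c1 = rand_quicksort(smaller, rng)
    sorted_large, c2 = rand_quicksort(larger, rng)
    return sorted_small + [pivot] + sorted_large, comparisons + c1 + c2
```

The output order is the same for every run; only the comparison count depends on the random pivot choices.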
For this algorithm we will establish that the expected number of steps on any input of length n is no more than 2n ln n. In addition, we will establish that the probability that the algorithm takes more than 12n ln n steps is no more than 1/n^2. Hence if n = 1000 and we run the algorithm a million (1000^2) times, we expect it to run for more than 12000 ln 1000 steps at most once.
Before we can design and analyze randomized algorithms, however, it is important to review the basic concepts in probability theory.
When we conduct a random experiment several possible outcomes may occur.
Definition 1 (Probability Space). A probability space consists of a universe Ω, a collection of subsets of Ω known as events, and a function, P , over the events that satisfies the following properties:
P(∪_i E_i) = Σ_i P(E_i) for pairwise disjoint events E_1, E_2, ....
Note that if E_1, E_2 are events, then E_1 ∩ E_2 is also an event, since by De Morgan's law E_1 ∩ E_2 is the complement of (complement of E_1) ∪ (complement of E_2).
Definition 2 (Conditional Probability). The conditional probability of event E 1 given that event E 2 has occurred, written P (E 1 |E 2 ), is defined as
P(E_1 | E_2) = P(E_1 ∩ E_2) / P(E_2), provided P(E_2) > 0.
At times, we write P (E 1 ∩ E 2 ) as P (E 1 , E 2 ). More generally, we write P (E 1 ∩ E 2 ∩ · · · ∩ En) as P (E 1 , E 2 , · · · , En)
Observation:
P (E 1 ∩ E 2 ∩ · · · ∩ En) = P (E 1 |E 2 ∩ · · · ∩ En) P (E 2 |E 3 ∩ · · · ∩ En) · · · P (En− 1 |En) P (En).
Definition 3 (Independence). Events E 1 and E 2 are independent if
P (E 1 ∩ E 2 ) = P (E 1 ) P (E 2 )
This can also be stated as:
P (E 1 |E 2 ) = P (E 1 ) or P (E 2 |E 1 ) = P (E 2 )
Definition 10 (Pairwise Independence). Random variables X_1, X_2, ..., X_n are pairwise independent if for every distinct i and j, X_i and X_j are independent.
Definition 11 (k-wise Independence). Random variables X_1, X_2, ..., X_n are k-wise independent, 2 ≤ k ≤ n, if for every k_1 with 2 ≤ k_1 ≤ k and every choice of distinct indices i_1, i_2, ..., i_{k_1}, the variables X_{i_1}, X_{i_2}, ..., X_{i_{k_1}} are independent.
Example 1. 4 balls are thrown independently and u.a.r. into 5 bins. What is the probability that 2 balls fall into bin 1?
We define our probability space to be Ω = {(i_1, i_2, i_3, i_4) | i_j ∈ {1, 2, 3, 4, 5}}. The value i_j specifies the bin into which ball j falls. Since each ball is thrown independently and u.a.r., for every (i_1, i_2, i_3, i_4), P((i_1, i_2, i_3, i_4)) = 1/5^4. Define a r.v. X : Ω → {0, 1, 2, 3, 4} s.t. for every (i_1, i_2, i_3, i_4),

X((i_1, i_2, i_3, i_4)) = Σ_{j=1}^{4} f(i_j), in which f(k) = 1 if k = 1 and f(k) = 0 otherwise.

Note that the r.v. X stands for the number of balls that fall into bin 1. We are interested in P(X = 2). For any choice of 2 positions j s.t. i_j = 1, each of the other two positions can take any value from {2, 3, 4, 5}. Hence the number of tuples (i_1, i_2, i_3, i_4) s.t. exactly 2 of the i_j's are 1's is (4 choose 2) · 4^2. Therefore

P(X = 2) = (4 choose 2) · 4^2 · (1/5^4), which is given by (4 choose 2) (1/5)^2 (4/5)^2.
Example 2. Let X be the number of balls that fall into bin 1. What is the value of E (X)?
Using the same method as above, we have
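P(X = 0) = (4/5)^4
P(X = 1) = (4 choose 1) (1/5) (4/5)^3
P(X = 2) = (4 choose 2) (1/5)^2 (4/5)^2
P(X = 3) = (4 choose 3) (1/5)^3 (4/5)
P(X = 4) = (1/5)^4

Then E(X) = 0 · (4/5)^4 + 1 · (4 choose 1) (1/5) (4/5)^3 + 2 · (4 choose 2) (1/5)^2 (4/5)^2 + 3 · (4 choose 3) (1/5)^3 (4/5) + 4 · (1/5)^4 = 4/5. This result can also be justified by appealing to our intuition: the expected number of balls that fall into any of the 5 bins should be the same. Since there are a total of 4 balls, for any bin the expected number of balls should be one fifth of 4.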
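Examples 1 and 2 can be checked by brute force, since the probability space has only 5^4 = 625 equally likely outcomes. The sketch below enumerates them with exact rational arithmetic; the function name bin1_distribution and its parameters are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

def bin1_distribution(balls=4, bins=5):
    """Return {k: P(X = k)}, where X counts the balls that land in bin 1."""
    weight = Fraction(1, bins ** balls)        # every outcome is equally likely
    dist = {k: Fraction(0) for k in range(balls + 1)}
    for outcome in product(range(1, bins + 1), repeat=balls):
        dist[outcome.count(1)] += weight       # X(outcome) = number of 1's
    return dist

dist = bin1_distribution()
expectation = sum(k * p for k, p in dist.items())
print(dist[2], expectation)                    # prints: 96/625 4/5
```

The value 96/625 equals (4 choose 2) (1/5)^2 (4/5)^2, and the expectation 4/5 matches the intuitive argument.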
Definition 12 (Function of a Random Variable). If X 1 ,... , Xk are random variables and f is a function, then f (X 1 ,... , Xk) is a random variable such that
P(f(X_1, X_2, ..., X_k) = a) = Σ_{a_1, ..., a_k s.t. f(a_1, ..., a_k) = a} P(X_1 = a_1, ..., X_k = a_k)    (4)
Theorem 1.
E(f(X_1, ..., X_k)) = Σ_{a_1, ..., a_k} f(a_1, ..., a_k) P(X_1 = a_1, ..., X_k = a_k)
Example 3. Compute E(X_1 + X_2) for the joint mass function given by

             X_2 = 1   X_2 = 2   X_2 = 3
X_1 = 1        .1        .1        .2
X_1 = 2        .3        .1        .2

We compute E(X_1 + X_2) by two different methods: by applying the definition and by applying the above theorem.
Direct Computation:
P(X_1 + X_2 = 2) = P(X_1 = 1, X_2 = 1) = .1
P(X_1 + X_2 = 3) = P(X_1 = 1, X_2 = 2) + P(X_1 = 2, X_2 = 1) = .1 + .3 = .4
P(X_1 + X_2 = 4) = P(X_1 = 1, X_2 = 3) + P(X_1 = 2, X_2 = 2) = .2 + .1 = .3
P(X_1 + X_2 = 5) = P(X_1 = 2, X_2 = 3) = .2

Hence, E(X_1 + X_2) = 2(.1) + 3(.4) + 4(.3) + 5(.2) = 3.6.
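By Theorem:

E(X_1 + X_2) = Σ_{a_1, a_2} (a_1 + a_2) P(X_1 = a_1, X_2 = a_2) = (1 + 1)(.1) + (1 + 2)(.1) + (1 + 3)(.2) + (2 + 1)(.3) + (2 + 2)(.1) + (2 + 3)(.2) = 3.6.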
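Both routes to E(X_1 + X_2) can be reproduced mechanically. The sketch below uses the six joint probabilities that appear in the direct computation; the dictionary name JOINT and the two function names are illustrative, not from the notes.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf P(X1 = a1, X2 = a2), read off from the direct computation.
JOINT = {
    (1, 1): Fraction(1, 10), (1, 2): Fraction(1, 10), (1, 3): Fraction(2, 10),
    (2, 1): Fraction(3, 10), (2, 2): Fraction(1, 10), (2, 3): Fraction(2, 10),
}

def expectation_via_distribution(joint):
    """Build the pmf of X1 + X2 first, then take its expectation."""
    sum_pmf = defaultdict(Fraction)            # Fraction() is 0
    for (a1, a2), p in joint.items():
        sum_pmf[a1 + a2] += p
    return sum(s * p for s, p in sum_pmf.items())

def expectation_via_theorem(joint):
    """Apply Theorem 1: sum f(a1, a2) * P(X1 = a1, X2 = a2) directly."""
    return sum((a1 + a2) * p for (a1, a2), p in joint.items())
```

Both functions return 18/5 = 3.6 on this table; the second avoids constructing the distribution of the sum at all.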
Theorem 2. If X 1 , X 2 , · · · , Xk, Xk+1, · · · Xn are independent r.v.s and f and g are any functions, then f (X 1 , · · · , Xk) and g (Xk+1, · · · , Xn) are independent.
Theorem 3 (Linearity of Expectation). Let X_1, ..., X_n be random variables, c_1, ..., c_n be reals, and let X = Σ_{i=1}^{n} c_i X_i. Then

E(X) = Σ_{i=1}^{n} c_i E(X_i).    (5)
Example 5. Throw n balls into n bins such that the first ball is thrown u.a.r. among the n bins and the i-th ball, i ≥ 2, cannot fall into the bin of the (i − 1)-th ball but falls u.a.r. among the other n − 1 bins. Let X_i = 1 if the i-th ball falls into bin 1 and X_i = 0 otherwise, and let X = Σ_{i=1}^{n} X_i be the number of balls that fall into bin 1. What is E(X)?
Note that
P(X_i = 1 | X_{i−1} = 1) = 0
P(X_i = 1 | X_{i−1} = 0) = 1/(n − 1)

Claim 1. P(X_i = 1) = 1/n.

Proof. By induction on i.

Base Case: P(X_1 = 1) = 1/n, since the first ball is thrown u.a.r. among the n bins.

Inductive Step: Assume the claim holds for i < k. Then P(X_{k−1} = 1) = 1/n and P(X_{k−1} = 0) = 1 − 1/n. We have

P(X_k = 1, X_{k−1} = 1) = 0
P(X_k = 1, X_{k−1} = 0) = P(X_k = 1 | X_{k−1} = 0) P(X_{k−1} = 0) = (1/(n − 1)) (1 − 1/n) = 1/n
P(X_k = 1) = P(X_k = 1, X_{k−1} = 1) + P(X_k = 1, X_{k−1} = 0) = 0 + 1/n = 1/n
Hence the inductive step holds.
Therefore E(X_i) = 1/n, and E(X) = E(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} E(X_i) = n · (1/n) = 1.

Although our throws are not independent this time, we are still able to compute the expectation of X by applying the linearity of expectation.
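The induction in Claim 1 can be mirrored by iterating the same one-step relation numerically. The sketch below is only a check under the example's rules (it assumes n ≥ 2 so that n − 1 other bins exist); the function name bin1_probabilities is illustrative.

```python
from fractions import Fraction

def bin1_probabilities(n):
    """Return [P(X_1 = 1), ..., P(X_n = 1)] for the constrained throws (n >= 2)."""
    probs = [Fraction(1, n)]                   # first ball: uniform over n bins
    for _ in range(2, n + 1):
        prev = probs[-1]
        # Ball i lands in bin 1 only if ball i-1 did not, and then w.p. 1/(n-1).
        probs.append((1 - prev) * Fraction(1, n - 1))
    return probs

probs = bin1_probabilities(6)
expected_in_bin1 = sum(probs)                  # linearity of expectation
```

Every entry of probs equals 1/6 and the sum is exactly 1, even though consecutive throws are dependent.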
Definition 13 (Variance). Let X be a random variable with E (X) = μX. The variance of X, denoted V ar(X), is defined as V ar(X) = E[(X − μX )^2 ].
Note that
Var(X) = E[(X − μ_X)^2]
       = E[X^2 − 2Xμ_X + μ_X^2]
       = E[X^2] − 2μ_X^2 + μ_X^2
       = E[X^2] − μ_X^2
Definition 14 (Standard Deviation). The standard deviation of X, denoted σ_X, is defined as σ_X = √(Var(X)).
Lemma 1. If X and Y are independent random variables then E(XY) = E(X) E(Y). (See footnote 2.)
Proof.
E(XY) = Σ_{a,b} ab P(X = a, Y = b)
      = Σ_a Σ_b ab P(X = a) P(Y = b)    by independence of X and Y
      = Σ_a a P(X = a) Σ_b b P(Y = b)
      = Σ_a a P(X = a) E(Y)
      = E(Y) Σ_a a P(X = a)
      = E(Y) E(X)
Lemma 2. If X and Y are independent random variables then V ar (X + Y ) = V ar (X) + V ar (Y ).
Proof.
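Var(X + Y) = E[(X + Y − (μ_X + μ_Y))^2]
           = E[(X − μ_X)^2 + (Y − μ_Y)^2 + 2(X − μ_X)(Y − μ_Y)]
           = E[(X − μ_X)^2] + E[(Y − μ_Y)^2] + 2 E[(X − μ_X)(Y − μ_Y)]
           = Var(X) + Var(Y),

since X − μ_X and Y − μ_Y are independent, so by Lemma 1, E[(X − μ_X)(Y − μ_Y)] = E(X − μ_X) E(Y − μ_Y) = 0.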
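Lemmas 1 and 2 can be sanity-checked on a small concrete case. The two distributions below (a fair die X and a biased coin Y) are hypothetical examples chosen for this sketch; the helper names expect and variance are also illustrative. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

def expect(pmf, g=lambda v: v):
    """E[g(V)] for a finite pmf given as {value: probability}."""
    return sum(g(v) * p for v, p in pmf.items())

def variance(pmf):
    mu = expect(pmf)
    return expect(pmf, lambda v: (v - mu) ** 2)

# Hypothetical independent variables: a fair die X and a biased coin Y.
X = {v: Fraction(1, 6) for v in range(1, 7)}
Y = {0: Fraction(1, 4), 1: Fraction(3, 4)}

# By independence, P(X = a, Y = b) = P(X = a) P(Y = b).
product_pmf, sum_pmf = {}, {}
for (a, pa), (b, pb) in product(X.items(), Y.items()):
    product_pmf[a * b] = product_pmf.get(a * b, 0) + pa * pb
    sum_pmf[a + b] = sum_pmf.get(a + b, 0) + pa * pb
```

Here expect(product_pmf) equals expect(X) * expect(Y), and variance(sum_pmf) equals variance(X) + variance(Y). Neither identity is guaranteed if the joint distribution is not a product.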
Definition 15. The moment generating function (mgf) of a random variable X, denoted M_X(t), is E(e^(tX)). When X is understood, we simply write it as M(t).
Observation: E[X^k] = (d^k/dt^k) M(t) evaluated at t = 0, i.e., the k-th derivative of M at 0.

Footnote 2: This generalizes to n random variables.
Definition 16 (Conditional Expectation). For any r.v.s. X and Y the expectation of X conditioned on Y = b is given by
E(X | Y = b) = Σ_a a P(X = a | Y = b).
For example, for the joint distribution of Example 3, P(X_2 = 1) = .1 + .3 = .4, so E(X_1 | X_2 = 1) = 1 · (.1/.4) + 2 · (.3/.4) = 1.75.
Definition 17. E(X|Y ) is a r.v. and it is a function of Y, f (Y ), given by f (b) = E (X|Y = b). For the joint distribution of Example 3, E (X 1 |X 2 ) = f (X 2 ) is given by:
f(1) = 1.75, as computed above

f(2) = E(X_1 | X_2 = 2) = 1.5

f(3) = E(X_1 | X_2 = 3) = 1.5
Theorem 4. E(X) = E(E(X|Y)). That is, E(X) = Σ_b E(X | Y = b) P(Y = b).
For the above example, note that E (E (X 1 |X 2 )) = 1.75(.4) + 1.5(.2) + 1.5(.4) = 1.6.
Direct computation of E(X_1) will also yield 1.6.
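The values of f and the identity E(X_1) = E(E(X_1 | X_2)) can be recomputed from the joint probabilities of Example 3. In this sketch the names JOINT, marginal_x2 and cond_expectation_x1 are illustrative; the six probabilities are the ones used in the direct computation of Example 3.

```python
from fractions import Fraction

# Joint pmf P(X1 = a1, X2 = a2) of Example 3.
JOINT = {
    (1, 1): Fraction(1, 10), (1, 2): Fraction(1, 10), (1, 3): Fraction(2, 10),
    (2, 1): Fraction(3, 10), (2, 2): Fraction(1, 10), (2, 3): Fraction(2, 10),
}

def marginal_x2(joint, b):
    """P(X2 = b)."""
    return sum(p for (_, a2), p in joint.items() if a2 == b)

def cond_expectation_x1(joint, b):
    """E(X1 | X2 = b) = sum over a of a * P(X1 = a | X2 = b)."""
    pb = marginal_x2(joint, b)
    return sum(a1 * p / pb for (a1, a2), p in joint.items() if a2 == b)

f = {b: cond_expectation_x1(JOINT, b) for b in (1, 2, 3)}
tower = sum(f[b] * marginal_x2(JOINT, b) for b in f)      # E(E(X1 | X2))
direct = sum(a1 * p for (a1, _), p in JOINT.items())      # E(X1)
```

The dictionary f holds 7/4, 3/2 and 3/2, and both tower and direct equal 8/5 = 1.6.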
Now we illustrate an application of the above theorem for a more complicated problem.
Example 7. Choose N u.a.r. from {1, 2, ..., n}. Let X = Σ_{i=1}^{N} X_i, in which each X_i is chosen u.a.r. from {0, 1, ..., i}. Compute E(X).
We cannot directly apply the linearity of expectation, since the number of variables is itself a random variable. It would be incorrect to first compute E(N) = (1/n) Σ_{i=1}^{n} i = (n + 1)/2 and then compute E(X_1 + X_2 + ... + X_{(n+1)/2}).
We attack the problem by applying the above theorem: E(X) = E(E(X|N )).
For any i, P(N = i) = 1/n, and

E(X | N = k) = E(Σ_{i=1}^{k} X_i)
             = Σ_{i=1}^{k} E(X_i)
             = Σ_{i=1}^{k} (1/(i + 1)) (0 + 1 + ... + i)
             = Σ_{i=1}^{k} i/2
             = k(k + 1)/4
Hence E(X) = E(E(X|N)) = Σ_{k=1}^{n} (1/n) · k(k + 1)/4 = (n^2 + 3n + 2)/12.
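The closed form for Example 7 can be compared against a literal evaluation of E(E(X|N)). The sketch below conditions on each value of N, sums the means of the uniform variables X_i, and averages over N; the function name expected_x is illustrative.

```python
from fractions import Fraction

def expected_x(n):
    """E(X) for N u.a.r. in {1..n} and X = X_1 + ... + X_N, X_i u.a.r. in {0..i}."""
    total = Fraction(0)
    for k in range(1, n + 1):
        # E(X | N = k) = sum_{i=1}^{k} E(X_i), with E(X_i) = (0 + 1 + ... + i) / (i + 1)
        cond = sum(Fraction(sum(range(i + 1)), i + 1) for i in range(1, k + 1))
        total += Fraction(1, n) * cond         # weight by P(N = k) = 1/n
    return total
```

For n = 1 this gives 1/2, the mean of a single variable uniform on {0, 1}, in agreement with (1 + 3 + 2)/12.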
2 Probabilistic Recurrences
In this section, we examine two problems and present randomized algorithms for them. The number of steps a randomized algorithm executes on a specific input of length n is a random variable. Here we are interested in the expected number of steps. The expectation can vary for different inputs of length n. Throughout this course, we are interested in the expected number of steps for the worst input. For each randomized algorithm, we first express the expected runtime by a recurrence relation.
We will begin by finding the minimum of a set of n numbers. Although the minimum can be computed by a simple optimal deterministic algorithm, we design a randomized algorithm to gain insights into solving recurrence relations involving random variables. The randomized algorithm is presented below.
Algorithm FindMin(a_1, a_2, · · · , a_n)
    choose an i u.a.r. in {1, 2, · · · , n};
    compare a_i with every other a_j;
    if a_i is the minimum element, then output a_i;
    else recurse on the set of elements less than a_i;
end Algorithm
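A possible Python rendering of FindMin is sketched below. It replaces the tail recursion with a loop, assumes the input is a non-empty list of distinct numbers, and returns the number of comparisons as well; the name find_min and the rng parameter are choices made for this illustration.

```python
import random

def find_min(values, rng=random):
    """Return (minimum, comparisons) following the FindMin pseudocode.

    Assumes `values` is a non-empty list of distinct numbers.
    """
    comparisons = 0
    current = list(values)
    while True:
        pivot = rng.choice(current)            # choose an element u.a.r.
        comparisons += len(current) - 1        # compare it with every other element
        smaller = [v for v in current if v < pivot]
        if not smaller:                        # nothing is smaller: pivot is the minimum
            return pivot, comparisons
        current = smaller                      # "recurse" on the smaller elements
```

The returned minimum is always correct; only the comparison count varies from run to run.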
where H(n) is the n-th Harmonic number, defined as H(n) = Σ_{i=1}^{n} 1/i.
Therefore, we have proved that the expected number of steps taken by the algorithm FindMin is 2n − O(ln n). We could also have solved the above recurrence by guessing the solution and verifying it using induction, which is shown below.
Alternative Proof by Induction
We claim the solution to the recurrence relation (9) is T (n) ≤ 2 n, which we prove by mathematical induction.
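The bound T(n) ≤ 2n can also be checked numerically under an explicit cost model. The sketch below is an assumption-laden illustration, not a restatement of recurrence (9): it assumes that a call on m distinct elements costs m − 1 comparisons and that the chosen element is equally likely to have each rank, so that the expected cost e(m) satisfies e(m) = (m − 1) + (1/m) Σ_{i=0}^{m−1} e(i) with e(0) = 0. The function names are illustrative.

```python
from fractions import Fraction

def expected_findmin_comparisons(n_max):
    """Return [e(0), ..., e(n_max)] under the assumed cost model described above."""
    e = [Fraction(0)] * (n_max + 1)
    prefix = Fraction(0)                       # running sum e(0) + ... + e(m-1)
    for m in range(1, n_max + 1):
        e[m] = (m - 1) + prefix / m
        prefix += e[m]
    return e

def harmonic(n):
    return sum(Fraction(1, i) for i in range(1, n + 1))

e = expected_findmin_comparisons(50)
```

Under this model e(n) works out to 2n − 2H(n), which is of the form 2n − O(ln n) and never exceeds 2n.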
Now we derive the expected run time of the RandQuickSort algorithm presented earlier.
Let T(n) be the number of steps taken by the RandQuickSort algorithm on a set of size n. Note that the maximum value of T(n) occurs when the pivot element is the largest or smallest element of the remaining set during each recursive call of the algorithm. In this case, T(n) = n + (n − 1) + · · · + 1 = O(n^2). This value of T(n) is reached with the very low probability (2/n) · (2/(n − 1)) · · · (2/2) = 2^(n−1)/n!. Also, the best case occurs when the pivot element splits the set S into two equal-sized subsets, and then T(n) = O(n ln n). This implies that T(n) ranges between O(n ln n) and O(n^2). Now we derive the expected value of T(n). Note that if the i-th smallest element is chosen as the pivot element then S_1 and S_2 will be of sizes i − 1 and n − i respectively, and this choice has probability 1/n. The recurrence relation for T(n) is:
T (n) = n − 1 + T (X) + T (n − 1 − X) (10)
where P[X = i] = 1/n for 0 ≤ i ≤ n − 1.
Taking expectations on both sides of (10),
E[T(n)] = n − 1 + (1/n) Σ_{i=1}^{n−1} E[T(i)] + (1/n) Σ_{j=1}^{n−1} E[T(j)] = n − 1 + (2/n) Σ_{i=1}^{n−1} E[T(i)].
Let f(i) = E[T(i)]. Then f(n) = n − 1 + (2/n) Σ_{i=1}^{n−1} f(i). Simplifying,
nf (n) = n(n − 1) + 2(f (1) + f (2) + · · · + f (n − 1)) (11)
Substituting n − 1 for n in (11),
(n − 1)f (n − 1) = (n − 1)(n − 2) + 2(f (1) + f (2) + · · · + f (n − 2)) (12)
Subtracting (12) from (11), we get n f(n) − (n − 1) f(n − 1) = (2n − 2) + 2 f(n − 1), or f(n) = ((n + 1)/n) f(n − 1) + (2n − 2)/n.
2.2.1 Claim: f (n) ≤ 2 n ln n
Proof: We prove this by induction on n.
Base Case: When n = 1, f (1) = 0 ≤ 0 holds.
Inductive Step: Let the claim hold for all values up to n − 1. Then,
f(n) = ((n + 1)/n) f(n − 1) + (2n − 2)/n
     ≤ ((n + 1)/n) · 2(n − 1) ln(n − 1) + (2n − 2)/n    by inductive hypothesis
     = (2(n^2 − 1)/n) ln(n − 1) + (2n − 2)/n
     = (2(n^2 − 1)/n) (ln n + ln(1 − 1/n)) + (2n − 2)/n
We make use of the standard inequality stated below.
Fact 1. 1 + x ≤ e^x for any x ∈ R, and hence ln(1 + x) ≤ x for any x > −1. For example, 1 + 0.2 ≤ e^0.2 and ln(1 − 0.2) ≤ −0.2.
Hence f(n) ≤ (2(n^2 − 1)/n) (ln n − 1/n) + (2n − 2)/n = 2n ln n − (2/n) ln n − 2 + 2/n^2 + 2 − 2/n ≤ 2n ln n, establishing the inductive step.
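The claim f(n) ≤ 2n ln n can be checked numerically by iterating the recurrence f(n) = ((n + 1)/n) f(n − 1) + (2n − 2)/n from f(1) = 0. This is a finite sanity check, not a substitute for the induction; the function name and the small floating-point tolerance are choices made for this sketch.

```python
import math
from fractions import Fraction

def expected_quicksort_steps(n_max):
    """Return [f(0), ..., f(n_max)] with f(n) = ((n+1)/n) f(n-1) + (2n-2)/n, f(1) = 0."""
    f = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        f[n] = Fraction(n + 1, n) * f[n - 1] + Fraction(2 * n - 2, n)
    return f

f = expected_quicksort_steps(200)
# Largest ratio f(n) / (2 n ln n) over the checked range; the claim says it is at most 1.
worst_ratio = max(float(f[n]) / (2 * n * math.log(n)) for n in range(2, 201))
```

The exact values also satisfy equation (11), n f(n) = n(n − 1) + 2(f(1) + · · · + f(n − 1)), which confirms that the subtraction step did not lose a term.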
Hence the expected run time of the RandQuickSort algorithm is O(n ln n). Nevertheless, one of the limitations of using recurrence relations is that we do not know how the runtime of the algorithm is distributed around its expected value. Can this analysis be extended to answer questions such as, “With what probability does the algorithm RandQuickSort need more than 24n ln n time steps?” Later on, we will apply a different technique and establish that this probability is very small. Similarly, for the case of the FindMin algorithm, by solving the recurrence relation for the expected running time, we will be unable to answer questions like the following: “What is the probability that the runtime is greater than 3n?” The answer to these and other similar queries lies in the study of Tail Inequalities, discussed in the next section.