Randomized Algorithms-Advanced Algorithms-Lecture 06 Notes-Computer Science, Study notes of Advanced Algorithms

Randomized Algorithms, Las Vegas Algorithms, Monte Carlo Algorithms, Motivation, Load Balancing, Symmetry Breaking, Sampling, Searching for witnesses, Elementary Probability Theory, Global Min Cut, Deterministic Method, Karger's Algorithm, Advanced Algorithms, Shuchi Chawla, Lecture Notes, University of Wisconsin, United States of America.

Typology: Study notes

2011/2012

Uploaded on 02/14/2012

CS787: Advanced Algorithms
Scribe: Yudong Tang and Baris Aydinlioglu
Lecturer: Shuchi Chawla
Topic: Randomized Algorithms
Date: September 17, 2007

In this lecture and the next we cover randomized algorithms. Today we begin by motivating the use of randomized algorithms. Next we recall some related definitions and results from probability theory. We end today’s lecture by studying Karger’s randomized algorithm for computing a min-cut of a graph.

6.1 Introduction

We define a randomized algorithm as an algorithm that can toss coins and act depending on the outcome of those tosses. There are two kinds of randomized algorithms that we will be studying:

  • Las Vegas Algorithms: These are randomized algorithms that always produce a correct answer. Their expected running time is polynomial in the size of the input, meaning that the running time averaged over all possible coin tosses is polynomial. In the worst case, however, a Las Vegas algorithm may take exponentially long. One example is Quicksort with a randomly chosen pivot: it makes some of its decisions based on coin tosses, it always produces the correct result, and its expected and worst-case running times are O(n log n) and O(n^2), respectively.
  • Monte Carlo Algorithms: These are randomized algorithms that sometimes produce an incorrect answer. Like Las Vegas algorithms, their expected running time is polynomial, though in the worst case it may be exponential.

    By performing independent runs of a Monte Carlo algorithm, we can decrease the chance of obtaining an incorrect result to a value as small as we like. For example, if a particular Monte Carlo algorithm produces an incorrect result with probability 1/4, and we have the ability to detect an incorrect result, then the probability of not obtaining a correct answer after t independent runs is (1/4)^t. In some contexts, such as optimization problems, we cannot detect an incorrect answer, but we can compare different answers and pick the best one. The same analysis applies in these cases: if picking the best answer does not give the optimal answer, then the algorithm produced an incorrect answer in every one of its independent runs.

    Sometimes we may not have an algorithm whose error probability is a small constant, but we may have one whose error probability is a function of the input size. In such a case, we aim for an error probability that can be driven down to the desired level with a small number (i.e., polynomial in the size of the input) of independent runs.
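
The amplification argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `make_noisy_minimizer` is a hypothetical stand-in for a Monte Carlo minimization routine that returns the true optimum except with probability `error`, and `amplify` keeps the best of t independent runs.

```python
import random


def amplify(run_once, t):
    """Run a Monte Carlo minimization routine t times and keep the smallest answer."""
    return min(run_once() for _ in range(t))


def failure_probability(error, t):
    """Probability that all t independent runs are wrong, given a per-run error probability."""
    return error ** t


def make_noisy_minimizer(optimum, error, rng):
    """Hypothetical Monte Carlo routine: returns the optimum, or a worse value with probability `error`."""
    def run_once():
        if rng.random() < error:
            return optimum + rng.randint(1, 5)  # an incorrect (suboptimal) answer
        return optimum
    return run_once


if __name__ == "__main__":
    rng = random.Random(0)
    run = make_noisy_minimizer(optimum=10, error=0.25, rng=rng)
    print(amplify(run, 20), failure_probability(0.25, 20))
```

Taking the minimum works here because a wrong answer can only be worse than the optimum, which matches the "pick the best answer" case described above.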

6.2 Motivation

Randomized algorithms have several advantages over deterministic ones. We discuss them here:

  • Randomized algorithms tend to bring simplicity. For example, recall from Lecture 2 the problem of finding the kth smallest element in an unordered list, which had a rather involved deterministic algorithm. In contrast, the strategy of picking a random element to partition the problem into subproblems and recursing on one of the partitions is much simpler.
  • Another advantage randomized algorithms can sometimes provide is runtime efficiency. Indeed, there are problems for which the best known deterministic algorithms run in exponential time. A current example is Polynomial Identity Testing, where given an n-variate polynomial f of degree d over a field F, the goal is to determine whether it is identically zero. If f were univariate, then merely evaluating f at d + 1 points in F would give us a correct algorithm, as f could not have more than d roots in F. With n variables, however, there may be n^d roots, and just picking n^d + 1 points in F gives us exponential time. While we don’t know of a deterministic polytime algorithm for this problem, there is a simple randomized algorithm that evaluates f at only 2d points, based on a theorem by Schwartz and Zippel. A prevalent example where randomization helps with runtime efficiency used to be Primality testing, where given an n-bit number, the goal is to determine if it is prime. Straightforward randomized algorithms for this problem have existed since the 1970s; they run in polynomial time and err only if the number is composite. It was not known until 2002, when Agrawal, Kayal, and Saxena found one, whether a deterministic polytime algorithm was possible. In fact, this problem was being viewed as evidence that the class of polytime randomized algorithms with one-sided error is more powerful than the class of polytime deterministic algorithms (the so-called RP vs P question). Now that a polytime algorithm for Primality is known, the RP vs P question remains open and could go either way.
  • There are problems in which randomized algorithms help us where deterministic algorithms provably cannot work. A common feature of such problems is lack of information. A classical example involves two parties, named A (for Alice) and B (for Bob), each in possession of an n-bit number (say x and y, respectively) that the other does not know. The two can exchange information through an expensive communication channel. The goal is to find out whether they have the same number while incurring minimal communication cost. Note that the measure of efficiency here is the number of bits exchanged, not the running time. It can be shown that a deterministic protocol has to exchange n bits in order to achieve guaranteed correctness. On the other hand, the following randomized protocol, based on the idea of fingerprinting, is correct with near certainty: A picks a prime p in the range [2n, 4n] at random, computes x mod p, and sends both x mod p and p to B, to which B responds with 1 or 0 indicating whether y mod p matches. Only O(log n) bits are exchanged, and the probability of error in this protocol (i.e., Pr[x ≠ y and x mod p = y mod p]) can be shown to be at most 1/n. By repeating the protocol, we can reduce the probability of an error to less than that of an asteroid colliding with either A's or B's residence, which should be a practically sufficient threshold.
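
As a rough illustration of the fingerprinting protocol in the last item, the following Python sketch simulates both parties in one process. The function names (`alice_message`, `bob_reply`, `equal_with_repeats`, `message_bits`) are hypothetical, and trial division stands in for whatever prime-generation method one would actually use. The sketch does not verify the error bound; it only shows the message flow and that equal inputs are never rejected.

```python
import random


def is_prime(m):
    """Trial division; adequate for the small range [2n, 4n] used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def alice_message(x, n, rng):
    """A picks a random prime p in [2n, 4n] and sends the pair (x mod p, p)."""
    primes = [m for m in range(2 * n, 4 * n + 1) if is_prime(m)]
    p = rng.choice(primes)
    return x % p, p


def bob_reply(y, fingerprint, p):
    """B answers 1 if its own fingerprint matches, 0 otherwise."""
    return 1 if y % p == fingerprint else 0


def equal_with_repeats(x, y, n, rounds, rng):
    """Declare x == y only if every independent round reports a match."""
    return all(bob_reply(y, *alice_message(x, n, rng)) for _ in range(rounds))


def message_bits(n):
    """Bits A sends per round: the fingerprint and p are each at most 4n."""
    return 2 * (4 * n).bit_length()


if __name__ == "__main__":
    rng = random.Random(0)
    print(equal_with_repeats(12345, 12345, 64, 10, rng), message_bits(64))
```

The error is one-sided: if x = y the fingerprints always agree, so a reply of 0 in any round proves that x ≠ y.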

We now list several more examples of tasks where randomization is useful:

Load Balancing: In this problem there are m machines and incoming jobs, and the goal is to assign the jobs to the machines such that the machines are equally utilized. One simple way that

  • Linearity of Expectation: $E[\sum_i \alpha_i \cdot X_i] = \sum_i \alpha_i \cdot E[X_i]$, where the $\alpha_i$'s are scalars and no independence assumptions are made on the $X_i$'s. It follows from this rule that the above two definitions of variance are identical.
  • Multiplication Rule: $E[X_1 \cdot X_2] = E[X_1] \cdot E[X_2]$ if $X_1, X_2$ are independent.
  • Bayes’ Rule: If the $E_i$ are disjoint events and $\Pr[\bigcup_i E_i] = 1$, then $E[X] = \sum_i E[X \mid E_i] \cdot \Pr[E_i]$.
  • Union Bound: $\Pr[\bigcup_i E_i] \le \sum_i \Pr[E_i]$.
  • Markov’s Inequality: If $X \ge 0$ with probability 1, then $\Pr[X > \alpha] \le E[X] / \alpha$.

We only give a proof of the last statement:

Proof: This is a proof of Markov’s inequality.

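\begin{align}
E[X] &= E[X \mid 0 \le X \le \alpha] \cdot \Pr[0 \le X \le \alpha] + E[X \mid \alpha < X] \cdot \Pr[\alpha < X] \tag{6.3.1} \\
&\ge E[X \mid \alpha < X] \cdot \Pr[\alpha < X] \tag{6.3.2} \\
&\ge \alpha \cdot \Pr[\alpha < X] \tag{6.3.3}
\end{align}

Step (6.3.2) drops the first term, which is nonnegative because X ≥ 0, and step (6.3.3) uses E[X | α < X] ≥ α. Dividing both sides by α > 0 gives Pr[X > α] ≤ E[X]/α.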
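
As a quick sanity check, Markov's inequality can be evaluated on the empirical distribution of a small nonnegative data set, where it holds exactly. The helper names `tail_fraction` and `markov_bound` and the sample values are made up for illustration.

```python
def tail_fraction(samples, alpha):
    """Empirical Pr[X > alpha]: the fraction of samples strictly above alpha."""
    return sum(1 for x in samples if x > alpha) / len(samples)


def markov_bound(samples, alpha):
    """Empirical E[X] / alpha, the right-hand side of Markov's inequality."""
    return (sum(samples) / len(samples)) / alpha


if __name__ == "__main__":
    samples = [0, 1, 1, 2, 3, 5, 8, 13]  # nonnegative, as the inequality requires
    for alpha in (1, 4, 10):
        print(alpha, tail_fraction(samples, alpha), markov_bound(samples, alpha))
```

The bound is usually loose, as the printed values show; it uses nothing about X beyond nonnegativity and the mean.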

6.4 Global Min Cut

6.4.1 Description of the problem

Let G = (V, E) be a graph. For simplicity, we assume G is undirected. Also assume the capacity of each edge is 1, i.e., C_e = 1, ∀e ∈ E. A cut is a partition (S, S̄) of V such that S ≠ ∅, V. We define the capacity of (S, S̄) to be the sum of the capacities of the edges between S and S̄. Since C_e = 1, ∀e ∈ E, it is clear that the capacity of (S, S̄) is equal to the number of edges from S to S̄, |E(S, S̄)| = |(S × S̄) ∩ E|.

Goal: Find a cut of globally minimal capacity in the above setup.

6.4.2 Deterministic method

This problem can be solved by running a max-flow computation (the Ford-Fulkerson algorithm) n − 1 times, where n = |V|. Fix a node s as the source, and let the sink run through all other nodes. For each choice of sink, compute the min cut using the Ford-Fulkerson algorithm, and compare these min cuts. The minimum of the n − 1 min cuts is the global min cut.

To see why, let (S, S̄) be a global min cut. For the node s we fixed, either s ∈ S or s ∈ S̄. If s ∈ S, pick a node t in S̄ as the sink. The max-flow computation with s as source and t as sink gives a flow f. The value of f is no greater than the capacity of (S, S̄), since all of the flow has to cross from S to S̄. By the max-flow min-cut theorem, the min cut separating s from t therefore has capacity no greater than that of (S, S̄). It cannot be smaller either, because it is itself a cut of G and (S, S̄) is a global min cut. Therefore, when t ∈ S̄ is chosen as the sink, the min cut has the same capacity as the global min cut. In the case s ∈ S̄, notice that G is undirected, so the flow can be reversed and we are back in the first case.
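
A minimal sketch of this deterministic method follows, assuming nodes are numbered 0..n−1 and the graph is given as a list of undirected unit-capacity edges. It uses the Edmonds-Karp (breadth-first) variant of Ford-Fulkerson, which is a substitution for the generic algorithm named above, and it returns only the capacity of the global min cut. The names `max_flow` and `global_min_cut_value` are illustrative.

```python
from collections import deque


def max_flow(n, edges, s, t):
    """Max s-t flow on an undirected unit-capacity multigraph (Edmonds-Karp)."""
    cap = [[0] * n for _ in range(n)]
    for u, v in edges:
        cap[u][v] += 1  # an undirected edge carries one unit in either direction
        cap[v][u] += 1
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left, so the flow is maximum
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def global_min_cut_value(n, edges):
    """Fix node 0 as the source and take the minimum over all n - 1 sinks."""
    return min(max_flow(n, edges, 0, t) for t in range(1, n))


if __name__ == "__main__":
    two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(global_min_cut_value(6, two_triangles))
```

The adjacency-matrix representation keeps the sketch short; for sparse graphs an adjacency-list residual graph would be the more usual choice.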

6.4.3 Karger’s algorithm: a randomized method

Description of the algorithm:

  1. Set S(v) = {v}, ∀v ∈ V.
  2. If G has only two nodes v_1, v_2, output the cut (S(v_1), S(v_2)).
  3. Otherwise, pick an edge (u, v) ∈ E uniformly at random, merge u and v into a single node uv, and set S(uv) = S(u) ∪ S(v). Remove any self-loops from G (keeping parallel edges) and return to step 2.
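
The three steps above can be sketched in Python as follows. This is an illustrative sketch, not an optimized implementation: it assumes a connected graph on nodes 0..n−1 given as an edge list, and the names `karger_cut` and `best_of` are hypothetical. Since a single run may miss the min cut, `best_of` repeats the algorithm and keeps the smallest cut found, in the spirit of the Monte Carlo amplification from Section 6.1.

```python
import random


def karger_cut(n, edges, rng):
    """One run of the contraction algorithm on a connected graph.

    Returns (S(v1), S(v2), number of edges crossing the cut).
    """
    groups = {v: {v} for v in range(n)}  # step 1: S(v) = {v}
    label = list(range(n))               # label[v] = merged node that contains v
    live = list(edges)                   # edges that are not self-loops
    while len(groups) > 2:               # step 2: stop once two nodes remain
        u, v = rng.choice(live)          # step 3: pick a uniformly random edge
        a, b = label[u], label[v]
        for w in groups[b]:              # merge node b into node a
            label[w] = a
        groups[a] |= groups.pop(b)
        # Remove self-loops; parallel edges stay, so they are likelier to be picked.
        live = [(x, y) for (x, y) in live if label[x] != label[y]]
    side_1, side_2 = groups.values()
    return side_1, side_2, len(live)


def best_of(n, edges, runs, rng):
    """Repeat the algorithm and keep the run with the fewest crossing edges."""
    return min((karger_cut(n, edges, rng) for _ in range(runs)), key=lambda r: r[2])


if __name__ == "__main__":
    two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(best_of(6, two_triangles, 100, random.Random(1)))
```

Rebuilding the `live` list after each contraction costs time proportional to the number of edges per step, which is fine for a sketch but not the most efficient way to implement contraction.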

Theorem 6.4.1 Karger’s algorithm returns a global min cut with probability $p \ge 1 / \binom{n}{2}$.

Proof: Let (S, S̄) be a global min cut of G, and let k be the capacity of (S, S̄). Let E be the event that Karger’s algorithm produces the cut (S, S̄).

First, we show that E occurs if and only if the algorithm never picks an edge from S to S̄. The “only if” part is clear, so we show the “if” part. Assume the algorithm never picks an edge of the cut (S, S̄). Suppose, for contradiction, that some edge of the cut is eliminated, and let e be the first such edge. The algorithm can eliminate an edge in only two ways: by picking it or by removing it as a self-loop. Since e is not picked by assumption, it must be eliminated as a self-loop. An edge e becomes a self-loop if and only if its two endpoints u, v are merged, which means that an edge e′ connecting u and v is selected in some iteration. In that iteration, u and v may each be the combination of many nodes of the original graph G, i.e., u = u_1 u_2 ... u_i and v = v_1 v_2 ... v_j, but since none of the previously picked edges is in the cut (S, S̄), the nodes merged into each of them all lie on the same side, S or S̄. So if e connects S to S̄, then e′ also connects S to S̄, which contradicts the assumption that the algorithm picks no edge of the cut (S, S̄). Hence no edge of the cut is ever eliminated, and the two nodes remaining at the end are exactly S and S̄.

Second, notice that (S, S̄) remains a global min cut in the updated graph G as long as no edge of (S, S̄) has been picked. This is because any cut of the updated graph is also a cut in the original graph, and the edges removed are only internal edges, which don’t affect the capacity of the cut.

Now let $E_i$ be the event that the algorithm doesn’t pick any edge of (S, S̄) in the i-th iteration. Then,

$$\Pr[E] = \Pr[E_1] \cdot \Pr[E_2 \mid E_1] \cdot \Pr[E_3 \mid E_1 \cap E_2] \cdots \tag{6.4.4}$$

Observe that, in the i-th iteration, assuming $\bigcap_{1 \le j \le i-1} E_j$ has occurred, every node has at least k edges. For otherwise the node with fewer than k edges and the rest of the graph would form a partition with capacity less than k. It follows that there are at least (n − i + 1)k/2 edges in the i-th