



Greedy Approximations: Set Cover and Min Makespan, Set Cover problem, Min Makespan Scheduling, Min Makespan Problem, Graham’s List Scheduling, Approximations Algorithms, Shuchi Chawla, Lecture Notes, University of Wisconsin, United States of America




CS880: Approximations Algorithms
Scribe: Matt Elder
Lecturer: Shuchi Chawla
Topic: Greedy Approximations: Set Cover and Min Makespan
Date: 1/30/
The Set Cover problem is: Given a set of elements $E = \{e_1, e_2, \ldots, e_n\}$ and a set of $m$ subsets of $E$, $S = \{S_1, S_2, \ldots, S_m\}$, find a “least cost” collection $C$ of sets from $S$ such that $C$ covers all elements in $E$. That is, $\bigcup_{S_i \in C} S_i = E$.
Set Cover comes in two flavors, unweighted and weighted. In unweighted Set Cover, the cost of a collection $C$ is the number of sets contained in it. In weighted Set Cover, there is a nonnegative weight function $w : S \to \mathbb{R}$, and the cost of $C$ is defined to be its total weight, i.e., $\sum_{S_i \in C} w(S_i)$.
First, we will deal with the unweighted Set Cover problem. The following algorithm is an extension of the greedy vertex cover algorithm that we discussed in Lecture 1.
Algorithm 3.1.1 Set Cover(E, S):
1. $C \leftarrow \emptyset$.
2. While $E$ contains elements not covered by $C$:
(a) Pick an element $e \in E$ not covered by $C$.
(b) Add all sets $S_i$ containing $e$ to $C$.
To analyze Algorithm 3.1.1, we will need the following definition:
Definition 3.1.2 A set $E'$ of elements in $E$ is independent if, for all distinct $e_1, e_2 \in E'$, there is no $S_i \in S$ such that $e_1, e_2 \in S_i$.
Now, we shall determine how strong an approximation Algorithm 3.1.1 is. Say that the frequency of an element is the number of sets that contain that element. Let $F$ denote the maximum frequency across all elements. Thus, $F$ is the largest number of sets from $S$ that we might add to our cover $C$ at any step in the algorithm. The elements picked by the algorithm form an independent set: once an element is picked, every set containing it is added to $C$, so no later pick can share a set with it. Hence the algorithm selects no more than $F|E'|$ sets, where $E'$ is the set of elements picked in Step 2a. That is, $\mathrm{ALG} \le F|E'|$. Every element of $E'$ is covered by some set in an optimal set cover, and no single set contains two elements of an independent set, so $|E'| \le \mathrm{OPT}$ for any independent set $E'$. Thus, $\mathrm{ALG} \le F \cdot \mathrm{OPT}$, and Algorithm 3.1.1 is therefore an $F$–approximation.
Theorem 3.1.3 Algorithm 3.1.1 is an $F$–approximation to Set Cover.
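The argument above can be checked on a small instance. The following is a minimal Python sketch of Algorithm 3.1.1, not code from the lecture: the function names `frequency_cover` and `max_frequency` and the six-element instance are illustrative assumptions, and the scan order of the elements stands in for the arbitrary choice in Step 2a. It assumes every element lies in at least one set.

```python
def frequency_cover(elements, sets):
    """Sketch of Algorithm 3.1.1: pick an uncovered element, add every set containing it."""
    cover = []       # indices of the chosen sets (the collection C)
    covered = set()  # elements covered so far
    picked = []      # elements chosen in Step 2a; they form an independent set
    for e in elements:  # scan order stands in for "pick any uncovered element"
        if e in covered:
            continue
        picked.append(e)
        for i, s in enumerate(sets):
            if e in s:  # s cannot already be in C, otherwise e would be covered
                cover.append(i)
                covered |= s
    return cover, picked


def max_frequency(elements, sets):
    """F: the largest number of sets that contain any single element."""
    return max(sum(1 for s in sets if e in s) for e in elements)


# Hypothetical instance: a 6-cycle, where every element lies in exactly two sets.
E = [1, 2, 3, 4, 5, 6]
S = [{1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {1, 6}]
C, picked = frequency_cover(E, S)
F = max_frequency(E, S)
print(len(C), picked, F)  # 6 sets chosen, picked elements [1, 3, 5], F = 2
```

On this instance an optimal cover has 3 sets (for example {1, 2}, {3, 4}, {5, 6}), while the sketch returns all 6, so the ratio is exactly F = 2.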
Algorithm 3.1.1 is a good approximation if F is guaranteed to be small. In general, however, there could be some element contained in every set of S, and Algorithm 3.1.1 would be a very poor approximation. So, we consider a different unweighted Set Cover approximation algorithm which uses the greedy strategy to yield a ln n–approximation.
Algorithm 3.1.4 Set Cover(E, S):
1. $C \leftarrow \emptyset$.
2. While $E$ contains elements not covered by $C$:
(a) Find the set $S_i$ containing the greatest number of uncovered elements.
(b) Add $S_i$ to $C$.
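The greedy rule above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the lecture's code: the name `greedy_set_cover` and the example instance are hypothetical, sets are Python `set` objects, and ties are broken in favor of the lowest index.

```python
def greedy_set_cover(elements, sets):
    """Repeatedly add the set that covers the most still-uncovered elements."""
    uncovered = set(elements)
    cover = []  # indices of the chosen sets, in the order they were picked
    while uncovered:
        # max() returns the first index that attains the largest coverage
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the sets do not cover every element")
        cover.append(best)
        uncovered -= sets[best]
    return cover


# Hypothetical instance with 7 elements and 6 sets.
E = range(1, 8)
S = [{1, 2, 3, 4}, {5, 6, 7}, {1, 5}, {2, 6}, {3, 7}, {4}]
cover = greedy_set_cover(E, S)
print(cover)  # [0, 1]
```

Each iteration scans every set, so this sketch runs in time proportional to the number of picks times the total size of the sets. That is fine for illustration but not tuned for large inputs.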
Theorem 3.1.5 Algorithm 3.1.4 is a $\left(\ln\frac{n}{\mathrm{OPT}} + 1\right)$–approximation.
Proof: Let $k = \mathrm{OPT}$, and let $E_t$ be the set of elements not yet covered after step $t$, with $E_0 = E$. The optimal solution covers every $E_t$ with no more than $k$ sets. In step $t + 1$, ALG picks the set that covers the most elements of $E_t$. This set must cover at least $|E_t|/k$ elements of $E_t$: if every set covered fewer, no choice of $k$ sets could cover $E_t$, which contradicts the existence of OPT. So $|E_{t+1}| \le |E_t| - |E_t|/k = |E_t|(1 - 1/k)$, and, inductively, $|E_t| \le n(1 - 1/k)^t$.
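When $|E_t| < 1$, we know we are done, so we solve for such a $t$:

$$ n\left(1 - \frac{1}{k}\right)^t < 1 \;\Leftrightarrow\; \left(1 - \frac{1}{k}\right)^t < \frac{1}{n} \;\Leftrightarrow\; n < \left(\frac{k}{k-1}\right)^t \;\Leftrightarrow\; \ln n < t \ln\frac{k}{k-1}. $$

Since $\ln\frac{k}{k-1} = -\ln\left(1 - \frac{1}{k}\right) \ge \frac{1}{k}$, the last inequality holds as soon as $t/k > \ln n$, that is, once $t$ exceeds $k \ln n = \mathrm{OPT} \ln n$.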
Algorithm 3.1.4 finishes within OPT ln n steps, so it uses no more than that many sets. We can get a better analysis for this approximation by considering when |Et| < k, as follows:
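$$ |E_t| \le n\left(1 - \frac{1}{k}\right)^t \le \frac{n}{e^{t/k}} \quad \text{(because } (1 - x)^{1/x} \le \frac{1}{e} \text{ for all } 0 < x \le 1\text{)}, $$

$$ \frac{n}{e^{t/k}} = k \;\Rightarrow\; e^{t/k} = \frac{n}{k} \;\Rightarrow\; t = k \ln\frac{n}{k}. $$

Thus, after $k \ln\frac{n}{k}$ steps at most $k$ elements remain. Each subsequent step covers at least one new element, so $\mathrm{ALG} \le k \ln\frac{n}{k} + k = \mathrm{OPT}\left(\ln\frac{n}{\mathrm{OPT}} + 1\right)$.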
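The bound can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not an experiment from the lecture: `planted_instance` is a hypothetical generator that hides k disjoint covering sets among random distractor sets, so a cover of size k is known to exist. The proof only uses the existence of some cover with k sets, so the greedy count must stay below the rounded-up bound ceil(k ln(n/k)) + k even if the true optimum is smaller than k.

```python
import math
import random


def greedy_cover_size(n, sets):
    """Number of sets the greedy rule picks to cover range(n)."""
    uncovered = set(range(n))
    steps = 0
    while uncovered:
        best = max(sets, key=lambda s: len(s & uncovered))
        uncovered -= best
        steps += 1
    return steps


def planted_instance(n, k, extra, rng):
    """k disjoint blocks that together cover range(n), plus `extra` random sets."""
    blocks = [set(range(j, n, k)) for j in range(k)]
    distractors = [set(rng.sample(range(n), n // k)) for _ in range(extra)]
    return blocks + distractors


rng = random.Random(0)  # fixed seed so the run is repeatable
n, k = 200, 8
sets = planted_instance(n, k, extra=40, rng=rng)
alg = greedy_cover_size(n, sets)
bound = math.ceil(k * math.log(n / k)) + k  # k ln(n/k) + k, rounded up to whole steps
print(alg, bound)
```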
Theorem 3.1.6 If all sets are of size $\le B$, then there exists a $(\ln B + 1)$–approximation to unweighted Set Cover.
Proof: If all sets have size no greater than $B$, then $k \ge n/B$. So $B \ge n/k$, hence $\ln\frac{n}{k} \le \ln B$, and by Theorem 3.1.5 the greedy algorithm gives a $(\ln B + 1)$–approximation.
The dots are elements, and the loops represent the sets of $S$. Each set has weight 1. The optimal solution is to take the two long sets, with a total cost of 2. If Algorithm 3.1.7 instead selects the leftmost thick set first, then it will take at least 5 sets. This example generalizes to a family of examples, each with $2^k$ elements, and shows that no analysis of Algorithm 3.1.7 can prove it to be better than an $O(\ln n)$–approximation.
A ln n–approximation to Set Cover can also be obtained by other techniques, including LP-rounding. However, Feige showed that no improvement, even by a constant factor, is likely:
Theorem 3.1.9 For any $\epsilon > 0$, there is no $(1 - \epsilon) \ln n$–approximation to Weighted Set Cover unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\log \log n})$. [1]
The Min Makespan Problem is: given n jobs to schedule on m machines, where job i has size si, schedule the jobs to minimize their makespan.
Definition 3.2.1 The makespan of a schedule is the earliest time when all machines have stopped doing work.
This problem is NP-hard, as can be seen by a reduction from Partition. The following algorithm due to Ron Graham yields a 2–approximation.
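Algorithm 3.2.2 (Graham’s List Scheduling) [2] Given a set of n jobs and a set of m empty machine queues,
1. Order the jobs arbitrarily.
2. Until the job list is empty, move the next job in the list to the end of the shortest machine queue.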
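List scheduling places each job, in list order, on the machine whose queue is currently shortest. The following Python sketch is one way to express that rule. It is an illustration under assumptions, not the lecture's code: the name `list_schedule` and the sample job sizes are hypothetical, and a binary heap of (load, machine) pairs is an implementation choice for finding the shortest queue.

```python
import heapq


def list_schedule(sizes, m):
    """Assign each job, in list order, to the currently least-loaded machine."""
    heap = [(0, q) for q in range(m)]    # (current load, machine index)
    assignment = [[] for _ in range(m)]  # job indices queued on each machine
    for job, size in enumerate(sizes):
        load, q = heapq.heappop(heap)    # machine with the shortest queue
        assignment[q].append(job)
        heapq.heappush(heap, (load + size, q))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


# Hypothetical job list on 3 machines.
makespan, assignment = list_schedule([2, 3, 4, 6, 2, 2], m=3)
print(makespan, assignment)  # 8 [[0, 3], [1, 4], [2, 5]]
```

For these six jobs a schedule of length 7 exists (machine loads 6, 4 + 2 and 3 + 2 + 2), so the list schedule of length 8 is close to optimal but not optimal.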
Theorem 3.2.3 Graham’s List Scheduling is a 2–approximation.
Proof: Let $S_j$ denote the size of job $j$. Suppose job $i$ is the last job to finish in a Graham’s List schedule, and let $t_i$ be the time it starts. When job $i$ was placed, its queue was no longer than any other queue, so every queue is full until $t_i$. Thus,

$$ \mathrm{ALG} = S_i + t_i \le S_i + \frac{\left(\sum_{j=1}^{n} S_j\right) - S_i}{m} = \frac{1}{m}\sum_{j=1}^{n} S_j + \left(1 - \frac{1}{m}\right) S_i. $$

It’s easy to see that $S_i \le \mathrm{OPT}$ and that $\frac{1}{m}\sum_{j=1}^{n} S_j \le \mathrm{OPT}$. So, we conclude that $\mathrm{ALG} \le (2 - 1/m)\,\mathrm{OPT}$, which yields a 2–approximation.
This analysis is tight. Suppose that after the jobs are arbitrarily ordered, the job list contains m(m−1) unit-length jobs, followed by one m-length job. The algorithm yields a schedule completing in 2m − 1 units while the optimal schedule has length m.
This algorithm can be improved. For example, by ordering the job list by decreasing duration (longest job first) instead of arbitrarily, we get a (4/3)–approximation, a result proved in [3]. Also, this problem has a polynomial-time approximation scheme (PTAS), given in [4]. However, a notable property of Graham’s List Scheduling is that it is an online algorithm: even if the jobs arrive one after another, and we have no information about what jobs may arrive in the future, we can still use it to obtain a 2–approximation.
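The tight instance and the effect of job order can both be reproduced with a short, self-contained Python sketch. The helper name `makespan` and the choice m = 4 are assumptions made for illustration. The sketch runs list scheduling on m(m − 1) unit jobs followed by one job of length m, first in the given order and then sorted longest first.

```python
import heapq


def makespan(sizes, m):
    """Makespan of list scheduling: each job goes to the least-loaded machine."""
    loads = [0] * m  # a list of equal values is already a valid min-heap
    for size in sizes:
        heapq.heappush(loads, heapq.heappop(loads) + size)
    return max(loads)


m = 4
jobs = [1] * (m * (m - 1)) + [m]  # m(m-1) unit jobs, then one job of length m
arbitrary = makespan(jobs, m)                            # list order as given
longest_first = makespan(sorted(jobs, reverse=True), m)  # decreasing duration
print(arbitrary, longest_first)  # 7 4
```

In the given order every machine first receives m − 1 unit jobs, so the long job finishes at time 2m − 1 = 7. With the long job placed first, the unit jobs fill the other machines evenly and the schedule has the optimal length m = 4.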
[1] Uriel Feige. A Threshold of ln n for Approximating Set Cover. In J. ACM 45(4), pp 634-652. (1998)
[2] Ronald L. Graham. Bounds for Certain Multiprocessing Anomalies. In Bell System Tech. J., 45, pp 1563-1581. (1966)
[3] Ronald L. Graham. Bounds on Multiprocessing Timing Anomalies. In SIAM Journal on Applied Mathematics, 17(2), pp 416-429. (1969)
[4] Dorit S. Hochbaum, David B. Shmoys. A Polynomial Approximation Scheme for Scheduling on Uniform Processors: Using the Dual Approximation Approach. In SIAM J. Comput. 17(3), pp 539-551. (1988)
[5] D. S. Johnson. Approximation Algorithms for Combinatorial Problems. In Journal of Computer and System Sciences, 9, pp 256-278. (1974) Preliminary version in Proc. of the 5th Ann. ACM Symp. on Theory of Computing, pp 36-49. (1973)