Set Cover Problem: Unweighted and Weighted Formulations and Approximation Algorithms, Study notes of Advanced Algorithms

Islamic University of Science & Technology Advanced Algorithms

An introduction to the set cover problem, a well-known NP-complete problem in computer science. The problem is presented in both the ordinary and the hitting set formulations, and optimal approximation algorithms based on a greedy strategy and on an integer programming formulation are introduced. The document also covers weighted set covering and its application in routing data packages.

Typology: Study notes

2012/2013

Uploaded on 04/23/2013

atasi 🇮🇳


Lecture 8: Set Cover

Abstract

This lecture focuses on the Set Cover problem, one of the 21 problems that were first proved NP-complete [2]. Two formulations are given, and an asymptotically optimal approximation algorithm based on a greedy strategy is introduced. The problem is then generalized to weighted elements, and an approximation algorithm derived from an Integer Programming (IP) formulation is presented.

1 Unweighted Set Covering

There are two different ways to look at the set covering problem. We first introduce the ordinary formulation and then the hitting set formulation.

1.1 Ordinary Formulation

We define $P$, a collection of subsets of a set $X$, as follows:

\[ X = \{x_1, \ldots, x_n\} \tag{1} \]

\[ P = \{P_1, \ldots, P_r\} \tag{2} \]

where $x_i$ represents a skill, $P_j$ a person, and $x_i \in P_j$ if person $P_j$ possesses the skill $x_i$, for $i = 1, \ldots, n$ and $j = 1, \ldots, r$.

The goal is, as an employer, to find the minimum number of people who together cover all the skills, namely to find the smallest set $R \subseteq P$ such that the people in $R$ cover all the skills:

\[ \bigcup_{P_j \in R} P_j = X \tag{3} \]

An example with 6 people and 12 skills is illustrated in Figure 1. In the example, picking the set containing $P_1, P_2, P_4, P_6$ ensures that every skill is covered. However, the set containing $P_3, P_4, P_5$ has size 3, smaller than the former set of size 4.


Figure 1: An example of the set covering problem in terms of the ordinary formulation.
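The ordinary formulation can be made concrete with a small brute-force search. The sketch below is only an illustration: the instance `people` is hypothetical (it is not the instance of Figure 1), and the helper names `covers` and `smallest_cover` are assumptions introduced here. It tries subsets in order of increasing size, so it takes exponential time and is only meant for tiny inputs.

```python
from itertools import combinations

# Hypothetical instance (not the one in Figure 1): person -> set of skills.
people = {
    "P1": {1, 2, 3},
    "P2": {3, 4},
    "P3": {4, 5, 6},
    "P4": {1, 6},
}
skills = set().union(*people.values())  # X = {1, ..., 6}


def covers(chosen, people, skills):
    """Return True if the union of the chosen people's skill sets equals X."""
    covered = set()
    for name in chosen:
        covered |= people[name]
    return covered == skills


def smallest_cover(people, skills):
    """Try subsets in order of increasing size; exponential time."""
    names = sorted(people)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            if covers(chosen, people, skills):
                return set(chosen)
    return None  # no cover exists


print(smallest_cover(people, skills))
```

On this instance no single person covers every skill, while `P1` and `P3` together do, so the search stops at size 2.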

1.2 Hitting Set Formulation

The following formulation is equivalent:

\[ P = \{p_1, \ldots, p_r\} \tag{4} \]

\[ X = \{X_1, \ldots, X_n\} \tag{5} \]

where $X_i$ represents a skill, $p_j$ a person, and $p_j \in X_i$ if person $p_j$ possesses the skill $X_i$, for $i = 1, \ldots, n$ and $j = 1, \ldots, r$. The goal is again, as an employer, to find the minimum number of people who cover all the skills, namely to find the smallest set $H \subseteq P$ such that the people in $H$ cover all the skills:

\[ H^* = \arg\min_{H \subseteq P} |H| \quad \text{s.t. } |H \cap X_i| \neq 0, \ \forall i = 1, 2, \ldots, n \tag{6} \]
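The equivalence of the two formulations amounts to swapping the roles of people and skills. As a minimal sketch (the function names `to_hitting_set` and `hits_all` and the three-person instance are assumptions made for illustration), the code below turns a person-to-skills map into the sets $X_i$ of people possessing each skill and checks the constraint $|H \cap X_i| \neq 0$.

```python
def to_hitting_set(people):
    """Turn person -> skills into skill -> set of people having it (the X_i)."""
    holders = {}
    for person, person_skills in people.items():
        for skill in person_skills:
            holders.setdefault(skill, set()).add(person)
    return holders


def hits_all(H, holders):
    """Check that H intersects every skill set X_i."""
    return all(H & X_i for X_i in holders.values())


# Hypothetical instance: person -> skills.
people = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"c"}}
holders = to_hitting_set(people)

print(hits_all({"p1", "p3"}, holders))
print(hits_all({"p2"}, holders))
```

A set of people is a cover in the ordinary formulation exactly when it passes `hits_all` here, which is why the two problems have the same optimal value.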

1.3 A Solution to the Problem

As mentioned, this problem is NP-hard. However, we can obtain an $O(\log n)$-approximation algorithm, and it can be proved [1] that no asymptotically better approximation factor can be achieved unless $P = NP$.
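The statement of the algorithm is not part of this preview. The analysis that follows is consistent with the standard greedy rule, so the sketch below assumes that rule: repeatedly select the set that covers the most still-uncovered elements, and charge each newly covered element $x$ the cost $C_x = 1 / (\text{number of elements newly covered in that stage})$. The function name `greedy_set_cover`, the tie-breaking by name, and the example instance are assumptions made for illustration.

```python
from fractions import Fraction


def greedy_set_cover(sets, universe):
    """Greedy cover. Returns (chosen set names in order, per-element cost C_x)."""
    uncovered = set(universe)
    chosen = []
    cost = {}
    while uncovered:
        # Pick the set covering the most still-uncovered elements
        # (ties broken by name so the run is deterministic).
        best = max(sorted(sets), key=lambda name: len(sets[name] & uncovered))
        newly = sets[best] & uncovered
        if not newly:
            raise ValueError("the sets do not cover the universe")
        for x in newly:
            cost[x] = Fraction(1, len(newly))  # one unit spread over new elements
        chosen.append(best)
        uncovered -= newly
    return chosen, cost


# Hypothetical instance with 6 elements and 4 sets.
sets = {"S1": {1, 2, 3, 4}, "S2": {1, 2, 5}, "S3": {3, 4, 6}, "S4": {5, 6}}
chosen, cost = greedy_set_cover(sets, range(1, 7))
print(chosen, sum(cost.values()))
```

Because every stage distributes exactly one unit of cost, the costs always sum to the number of selected sets, which is the identity the analysis starts from.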

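The total cost $C = \sum_{x \in X} C_x$ is the number of people hired by the algorithm, because each stage charges a total cost of 1, spread over the skills it newly covers. Now we show that $C$ is $O(C^* \log n)$, where $C^*$ is the size of an optimal cover $H^*$. First, we have:

\[ C = \sum_{x \in X} C_x \le \sum_{S \in H^*} \sum_{x \in S} C_x \tag{9} \]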

This inequality holds because every skill is counted exactly once on the left-hand side, while on the right-hand side every skill is counted at least once, since the optimal solution is a cover. Second, we bound the right-hand side using two lemmas.

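Lemma 1.3.1 : For all sets $S$ belonging to $P$,

\[ \sum_{x \in S} C_x \le H(|S|) \tag{10} \]

where $H(d) = \sum_{i=1}^{d} 1/i$ denotes the $d$-th harmonic number, with $H(0) = 0$.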

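Proof of Lemma: Fix $S \in P$. Let $S_1, \ldots, S_k$ be the sets selected by the algorithm, in order, and for $i = 0, 1, \ldots, k$ let

\[ u_i = |S - (S_1 \cup S_2 \cup \cdots \cup S_i)| \]

be the number of elements in $S$ remaining uncovered after $S_1, \ldots, S_i$ have been selected by the algorithm; thus $u_0 = |S|$. Clearly $u_{i-1} \ge u_i$, and $u_{i-1} - u_i$ elements of $S$ are covered for the first time by $S_i$:

\[ \sum_{x \in S} C_x = \sum_{i=1}^{k} (u_{i-1} - u_i) \cdot \frac{1}{|S_i - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})|} \tag{11} \]

but, since the algorithm chose $S_i$ rather than $S$ at stage $i$, we have

\[ |S_i - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})| \ge |S - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})| = u_{i-1} \tag{12} \]

Therefore (stages with $u_{i-1} = 0$ contribute nothing and are omitted from the sums):

\[
\begin{aligned}
\sum_{x \in S} C_x &\le \sum_{i=1}^{k} \frac{u_{i-1} - u_i}{u_{i-1}} \\
&\le \sum_{i=1}^{k} \bigl( H(u_{i-1}) - H(u_i) \bigr) \\
&= H(u_0) - H(u_k) \le H(|S|)
\end{aligned}
\tag{13}
\]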

where the second inequality in (13) is based on the following lemma from calculus.

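Lemma 1.3.2 Given two non-negative integers $a$ and $b$ with $b \ge a$ and $b \ge 1$,

\[ H(b) - H(a) = \sum_{i=a+1}^{b} \frac{1}{i} \ge \frac{b - a}{b} \tag{14} \]

since each of the $b - a$ terms of the sum is at least $1/b$.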
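The lemma is easy to check numerically. The sketch below is only a sanity check over a small range, not a proof; the helper names `harmonic` and `lemma_holds` are introduced here for illustration, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def harmonic(d):
    """H(d) = 1 + 1/2 + ... + 1/d, with H(0) = 0, as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, d + 1)), Fraction(0))


def lemma_holds(a, b):
    """Check H(b) - H(a) >= (b - a) / b for integers 0 <= a <= b, b >= 1."""
    return harmonic(b) - harmonic(a) >= Fraction(b - a, b)


# Check every pair 0 <= a <= b < 40.
all_ok = all(lemma_holds(a, b) for b in range(1, 40) for a in range(0, b + 1))
print(all_ok)
```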

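Thus, by applying Lemma 1.3.1 to every set of the optimal cover:

\[ C \le \sum_{S \in H^*} \sum_{x \in S} C_x \le \sum_{S \in H^*} H(|S|) \le C^* \cdot H\bigl(\max_{S \in P} |S|\bigr) \le C^* (\ln n + 1) \tag{15} \]

Here we used that $H^*$ contains $C^*$ sets, each of size at most $n$, and that $H(n) \le \ln n + 1$. Hence $C = O(C^* \log n)$, which completes the proof of the theorem.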

2 Weighted Set Covering

Based on the hitting set formulation, we assign weights to the people, which can be thought of as salaries. Denote by $w_i$ the weight of person $p_i$, for $i = 1, 2, \ldots, r$; some people are more expensive to hire than others. The goal is again to find a subset $H \subseteq P$ that covers all the skills, but instead of minimizing its size we minimize $\sum_{p_i \in H} w_i$. Next we formulate this as an integer programming problem.

2.1 Integer Programming Formulation for Weighted Set Covering

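For each of the $r$ people, that is for $1 \le i \le r$, define the indicator variable $V_i$:

\[ V_i = \begin{cases} 1 & \text{if } p_i \in H \\ 0 & \text{otherwise} \end{cases} \]

The goal is to minimize $\sum_{i=1}^{r} w_i \cdot V_i$, subject to $\sum_{p_i \in X_j} V_i \ge 1$ for all $X_j \in X$.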
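To see the integer program at work on a tiny input, it can be solved by enumerating all 0/1 assignments of the $V_i$. This is a sketch under stated assumptions: the function name `solve_ip_brute_force`, the 0-based indexing of people, and the instance are all hypothetical, and the enumeration takes time exponential in the number of people.

```python
from itertools import product


def solve_ip_brute_force(weights, skill_sets):
    """Enumerate all 0/1 vectors V; keep the cheapest one meeting every constraint.

    weights[i]    -- weight w_i of person i (0-based index)
    skill_sets[j] -- set of person indices that possess skill j (the set X_j)
    """
    best_cost, best_V = None, None
    for V in product((0, 1), repeat=len(weights)):
        # Constraint: every skill set X_j contains at least one chosen person.
        feasible = all(sum(V[i] for i in X_j) >= 1 for X_j in skill_sets)
        if not feasible:
            continue
        cost = sum(w * v for w, v in zip(weights, V))
        if best_cost is None or cost < best_cost:
            best_cost, best_V = cost, V
    return best_cost, best_V


# Hypothetical instance: 4 people, 4 skills.
weights = [5, 2, 2, 4]
skill_sets = [{0, 1}, {0, 2}, {1, 3}, {2, 3}]
print(solve_ip_brute_force(weights, skill_sets))
```

On this instance the cheapest feasible choice hires the two people of weight 2, for a total weight of 4.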

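Now we show that this randomized algorithm is an $O(\log n)$-approximation algorithm, which is asymptotically optimal with respect to the number of skills $n$. As the equalities below indicate, $\hat{V}_i$ denotes the value of $V_i$ in an optimal solution of the LP relaxation of the IP, and in one round each person $p_i$ is chosen independently with probability $\hat{V}_i$, so that $E[V_i] = \hat{V}_i$. First, we calculate the expected weight (cost) of the partial cover produced by one round:

\[
\begin{aligned}
E\Bigl[ \sum_{i=1}^{r} w_i \cdot V_i \Bigr] &= \sum_{i=1}^{r} w_i \cdot E[V_i] \\
&= \sum_{i=1}^{r} w_i \cdot \hat{V}_i \\
&= \text{OPT cost of LP} \le \text{OPT cost of IP}
\end{aligned}
\tag{18}
\]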

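Then we calculate the probability that a skill $X_i$ is covered in one round. Suppose $X_i$ contains $p_1, \ldots, p_k$. From the covering constraint we know that:

\[ \sum_{j=1}^{k} \hat{V}_j \ge 1 \tag{19} \]

\[
\begin{aligned}
\Pr[X_i \text{ is covered}] &= 1 - \Pr[\text{none of } p_1, \ldots, p_k \text{ is chosen}] \\
&= 1 - \Pr[p_1 \text{ isn't chosen} \wedge p_2 \text{ isn't chosen} \wedge \cdots \wedge p_k \text{ isn't chosen}] \\
&= 1 - (1 - \hat{V}_1) \cdots (1 - \hat{V}_k) \\
&= 1 - \prod_{j=1}^{k} (1 - \hat{V}_j) \\
&\ge 1 - (1 - 1/k)^k \ge 1 - \frac{1}{e}
\end{aligned}
\tag{20}
\]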
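The last two inequalities of the derivation above are stated without justification. One standard way to obtain them, assuming $0 \le \hat{V}_j \le 1$ for every $j$ (as holds for a solution of the LP relaxation), is the inequality between the arithmetic and geometric means combined with (19):

```latex
\begin{align*}
\prod_{j=1}^{k} (1 - \hat{V}_j)
  &\le \Bigl( \frac{1}{k} \sum_{j=1}^{k} (1 - \hat{V}_j) \Bigr)^{k}
     && \text{(AM-GM, since } 0 \le 1 - \hat{V}_j \le 1\text{)} \\
  &= \Bigl( 1 - \frac{1}{k} \sum_{j=1}^{k} \hat{V}_j \Bigr)^{k} \\
  &\le \Bigl( 1 - \frac{1}{k} \Bigr)^{k}
     && \text{(by (19))} \\
  &\le e^{-1}
     && \text{(since } 1 - t \le e^{-t} \text{ for all real } t\text{)}.
\end{align*}
```

Since $1 - 1/e \approx 0.63$, each skill is covered in a single round with probability at least one half.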

Next we show that if we repeat the round $2 \lg m$ times, then with high probability all of the skills are covered, where $m$ is the number of skills ($m = n$ in the notation of Section 1.2) and $\lg$ denotes the base-2 logarithm. We have proved that, for any skill, the probability that it is covered in one round is at least one half. Thus,

\[ \Pr[\text{skill } i \text{ is not covered after } 2 \lg m \text{ rounds}] \le \Bigl( 1 - \frac{1}{2} \Bigr)^{2 \lg m} = \frac{1}{m^2} \tag{21} \]

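Therefore, by the union bound, the probability that there exists some skill not covered after $2 \lg m$ rounds satisfies:

\[ \Pr[\text{some skill is not covered}] \le \sum_{i=1}^{m} \Pr[\text{skill } i \text{ is not covered}] \le \sum_{i=1}^{m} \frac{1}{m^2} = \frac{1}{m} \tag{22} \]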

Hence, with high probability, all the skills are covered after $2 \lg m$ rounds. As calculated before, the expected weight of each round is at most OPT, where OPT is the optimal cost of the IP problem. In all, after $2 \lg m$ rounds the expected weight is at most $2 \lg m \cdot \text{OPT}$, which proves that the algorithm is an $O(\log m)$-approximation algorithm.
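The repeated rounds analysed above can be sketched as follows. The code assumes that a fractional solution `v_hat` of the LP relaxation is already available (solving the LP is not shown); the function name `randomized_rounding`, the 0-based indexing of people, and the example values are hypothetical. Whether all skills end up covered depends on the random draws, so the function reports it rather than guaranteeing it.

```python
import math
import random


def randomized_rounding(v_hat, skill_sets, rng):
    """Run ceil(2 * log2(m)) rounds; in each round choose person i
    independently with probability v_hat[i]. Returns (chosen, all_covered)."""
    m = len(skill_sets)  # number of skills, assumed to be at least 1
    rounds = max(1, math.ceil(2 * math.log2(m)))
    chosen = set()
    for _ in range(rounds):
        for i, p in enumerate(v_hat):
            if rng.random() < p:
                chosen.add(i)
    # A skill is covered if at least one of its holders was chosen.
    all_covered = all(chosen & X_j for X_j in skill_sets)
    return chosen, all_covered


# Hypothetical fractional solution: every skill set has total value >= 1.
v_hat = [0.5, 0.5, 0.5, 0.5]
skill_sets = [{0, 1}, {0, 2}, {1, 3}, {2, 3}]
chosen, ok = randomized_rounding(v_hat, skill_sets, random.Random(0))
print(sorted(chosen), ok)
```

If the returned flag is false, one simple option is to run the procedure again; the bound above says this happens with probability at most $1/m$.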

3 An Application for Set Covering

Motivation. Due to the large number of web users, it is extremely important to find an efficient and effective way to route data packages. During routing, there are two quantities we want to optimize: the first is the length of the headers, which contain the information about the paths towards the destination; the second is the size of the tables that map each source-destination pair to the corresponding shortest routing path. Next we introduce a method based on graph spanners that reduces the storage for all shortest paths (usually on the order of $n^2$, where $n$ is the number of nodes) to $n^{1.5}$, with the guarantee that the shortest paths are degraded by a factor of at most 3. This method uses the set cover solution elaborated above.

Given an undirected graph $G = (V, E)$, the goal is to construct a spanning graph $G' = (V, E')$ such that:

\[ d_{G'}(u, v) \le 3 \, d_G(u, v), \quad \forall u, v \in V \tag{23} \]

and the number of edges $|E'|$ in $G'$ is $O(n^{1.5} \log n)$ instead of $O(n^2)$.

The idea is that for each vertex $v$ we only store its $m$ closest neighbors; if we set $m = n^{0.5}$, then we need to store at least $n \cdot n^{0.5} = n^{1.5}$ vertices along the shortest paths. This forms the spanning graph $G'$. Then we define a set $L$ of landmarks for the packages and routing tables, which should intuitively satisfy the following two conditions:

L is not too big to maintain;

References

[1] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, pages 634–652, July 1998.

[2] R. M. Karp. Reducibility Among Combinatorial Problems. In R. E. Miller and J. W. Thatcher, editors, Complexity of Computer Computations, pages 85–103. Plenum Press, 1972.
