




Material Type: Notes; Class: Fundamental Algorithms; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2005;





The maximum flow/minimum cut problem is a special case of a very general class of problems called linear programming. Many other optimization problems fall into this class, including minimum spanning trees and shortest paths, as well as several common problems in scheduling, logistics, and economics.
Linear programming was used implicitly by Fourier in the early 1800s, but it was first formalized and applied to problems in economics in the 1930s by Leonid Kantorovich. Kantorovich’s work was hidden behind the Iron Curtain (where it was largely ignored) and therefore unknown in the West. Linear programming was rediscovered and applied to shipping problems in the early 1940s by Tjalling Koopmans. The first complete algorithm to solve linear programming problems, called the simplex method, was published by George Dantzig in 1947. Koopmans first proposed the name “linear programming” in a discussion with Dantzig in 1948. Kantorovich and Koopmans shared the 1975 Nobel Prize in Economics “for their contributions to the theory of optimum allocation of resources”. Dantzig did not; his work was apparently too pure. Koopmans wrote to Kantorovich suggesting that they refuse the prize in protest of Dantzig’s exclusion, but Kantorovich saw the prize as a vindication of his use of mathematics in economics, which had been written off as “a means for apologists of capitalism”.
A linear programming problem asks for a vector $x \in \mathbb{R}^d$ that maximizes (or equivalently, minimizes) a given linear function, among all vectors $x$ that satisfy a given set of linear inequalities. The general form of a linear programming problem is the following:
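\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{j=1}^{d} c_j x_j \\[1ex]
\text{subject to} & \displaystyle\sum_{j=1}^{d} a_{ij} x_j \le b_i & \text{for each } i = 1 \,..\, p \\[1ex]
& \displaystyle\sum_{j=1}^{d} a_{ij} x_j = b_i & \text{for each } i = p + 1 \,..\, p + q \\[1ex]
& \displaystyle\sum_{j=1}^{d} a_{ij} x_j \ge b_i & \text{for each } i = p + q + 1 \,..\, n
\end{array}
\]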
Here, the input consists of a matrix $A = (a_{ij}) \in \mathbb{R}^{n \times d}$, a column vector $b \in \mathbb{R}^n$, and a row vector $c \in \mathbb{R}^d$. Each coordinate of the vector $x$ is called a variable. Each of the linear inequalities is called a constraint. The function $x \mapsto c \cdot x$ is called the objective function. I will always use $d$ to denote the number of variables, also known as the dimension of the problem. The number of constraints is usually denoted $n$. A linear programming problem is said to be in canonical form^1 if it has the following structure:
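\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{j=1}^{d} c_j x_j \\[1ex]
\text{subject to} & \displaystyle\sum_{j=1}^{d} a_{ij} x_j \le b_i & \text{for each } i = 1 \,..\, n \\[1ex]
& x_j \ge 0 & \text{for each } j = 1 \,..\, d
\end{array}
\]
(^1) Confusingly, some authors call this standard form.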
We can express this canonical form more compactly as follows. For two vectors $x = (x_1, x_2, \dots, x_d)$ and $y = (y_1, y_2, \dots, y_d)$, the expression $x \ge y$ means that $x_i \ge y_i$ for every index $i$.
\[
\begin{array}{ll}
\max & c \cdot x \\
\text{s.t.} & Ax \le b \\
& x \ge 0
\end{array}
\]
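To make the compact notation concrete, here is a minimal Python sketch that checks whether a point satisfies $Ax \le b$, $x \ge 0$ and evaluates $c \cdot x$. The function names, the tolerance parameter, and the small instance are hypothetical, chosen only for illustration.

```python
def is_feasible(A, b, x, tol=1e-9):
    """Return True if Ax <= b and x >= 0 hold (up to a small tolerance)."""
    if any(xj < -tol for xj in x):
        return False
    for row, bi in zip(A, b):
        if sum(aij * xj for aij, xj in zip(row, x)) > bi + tol:
            return False
    return True


def objective(c, x):
    """Evaluate the objective function c . x."""
    return sum(cj * xj for cj, xj in zip(c, x))


# Hypothetical instance with d = 2 variables and n = 2 constraints.
A = [[1, 2], [3, 1]]
b = [4, 6]
c = [1, 1]

print(is_feasible(A, b, [1, 1]), objective(c, [1, 1]))  # True 2
print(is_feasible(A, b, [2, 2]))                        # False
```

The point (2, 2) fails because it violates the first constraint (2 + 4 > 4).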
Any linear programming problem can be converted into canonical form as follows:
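• For each variable $x_j$, add the equality constraint $x_j = x_j^+ - x_j^-$ and the inequalities $x_j^+ \ge 0$ and $x_j^- \ge 0$.
• Replace any equality constraint $\sum_j a_{ij} x_j = b_i$ with two inequality constraints $\sum_j a_{ij} x_j \ge b_i$ and $\sum_j a_{ij} x_j \le b_i$.
• Replace any lower bound $\sum_j a_{ij} x_j \ge b_i$ with the equivalent upper bound $\sum_j -a_{ij} x_j \le -b_i$.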
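The conversion to canonical form can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function name and the triple-based input format are hypothetical, and the sketch substitutes $x_j = z_j - z_{d+j}$ directly (doubling the variables) instead of keeping $x_j$ and adding an explicit equality constraint.

```python
def to_canonical(c, constraints):
    """Convert 'maximize c.x subject to constraints' (x unrestricted in sign)
    into canonical form 'maximize c2.z subject to A2 z <= b2, z >= 0'.

    Each constraint is a triple (coeffs, sense, rhs) with sense in
    {'<=', '=', '>='}.  Each x_j is replaced by z_j - z_{d+j}.
    """
    rows = []
    for coeffs, sense, rhs in constraints:
        if sense not in ('<=', '=', '>='):
            raise ValueError(f"unknown constraint sense: {sense!r}")
        if sense in ('<=', '='):
            rows.append((list(coeffs), rhs))
        if sense in ('>=', '='):
            # a.x >= b is equivalent to (-a).x <= -b
            rows.append(([-a for a in coeffs], -rhs))

    def split(v):
        # Coefficients for the positive parts, then for the negative parts.
        return list(v) + [-a for a in v]

    A2 = [split(coeffs) for coeffs, _ in rows]
    b2 = [rhs for _, rhs in rows]
    return split(c), A2, b2


# Hypothetical instance: maximize x1 + 2 x2 s.t. x1 + x2 = 3, x1 - x2 >= -1.
c2, A2, b2 = to_canonical([1, 2], [([1, 1], '=', 3), ([1, -1], '>=', -1)])
print(c2)  # [1, 2, -1, -2]
print(A2)  # [[1, 1, -1, -1], [-1, -1, 1, 1], [-1, 1, 1, -1]]
print(b2)  # [3, -3, 1]
```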
This conversion potentially triples the number of variables and doubles the number of constraints; fortunately, it is almost never necessary in practice. Another convenient formulation, especially for describing the simplex algorithm, is slack form,^2 in which the only inequalities are of the form $x_j \ge 0$:
\[
\begin{array}{ll}
\max & c \cdot x \\
\text{s.t.} & Ax = b \\
& x \ge 0
\end{array}
\]
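It’s fairly easy to convert any linear programming problem into slack form. For example, each inequality $\sum_j a_{ij} x_j \le b_i$ of a canonical-form problem becomes the equation $\sum_j a_{ij} x_j + s_i = b_i$ with a new slack variable $s_i \ge 0$.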
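One way to carry out this conversion, starting from canonical form, is to append one non-negative slack variable per constraint. The sketch below assumes the input is already canonical; the function name and the example instance are hypothetical.

```python
def to_slack_form(A, b, c):
    """Turn max c.x s.t. Ax <= b, x >= 0 into max c'.z s.t. A'z = b, z >= 0,
    where z = (x, s) and s holds one slack variable per constraint."""
    n = len(A)
    A_slack = [list(row) + [1 if i == k else 0 for k in range(n)]
               for i, row in enumerate(A)]
    c_slack = list(c) + [0] * n   # slack variables do not affect the objective
    return A_slack, list(b), c_slack


A_slack, b_slack, c_slack = to_slack_form([[1, 2], [3, 1]], [4, 6], [1, 1])
print(A_slack)  # [[1, 2, 1, 0], [3, 1, 0, 1]]
print(c_slack)  # [1, 1, 0, 0]

# The canonical-feasible point x = (1, 1) corresponds to z = (1, 1, 1, 2):
z = [1, 1, 1, 2]
print([sum(a * zj for a, zj in zip(row, z)) for row in A_slack])  # [4, 6]
```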
(^2) Confusingly, some authors call this standard form.
A point $x \in \mathbb{R}^d$ is feasible with respect to some linear programming problem if it satisfies all the linear constraints. The set of all feasible points is called the feasible region for that linear program. The feasible region has a particularly nice geometric structure that lends some useful intuition to later linear programming algorithms.
Any linear equation in $d$ variables defines a hyperplane in $\mathbb{R}^d$; think of a line when $d = 2$, or a plane when $d = 3$. This hyperplane divides $\mathbb{R}^d$ into two halfspaces; each halfspace is the set of points that satisfy some linear inequality. Thus, the set of feasible points is the intersection of several hyperplanes (one for each equality constraint) and halfspaces (one for each inequality constraint). The intersection of a finite number of hyperplanes and halfspaces is called a polyhedron. It’s not hard to verify that any halfspace, and therefore any polyhedron, is convex: if a polyhedron contains two points $x$ and $y$, then it contains the entire line segment $xy$.
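The convexity claim can be checked directly. The following short derivation is a sketch; the symbols $a$, $\beta$, and $\lambda$ are notation introduced here for a single halfspace and a point on the segment.

```latex
Let $H = \{ x \in \mathbb{R}^d : a \cdot x \le \beta \}$ be a halfspace,
let $x, y \in H$, and let $0 \le \lambda \le 1$. Then
\[
  a \cdot \bigl(\lambda x + (1 - \lambda) y\bigr)
  = \lambda \,(a \cdot x) + (1 - \lambda)\,(a \cdot y)
  \le \lambda \beta + (1 - \lambda) \beta
  = \beta ,
\]
so every point $\lambda x + (1 - \lambda) y$ of the segment $xy$ lies in $H$.
A hyperplane $\{ x : a \cdot x = \beta \}$ is the intersection of the two
halfspaces $a \cdot x \le \beta$ and $-a \cdot x \le -\beta$, and an
intersection of convex sets is convex, so every polyhedron is convex.
```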
We can compute the length of the shortest path from s to t in a weighted directed graph by solving the following very simple linear programming problem.
\[
\begin{array}{lll}
\text{maximize} & d_t \\
\text{subject to} & d_s = 0 \\
& d_v - d_u \le \ell_{u \to v} & \text{for every edge } u \to v
\end{array}
\]
Here, $\ell_{u \to v}$ is the length of the edge $u \to v$. Each variable $d_v$ represents a tentative shortest-path distance from $s$ to $v$. The constraints mirror the requirement that every edge in the graph must be relaxed. These relaxation constraints imply that in any feasible solution, $d_v$ is at most the shortest path distance from $s$ to $v$. Thus, somewhat counterintuitively, we are correctly maximizing the objective function to compute the shortest path! In the optimal solution, the objective function $d_t$ is the actual shortest-path distance from $s$ to $t$, but for any vertex $v$ that is not on the shortest path from $s$ to $t$, $d_v$ may be an underestimate of the true distance from $s$ to $v$. However, we can obtain the true distances from $s$ to every other vertex by modifying the objective function:
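\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{v} d_v \\[1ex]
\text{subject to} & d_s = 0 \\
& d_v - d_u \le \ell_{u \to v} & \text{for every edge } u \to v
\end{array}
\]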
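As a sanity check on this formulation, the sketch below computes shortest-path distances with Bellman-Ford (a swapped-in combinatorial algorithm, not an LP solver) and verifies that they satisfy every constraint of the linear program, while a larger value of $d_t$ does not. The graph, the function names, and the assumption that every vertex is reachable from $s$ are all hypothetical choices for this example.

```python
def bellman_ford(vertices, edges, s):
    """Shortest-path distances from s; edges maps (u, v) -> length.
    Assumes no negative cycles and that every vertex is reachable from s."""
    dist = {v: float('inf') for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):
        for (u, v), length in edges.items():
            if dist[u] + length < dist[v]:
                dist[v] = dist[u] + length
    return dist


def lp_feasible(d, edges, s):
    """Check the LP constraints: d_s = 0 and d_v - d_u <= length(u -> v)."""
    return d[s] == 0 and all(d[v] - d[u] <= length
                             for (u, v), length in edges.items())


# Hypothetical example graph.
vertices = ['s', 'a', 'b', 't']
edges = {('s', 'a'): 1, ('s', 'b'): 4, ('a', 'b'): 2,
         ('a', 't'): 6, ('b', 't'): 1}

dist = bellman_ford(vertices, edges, 's')
print(dist)                                       # {'s': 0, 'a': 1, 'b': 3, 't': 4}
print(lp_feasible(dist, edges, 's'))              # True
print(lp_feasible({**dist, 't': 5}, edges, 's'))  # False: edge b -> t is violated
```

The all-zero vector is also feasible here (all lengths are non-negative) but has a smaller objective value, which is why the linear program maximizes.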
There is another formulation of shortest paths as an LP minimization problem using indicator variables.
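\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{u \to v} \ell_{u \to v} \cdot x_{u \to v} \\[1ex]
\text{subject to} & \displaystyle\sum_{u} x_{u \to s} - \sum_{w} x_{s \to w} = -1 \\[1ex]
& \displaystyle\sum_{u} x_{u \to t} - \sum_{w} x_{t \to w} = 1 \\[1ex]
& \displaystyle\sum_{u} x_{u \to v} - \sum_{w} x_{v \to w} = 0 & \text{for every vertex } v \ne s, t \\[1ex]
& x_{u \to v} \ge 0 & \text{for every edge } u \to v
\end{array}
\]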
Intuitively, $x_{u \to v}$ equals 1 if $u \to v$ is in the shortest path from $s$ to $t$, and equals 0 otherwise. The constraints merely state that the path should start at $s$, end at $t$, and either pass through or avoid every other vertex $v$. Any path from $s$ to $t$ (in particular, the shortest path) clearly implies a feasible point for this linear program, but there are other feasible solutions with non-integral values that do not represent paths. Nevertheless, there is an optimal solution in which every $x_e$ is either 0 or 1 and the edges $e$ with $x_e = 1$ comprise the shortest path. Moreover, in any optimal solution, the objective function gives the shortest path distance, even if not every $x_e$ is an integer!
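The following sketch illustrates both kinds of feasible points: an indicator vector built from an actual path, and a fractional mixture of two paths. It checks the constraints in the form the prose describes (one unit of flow leaves $s$, one unit arrives at $t$, and flow is conserved elsewhere). The graph and all function names are hypothetical.

```python
def path_to_indicator(edges, path):
    """Set x_e = 1 on the edges of the given vertex path and 0 elsewhere."""
    on_path = set(zip(path, path[1:]))
    return {e: (1 if e in on_path else 0) for e in edges}


def net_inflow(x, v):
    """Total x-value entering v minus total x-value leaving v."""
    return (sum(val for (a, b), val in x.items() if b == v)
            - sum(val for (a, b), val in x.items() if a == v))


def cost(edges, x):
    """Objective value: sum of length(e) * x_e."""
    return sum(edges[e] * x[e] for e in edges)


# Hypothetical example graph; edges maps (u, v) -> length.
edges = {('s', 'a'): 1, ('s', 'b'): 4, ('a', 'b'): 2,
         ('a', 't'): 6, ('b', 't'): 1}

x = path_to_indicator(edges, ['s', 'a', 'b', 't'])
print([net_inflow(x, v) for v in 'sabt'], cost(edges, x))  # [-1, 0, 0, 1] 4

# Half a unit along s->a->b->t and half a unit along s->b->t:
# feasible, but not a path, and more expensive than the shortest path.
x_half = {('s', 'a'): 0.5, ('s', 'b'): 0.5, ('a', 'b'): 0.5,
          ('a', 't'): 0, ('b', 't'): 1}
print([net_inflow(x_half, v) for v in 'sabt'], cost(edges, x_half))
```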
Recall that the input to the maximum $(s, t)$-flow problem consists of a weighted directed graph $G = (V, E)$, two special vertices $s$ and $t$, and a function assigning a non-negative capacity $c_e$ to each edge $e$. Our task is to choose the flow $f_e$ across each edge $e$, as follows:
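\[
\begin{array}{lll}
\text{maximize} & \displaystyle\sum_{w} f_{s \to w} - \sum_{u} f_{u \to s} \\[1ex]
\text{subject to} & \displaystyle\sum_{w} f_{v \to w} - \sum_{u} f_{u \to v} = 0 & \text{for every vertex } v \ne s, t \\[1ex]
& f_{u \to v} \le c_{u \to v} & \text{for every edge } u \to v \\
& f_{u \to v} \ge 0 & \text{for every edge } u \to v
\end{array}
\]
The minimum cut problem can be formulated using ‘indicator’ variables, much like the shortest path problem. We have a variable $S_v$ for each vertex $v$, indicating whether $v \in S$ or $v \in T$, and a variable $X_{u \to v}$ for each edge $u \to v$, indicating whether $u \in S$ and $v \in T$, where $(S, T)$ is some $(s, t)$-cut.^3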
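\[
\begin{array}{lll}
\text{minimize} & \displaystyle\sum_{u \to v} c_{u \to v} \cdot X_{u \to v} \\[1ex]
\text{subject to} & X_{u \to v} + S_v - S_u \ge 0 & \text{for every edge } u \to v \\
& X_{u \to v} \ge 0 & \text{for every edge } u \to v \\
& S_s = 1 \\
& S_t = 0
\end{array}
\]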
Like the minimization LP for shortest paths, there can be optimal solutions that assign fractional values to the variables. Nevertheless, the minimum value for the objective function is the cost of the minimum cut, and there is an optimal solution for which every variable is either 0 or 1, representing an actual minimum cut. No, this is not obvious; in particular, my claim is not a proof!
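To see the two programs side by side, the sketch below checks a hand-picked flow against the flow constraints and a hand-picked cut against the cut constraints on a small network, then compares their objective values. The network, the flow, the cut $S = \{s\}$, and every function name are hypothetical; matching values certify optimality only for this particular instance.

```python
def flow_is_feasible(flow, cap, s, t):
    """Check capacity, non-negativity, and conservation at every v != s, t."""
    vertices = {v for edge in cap for v in edge}
    if any(not 0 <= flow[e] <= cap[e] for e in cap):
        return False
    for v in vertices - {s, t}:
        inflow = sum(flow[(a, b)] for (a, b) in cap if b == v)
        outflow = sum(flow[(a, b)] for (a, b) in cap if a == v)
        if inflow != outflow:
            return False
    return True


def flow_value(flow, s):
    """Objective of the flow LP: flow out of s minus flow into s."""
    return (sum(f for (a, b), f in flow.items() if a == s)
            - sum(f for (a, b), f in flow.items() if b == s))


def cut_solution(cap, S):
    """Indicator values for the cut (S, T): S_v = 1 iff v is in S, and
    X_{u->v} = 1 iff u is in S and v is in T."""
    vertices = {v for edge in cap for v in edge}
    S_var = {v: int(v in S) for v in vertices}
    X = {(u, v): int(u in S and v not in S) for (u, v) in cap}
    return S_var, X


def cut_is_feasible(S_var, X, s, t):
    """Check the constraints of the min-cut LP."""
    return (S_var[s] == 1 and S_var[t] == 0
            and all(X[(u, v)] >= 0 and X[(u, v)] + S_var[v] - S_var[u] >= 0
                    for (u, v) in X))


cap = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
flow = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}

S_var, X = cut_solution(cap, {'s'})
cut_cost = sum(cap[e] * X[e] for e in cap)
print(flow_is_feasible(flow, cap, 's', 't'), flow_value(flow, 's'))  # True 5
print(cut_is_feasible(S_var, X, 's', 't'), cut_cost)                 # True 5
```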
Each of these pairs of linear programming problems is related by a transformation called duality. For any linear programming problem, there is a corresponding dual linear program that can be obtained by a mechanical translation, essentially by swapping the constraints and the variables. The translation is simplest when the LP is in canonical form:
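\[
\begin{array}{c@{\qquad}c}
\text{Primal } (\Pi) & \text{Dual } (\amalg) \\[0.5ex]
\begin{array}{ll} \max & c \cdot x \\ \text{s.t.} & Ax \le b \\ & x \ge 0 \end{array}
&
\begin{array}{ll} \min & y \cdot b \\ \text{s.t.} & yA \ge c \\ & y \ge 0 \end{array}
\end{array}
\]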
We can also write the dual linear program in exactly the same canonical form as the primal, by swapping the objective vector $c$ and the right-hand-side vector $b$, negating both vectors, and replacing the constraint matrix $A$ with its negative transpose.^4
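\[
\begin{array}{c@{\qquad}c}
\text{Primal } (\Pi) & \text{Dual } (\amalg) \\[0.5ex]
\begin{array}{ll} \max & c \cdot x \\ \text{s.t.} & Ax \le b \\ & x \ge 0 \end{array}
&
\begin{array}{ll} \max & -b^\top \cdot y^\top \\ \text{s.t.} & -A^\top y^\top \le -c \\ & y^\top \ge 0 \end{array}
\end{array}
\]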
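The mechanical translation is easy to express in code. The sketch below builds the canonical-form dual from the triple $(A, b, c)$ using plain Python lists and confirms on one instance that taking the dual twice returns the original program. The function name and the instance are hypothetical.

```python
def dual(A, b, c):
    """Given max c.x s.t. Ax <= b, x >= 0, return (A', b', c') describing the
    dual in the same canonical form: max c'.y s.t. A'y <= b', y >= 0."""
    n, d = len(A), len(A[0])
    A_dual = [[-A[i][j] for i in range(n)] for j in range(d)]  # negative transpose
    b_dual = [-cj for cj in c]   # negated objective becomes the right-hand side
    c_dual = [-bi for bi in b]   # negated right-hand side becomes the objective
    return A_dual, b_dual, c_dual


A = [[1, 2], [3, 1]]
b = [4, 6]
c = [1, 1]

A_d, b_d, c_d = dual(A, b, c)
print(A_d, b_d, c_d)   # [[-1, -3], [-2, -1]] [-1, -1] [-4, -6]
print(dual(A_d, b_d, c_d) == (A, b, c))   # True: the dual of the dual is the primal
```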
(^3) These two linear programs are not quite syntactic duals; I’ve added two redundant variables $S_s$ and $S_t$ to the min-cut program to increase readability.
(^4) For the notational purists: In these formulations, $x$ and $b$ are column vectors, and $y$ and $c$ are row vectors. This is a somewhat nonstandard choice. Yes, that means the dot in $c \cdot x$ is redundant. Sue me.
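Consider the linear program that maximizes $4x_1 + x_2 + 3x_3$ subject to $x_1 + 4x_2 \le 1$, $3x_1 - x_2 + x_3 \le 3$, and $x_1, x_2, x_3 \ge 0$, and let $\sigma^*$ denote its optimal objective value. Multiply the first constraint by a non-negative number $y_1$ and the second by a non-negative number $y_2$. Because each $y_i$ is non-negative, we do not reverse any of the inequalities. Any feasible solution $(x_1, x_2, x_3)$ must satisfy both of these inequalities, so it must also satisfy their sum: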
\[
(y_1 + 3y_2)\,x_1 + (4y_1 - y_2)\,x_2 + y_2\,x_3 \le y_1 + 3y_2 .
\]
Now suppose that the coefficient of each variable $x_j$ on the left side of this inequality is at least the coefficient of $x_j$ in the objective function:
\[
y_1 + 3y_2 \ge 4, \qquad 4y_1 - y_2 \ge 1, \qquad y_2 \ge 3 .
\]
This assumption lets us derive an upper bound on the objective value of any feasible solution:
\[
4x_1 + x_2 + 3x_3 \le (y_1 + 3y_2)\,x_1 + (4y_1 - y_2)\,x_2 + y_2\,x_3 \le y_1 + 3y_2 .
\]
We have just proved that $\sigma^* \le y_1 + 3y_2$. Now it’s natural to ask how tight we can make this upper bound. How small can we make the expression $y_1 + 3y_2$ without violating any of the inequalities we used to prove the upper bound? This is just another linear programming problem!
\[
\begin{array}{ll}
\text{minimize} & y_1 + 3y_2 \\
\text{subject to} & y_1 + 3y_2 \ge 4 \\
& 4y_1 - y_2 \ge 1 \\
& y_2 \ge 3 \\
& y_1, y_2 \ge 0
\end{array}
\]
This is precisely the dual of our original linear program!
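The upper-bound argument can be checked numerically. In the sketch below, the primal data $A$, $b$, $c$ are read off from the inequalities above, and the candidate solutions `x` and `y` were chosen by hand for this illustration (they are not given in the text). Both are feasible and have the same objective value, so by the bound just derived each is optimal for its program.

```python
from fractions import Fraction as F

# Primal read off from the inequalities above:
#   maximize 4 x1 + x2 + 3 x3
#   subject to x1 + 4 x2 <= 1,  3 x1 - x2 + x3 <= 3,  x >= 0
A = [[1, 4, 0], [3, -1, 1]]
b = [1, 3]
c = [4, 1, 3]

x = [F(0), F(1, 4), F(13, 4)]   # hand-picked candidate primal solution
y = [F(1), F(3)]                # hand-picked candidate dual solution

primal_feasible = all(xj >= 0 for xj in x) and all(
    sum(aij * xj for aij, xj in zip(row, x)) <= bi for row, bi in zip(A, b))
dual_feasible = all(yi >= 0 for yi in y) and all(
    sum(y[i] * A[i][j] for i in range(len(A))) >= c[j] for j in range(len(c)))

primal_value = sum(cj * xj for cj, xj in zip(c, x))
dual_value = sum(yi * bi for yi, bi in zip(y, b))

print(primal_feasible, dual_feasible, primal_value, dual_value)  # True True 10 10
```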
The Fundamental Theorem can be rephrased in the following form:
Strong Duality Theorem. If $x^*$ is an optimal solution for a canonical linear program $\Pi$, then there is an optimal solution $y^*$ for its dual $\amalg$, such that $c \cdot x^* = y^* A x^* = y^* \cdot b$.
Proof (Sketch): I’ll prove the theorem only for non-degenerate linear programs, in which (a) the optimal solution (if one exists) is a unique vertex of the feasible region, and (b) at most $d$ constraint planes pass through any point. These non-degeneracy assumptions are relatively easy to enforce in practice and can be removed from the proof at the expense of some technical detail. I will also prove the theorem only for the case $n \ge d$; the argument for under-constrained LPs is similar (if not simpler).
Let $x^*$ be the optimal solution for the linear program $\Pi$; non-degeneracy implies that this solution is unique, and that exactly $d$ of the $n$ linear constraints are satisfied with equality. Without loss of generality (by permuting the rows of $A$), we can assume that these are the first $d$ constraints. So let $A_\bullet$ be the $d \times d$ matrix containing the first $d$ rows of $A$, and let $A_\circ$ denote the other $n - d$ rows. Similarly, partition $b$ into its first $d$ coordinates $b_\bullet$ and everything else $b_\circ$. Thus, we have partitioned the inequality $Ax^* \le b$ into a system of equations $A_\bullet x^* = b_\bullet$ and a system of strict inequalities $A_\circ x^* < b_\circ$.
Now let $y^* = (y^*_\bullet, y^*_\circ)$ where $y^*_\bullet = c A_\bullet^{-1}$ and $y^*_\circ = 0$. We easily verify that $y^* \cdot b = c \cdot x^*$:
\[
y^* \cdot b = y^*_\bullet \cdot b_\bullet = (c A_\bullet^{-1})\, b_\bullet = c\,(A_\bullet^{-1} b_\bullet) = c \cdot x^* .
\]
Similarly, it’s trivial to verify that $y^* A \ge c$:
\[
y^* A = y^*_\bullet A_\bullet = c .
\]
Once we prove that $y^*$ is non-negative, and therefore feasible, the Weak Duality Theorem implies the result. Clearly $y^*_\circ \ge 0$. As we will see below, the inequality $y^*_\bullet \ge 0$ follows from the fact that $x^*$ is optimal. (We had to use that fact somewhere!) This is the hardest part of the proof.
The key insight is to give a geometric interpretation to the vector $y^*_\bullet = c A_\bullet^{-1}$. Each row of the linear system $A_\bullet x^* = b_\bullet$ describes a hyperplane $a_i \cdot x = b_i$ in $\mathbb{R}^d$. The vector $a_i$ is normal to this hyperplane and points out of the feasible region. The vectors $a_1, \dots, a_d$ are linearly independent (by non-degeneracy) and thus describe a coordinate frame for the vector space $\mathbb{R}^d$. The definition of $y^*_\bullet$ can be rewritten as follows:
\[
c = y^*_\bullet A_\bullet = \sum_{i=1}^{d} y^*_i\, a_i .
\]
We are expressing the objective vector $c$ as a linear combination of the constraint normals $a_1, \dots, a_d$. Now consider any vertex $z$ of the feasible region that is adjacent to $x^*$. The vector $z - x^*$ is orthogonal to all but one of the vectors $a_i$. Thus, we have
\[
A_\bullet (z - x^*) = (0, \dots, \nu, \dots, 0)^\top
\]
where the constant $\nu$ is in the $i$th coordinate. The vector $z - x^*$ points into the feasible region, so $\nu \le 0$. It follows that $c \cdot (z - x^*) = y^*_\bullet A_\bullet (z - x^*) = \nu\, y^*_i$.
The optimality of $x^*$ implies that $c \cdot x^* \ge c \cdot z$, so we must have $y^*_i \ge 0$. We’re done!
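The construction in the proof can be traced on a tiny instance with $d = 2$. The sketch below assumes a hypothetical linear program (maximize $x_1 + x_2$) in which the two listed constraints are exactly the ones that are tight at $x^*$; it solves $A_\bullet x^* = b_\bullet$ and $y^*_\bullet A_\bullet = c$ by Cramer's rule and checks that $y^*_\bullet$ is non-negative with $y^*_\bullet \cdot b_\bullet = c \cdot x^*$. The helper `solve_2x2` and all data are assumptions made for illustration.

```python
from fractions import Fraction as F


def solve_2x2(M, rhs):
    """Solve M z = rhs for an invertible 2x2 matrix M by Cramer's rule."""
    (p, q), (r, s) = M
    det = F(p * s - q * r)
    return [(rhs[0] * s - q * rhs[1]) / det, (p * rhs[1] - rhs[0] * r) / det]


# Hypothetical tight constraints at the optimum:  x1 + 2 x2 = 4,  3 x1 + x2 = 6.
A_tight = [[1, 2], [3, 1]]
b_tight = [4, 6]
c = [1, 1]

x_star = solve_2x2(A_tight, b_tight)

# y A_tight = c is the same system as A_tight^T y^T = c^T.
A_T = [[A_tight[0][0], A_tight[1][0]], [A_tight[0][1], A_tight[1][1]]]
y_tight = solve_2x2(A_T, c)

primal_value = sum(cj * xj for cj, xj in zip(c, x_star))
dual_value = sum(yi * bi for yi, bi in zip(y_tight, b_tight))

print(x_star == [F(8, 5), F(6, 5)], y_tight == [F(2, 5), F(1, 5)])  # True True
print(all(yi >= 0 for yi in y_tight), primal_value == dual_value)   # True True
```

Here $c = \tfrac{2}{5} a_1 + \tfrac{1}{5} a_2$, so the objective vector lies in the cone spanned by the two constraint normals, which is the geometric content of $y^*_\bullet \ge 0$.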