









Material Type: Paper; Class: Algorithms; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;










The point is, ladies and gentleman, greed is good. Greed works, greed is right. Greed clarifies, cuts through, and captures the essence of the evolutionary spirit. Greed in all its forms, greed for life, money, love, knowledge has marked the upward surge in mankind. And greed—mark my words—will save not only Teldar Paper but the other malfunctioning corporation called the USA. — Michael Douglas as Gordon Gekko, Wall Street (1987)
There is always an easy solution to every human problem— neat, plausible, and wrong. — H. L. Mencken, “The Divine Afflatus”, New York Evening Mail (November 16, 1917)
I love deadlines. I like the whooshing sound they make as they fly by. — Douglas Adams
Suppose we have a set of n files that we want to store on a tape. In the future, users will want to read those files from the tape. Reading a file from tape isn’t like reading from disk; first we have to fast-forward past all the other files, and that takes a significant amount of time. Let L [ 1 .. n ] be an array listing the lengths of each file; specifically, file i has length L [ i ]. If the files are stored in order from 1 to n , then the cost of accessing the k th file is
$$\mathrm{cost}(k) = \sum_{i=1}^{k} L[i].$$
The cost reflects the fact that before we read file k we must first scan past all the earlier files on the tape. If we assume for the moment that each file is equally likely to be accessed, then the expected cost of searching for a random file is
$$\mathrm{E}[\mathrm{cost}] = \sum_{k=1}^{n} \frac{\mathrm{cost}(k)}{n} = \sum_{k=1}^{n} \sum_{i=1}^{k} \frac{L[i]}{n}.$$
If we change the order of the files on the tape, we change the cost of accessing the files; some files become more expensive to read, but others become cheaper. Different file orders are likely to result in different expected costs. Specifically, let π ( i ) denote the index of the file stored at position i on the tape. Then the expected cost of the permutation π is
$$\mathrm{E}[\mathrm{cost}(\pi)] = \sum_{k=1}^{n} \sum_{i=1}^{k} \frac{L[\pi(i)]}{n}.$$
Which order should we use if we want the expected cost to be as small as possible? The answer is intuitively clear; we should store the files in order from shortest to longest. So let’s prove this.
Lemma 1. E[ cost ( π )] is minimized when L [ π ( i )] ≤ L [ π ( i + 1 )] for all i.
Proof: Suppose L[π(i)] > L[π(i+1)] for some i. To simplify notation, let a = π(i) and b = π(i+1). If we swap files a and b, then the cost of accessing a increases by L[b], and the cost of accessing b decreases by L[a]. Overall, the swap changes the expected cost by (L[b] − L[a])/n. But this change is an improvement, because L[b] < L[a]. Thus, if the files are out of order, we can improve the expected cost by swapping some mis-ordered adjacent pair. □
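As a small sanity check of Lemma 1, the following Python sketch computes the expected cost of an ordering and compares the shortest-first order against every permutation by brute force. The function names and the sample file lengths are illustrative assumptions, not part of the original notes, and indices are 0-based rather than 1-based.

```python
from itertools import permutations

def expected_cost(lengths, order):
    """Average access cost when files are stored on the tape in `order`."""
    total = 0    # running sum of cost(k) over all tape positions k
    prefix = 0   # total length of the files up to and including position k
    for idx in order:
        prefix += lengths[idx]
        total += prefix
    return total / len(order)

def shortest_first(lengths):
    """Greedy order from Lemma 1: sort file indices by increasing length."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i])

lengths = [5, 1, 3, 2]   # hypothetical file lengths
order = shortest_first(lengths)
best = min(expected_cost(lengths, p) for p in permutations(range(len(lengths))))
print(order, expected_cost(lengths, order), best)
```

The brute-force minimum is only feasible for tiny inputs; it is there to confirm the lemma on an example, not as an alternative algorithm.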
This example gives us our first greedy algorithm. To minimize the total expected cost of accessing the files, we put the file that is cheapest to access first, and then recursively write everything else; no backtracking, no dynamic programming, just make the best local choice and blindly plow ahead. If we use an efficient sorting algorithm, the running time is clearly O ( n log n ), plus the time required to actually write the files. To prove the greedy algorithm is actually correct, we simply prove that the output of any other algorithm can be improved by some sort of swap. Let’s generalize this idea further. Suppose we are also given an array f [ 1 .. n ] of access frequencies for each file; file i will be accessed exactly f [ i ] times over the lifetime of the tape. Now the total cost of accessing all the files on the tape is
$$\Sigma\mathrm{cost}(\pi) = \sum_{k=1}^{n} \left( f[\pi(k)] \cdot \sum_{i=1}^{k} L[\pi(i)] \right) = \sum_{k=1}^{n} \sum_{i=1}^{k} f[\pi(k)] \cdot L[\pi(i)].$$
Now what order should we store the files in if we want to minimize the total cost? We’ve already proved that if all the frequencies are equal, then we should sort the files by increasing size. If the frequencies are all different but the file lengths L[i] are all equal, then intuitively, we should sort the files by decreasing access frequency, with the most-accessed file first. In fact, this is not hard to prove by modifying the proof of Lemma 1. But what if the sizes and the frequencies are both different? In this case, we should sort the files by the ratio L/f.
Lemma 2. Σcost(π) is minimized when

$$\frac{L[\pi(i)]}{f[\pi(i)]} \le \frac{L[\pi(i+1)]}{f[\pi(i+1)]}$$

for all i.
Proof: Suppose L[π(i)]/f[π(i)] > L[π(i+1)]/f[π(i+1)] for some i. To simplify notation, let a = π(i) and b = π(i+1). If we swap files a and b, then the cost of accessing a increases by L[b], and the cost of accessing b decreases by L[a]. Since file a is accessed f[a] times and file b is accessed f[b] times, the swap changes the total cost by L[b] f[a] − L[a] f[b]. But this change is an improvement, since

$$\frac{L[a]}{f[a]} > \frac{L[b]}{f[b]} \implies L[b]\,f[a] - L[a]\,f[b] < 0.$$

Thus, if the files are out of order, we can improve the total cost by swapping some mis-ordered adjacent pair. □
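The ratio rule of Lemma 2 can be checked the same way. This is a minimal sketch assuming positive integer lengths and frequencies; the names `total_cost` and `ratio_order` and the sample data are hypothetical, and exact fractions are used so that ties in L/f are not disturbed by floating-point rounding.

```python
from fractions import Fraction
from itertools import permutations

def total_cost(lengths, freqs, order):
    """Sum over files of (access frequency) x (scan distance to the end of that file)."""
    total = 0
    prefix = 0
    for idx in order:
        prefix += lengths[idx]
        total += freqs[idx] * prefix
    return total

def ratio_order(lengths, freqs):
    """Greedy order from Lemma 2: sort file indices by increasing L[i]/f[i]."""
    return sorted(range(len(lengths)), key=lambda i: Fraction(lengths[i], freqs[i]))

lengths = [4, 2, 6, 3]   # hypothetical file lengths
freqs = [1, 4, 3, 1]     # hypothetical access counts
order = ratio_order(lengths, freqs)
best = min(total_cost(lengths, freqs, p) for p in permutations(range(len(lengths))))
print(order, total_cost(lengths, freqs, order), best)
```

With all frequencies equal this reduces to sorting by length, and with all lengths equal it reduces to sorting by decreasing frequency, matching the two special cases discussed above.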
The next example is slightly less trivial. Suppose you decide to drop out of computer science at the last minute and change your major to Applied Chaos. The Applied Chaos department has all of its classes on the same day every week, referred to as “Soberday” by the students (but interestingly, not by the faculty). Every class has a different start time and a different ending time: AC 101 (‘Toilet Paper Landscape Architecture’) starts at 10:27pm and ends at 11:51pm; AC 666 (‘Immanentizing the Eschaton’) starts at 4:18pm and ends at 7:06pm, and so on. In the interests of graduating as quickly as possible, you want to register for as many classes as you can. (Applied Chaos classes don’t require any actual work.) The University’s registration computer won’t let you register for overlapping classes, and no one in the department knows how to override this ‘feature’. Which classes should you take?

More formally, suppose you are given two arrays S[1 .. n] and F[1 .. n] listing the start and finish times of each class. Your task is to choose the largest possible subset X ⊆ {1, 2, ..., n} so that for any pair i, j ∈ X, either S[i] > F[j] or S[j] > F[i]. We can illustrate the problem by drawing each class as a rectangle whose left and right x-coordinates show the start and finish times. The goal is to find a largest subset of rectangles that do not overlap vertically.
The greedy algorithm repeatedly chooses the class that finishes first among those that do not conflict with the classes already chosen; after sorting the classes by finish time, it clearly runs in O(n log n) time. To prove that this algorithm actually gives us a maximal conflict-free schedule, we use an exchange argument, similar to the one we used for tape sorting. We are not claiming that the greedy schedule is the only maximal schedule; there could be others. (See the figures on the previous page.) All we can claim is that at least one of the maximal schedules is the one that the greedy algorithm produces.
Lemma 3. At least one maximal conflict-free schedule includes the class that finishes first.
Proof: Let f be the class that finishes first. Suppose we have a maximal conflict-free schedule X that does not include f. Let g be the first class in X to finish. Since f finishes before g does, f cannot conflict with any class in the set X \ {g}. Thus, the schedule X′ = X ∪ {f} \ {g} is also conflict-free. Since X′ has the same size as X, it is also maximal. □
To finish the proof, we call on our old friend, induction.
Theorem 4. The greedy schedule is an optimal schedule.
Proof: Let f be the class that finishes first, and let L be the subset of classes that start after f finishes. The previous lemma implies that some optimal schedule contains f, so the best schedule that contains f is an optimal schedule. The best schedule that includes f must contain an optimal schedule for the classes that do not conflict with f, that is, an optimal schedule for L. The greedy algorithm chooses f and then, by the inductive hypothesis, computes an optimal schedule of classes from L. □
The proof might be easier to understand if we unroll the induction slightly.
Proof: Let 〈g_1, g_2, ..., g_k〉 be the sequence of classes chosen by the greedy algorithm. Suppose we have a maximal conflict-free schedule of the form

〈g_1, g_2, ..., g_{j−1}, c_j, c_{j+1}, ..., c_m〉,

where the classes c_i are different from the classes chosen by the greedy algorithm. By construction, the j-th greedy choice g_j does not conflict with any earlier class g_1, g_2, ..., g_{j−1}, and since our schedule is conflict-free, neither does c_j. Moreover, g_j has the earliest finish time among all classes that don’t conflict with the earlier classes; in particular, g_j finishes before c_j. This implies that g_j does not conflict with any of the later classes c_{j+1}, ..., c_m. Thus, the schedule

〈g_1, g_2, ..., g_{j−1}, g_j, c_{j+1}, ..., c_m〉

is conflict-free. (This is just a generalization of Lemma 3, which considers the case j = 1.) By induction, it now follows that there is an optimal schedule 〈g_1, g_2, ..., g_k, c_{k+1}, ..., c_m〉 that includes every class chosen by the greedy algorithm. But this is impossible unless k = m; if there were a class c_{k+1} that does not conflict with g_k, the greedy algorithm would choose more than k classes. □
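The greedy rule used in these proofs (always take the compatible class that finishes first) can be sketched in Python as follows. The pseudocode for the algorithm is not part of this excerpt, so the function name `greedy_class_schedule`, the 0-based indexing, and the sample times are assumptions made for illustration only.

```python
def greedy_class_schedule(start, finish):
    """Scan classes by increasing finish time; keep each class that starts
    strictly after the last chosen class finishes."""
    chosen = []
    last_finish = float("-inf")
    for i in sorted(range(len(start)), key=lambda i: finish[i]):
        if start[i] > last_finish:   # strict inequality, matching S[i] > F[j] in the text
            chosen.append(i)
            last_finish = finish[i]
    return chosen

# Hypothetical start and finish times for six classes.
start = [1, 3, 0, 5, 8, 5]
finish = [2, 4, 6, 7, 9, 9]
print(greedy_class_schedule(start, finish))
```

Sorting dominates the running time, which gives the O(n log n) bound; the scan itself is linear.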
The basic structure of this correctness proof is exactly the same as for the tape-sorting problem: an inductive exchange argument.
This argument implies by induction that there is an optimal solution that contains the entire greedy solution. Sometimes, as in the scheduling problem, an additional step is required to show no optimal solution strictly improves the greedy solution.
A binary code assigns a string of 0s and 1s to each character in the alphabet. A binary code is prefix-free if no code is a prefix of any other. 7-bit ASCII and Unicode’s UTF-8 are both prefix-free binary codes. Morse code is a binary code, but it is not prefix-free; for example, the code for S (· · ·) includes the code for E (·) as a prefix.

Any prefix-free binary code can be visualized as a binary tree with the encoded characters stored at the leaves. The code word for any symbol is given by the path from the root to the corresponding leaf; 0 for left, 1 for right. The length of a codeword for a symbol is the depth of the corresponding leaf. (Note that the code tree is not a binary search tree. We don’t care at all about the sorted order of symbols at the leaves. In fact, the symbols may not have a well-defined order!)

Suppose we want to encode messages in an n-character alphabet so that the encoded message is as short as possible. Specifically, given an array of frequency counts f[1 .. n], we want to compute a prefix-free binary code that minimizes the total encoded length of the message:^2
$$\sum_{i=1}^{n} f[i] \cdot \mathrm{depth}(i).$$
In 1952, David Huffman developed the following greedy algorithm to produce such an optimal code:
HUFFMAN: Merge the two least frequent letters and recurse.
For example, suppose we want to encode the following helpfully self-descriptive sentence, discovered by Lee Sallows:^3
This sentence contains three a’s, three c’s, two d’s, twenty-six e’s, five f’s, three g’s, eight h’s, thirteen i’s, two l’s, sixteen n’s, nine o’s, six r’s, twenty-seven s’s, twenty-two t’s, two u’s, five v’s, eight w’s, four x’s, five y’s, and only one z.
To keep things simple, let’s forget about the forty-four spaces, nineteen apostrophes, nineteen commas, three hyphens, and one period, and just encode the letters. Here’s the frequency table:
A   C   D   E   F   G   H   I   L   N   O   R   S   T   U   V   W   X   Y   Z
3   3   2   26  5   3   8   13  2   16  9   6   27  22  2   5   8   4   5   1
Huffman’s algorithm picks out the two least frequent letters, breaking ties arbitrarily—in this case, say, Z and D—and merges them together into a single new character DZ with frequency 3. This new character becomes an internal node in the code tree we are constructing, with Z and D as its children; it doesn’t matter which child is which. The algorithm then recursively constructs a Huffman code for the new frequency table
A   C   E   F   G   H   I   L   N   O   R   S   T   U   V   W   X   Y   DZ
3   3   26  5   3   8   13  2   16  9   6   27  22  2   5   8   4   5   3

^2 This looks almost exactly like the cost of a binary search tree, but the optimization problem is very different: code trees are not search trees!

^3 A. K. Dewdney. Computer recreations. Scientific American, October 1984. Douglas Hofstadter published a few earlier examples of Lee Sallows’ self-descriptive sentences in his Scientific American column in January 1982.
Let T′ be the code tree obtained by swapping x and a. The depth of x increases by some amount Δ, and the depth of a decreases by the same amount. Thus,

$$\mathrm{cost}(T') = \mathrm{cost}(T) - (f[a] - f[x])\,\Delta.$$

By assumption, x is one of the two least frequent characters, but a is not, which implies that f[a] ≥ f[x]. Thus, swapping x and a does not increase the total cost of the code. Since T was an optimal code tree, swapping x and a does not decrease the cost, either. Thus, T′ is also an optimal code tree (and incidentally, f[a] actually equals f[x]). Similarly, swapping y and b must give yet another optimal code tree. In this final optimal code tree, x and y are maximum-depth siblings, as required. □
Now optimality is guaranteed by our dear friend the Recursion Fairy! Essentially we’re relying on the following recursive definition for a full binary tree: either a single node, or a full binary tree where some leaf has been replaced by an internal node with two leaf children.
Theorem 6. Huffman codes are optimal prefix-free binary codes.
Proof: If the message has only one or two different characters, the theorem is trivial. Otherwise, let f[1 .. n] be the original input frequencies, where without loss of generality, f[1] and f[2] are the two smallest. To keep things simple, let f[n+1] = f[1] + f[2]. By the previous lemma, we know that some optimal code for f[1 .. n] has characters 1 and 2 as siblings. Let T′ be the Huffman code tree for f[3 .. n+1]; the inductive hypothesis implies that T′ is an optimal code tree for the smaller set of frequencies. To obtain the final code tree T, we replace the leaf labeled n+1 with an internal node with two children, labeled 1 and 2. I claim that T is optimal for the original frequency array f[1 .. n]. To prove this claim, we can express the cost of T in terms of the cost of T′ as follows. (In these equations, depth(i) denotes the depth of the leaf labeled i in either T or T′; if the leaf appears in both T and T′, it has the same depth in both trees.)
$$
\begin{aligned}
\mathrm{cost}(T) &= \sum_{i=1}^{n} f[i] \cdot \mathrm{depth}(i) \\
&= \sum_{i=3}^{n+1} f[i] \cdot \mathrm{depth}(i) + f[1] \cdot \mathrm{depth}(1) + f[2] \cdot \mathrm{depth}(2) - f[n+1] \cdot \mathrm{depth}(n+1) \\
&= \mathrm{cost}(T') + f[1] \cdot \mathrm{depth}(1) + f[2] \cdot \mathrm{depth}(2) - f[n+1] \cdot \mathrm{depth}(n+1) \\
&= \mathrm{cost}(T') + (f[1] + f[2]) \cdot \mathrm{depth}(T) - f[n+1] \cdot (\mathrm{depth}(T) - 1) \\
&= \mathrm{cost}(T') + f[1] + f[2]
\end{aligned}
$$
This equation implies that minimizing the cost of T is equivalent to minimizing the cost of T′; in particular, attaching leaves labeled 1 and 2 to the leaf in T′ labeled n+1 gives an optimal code tree for the original frequencies. □
To actually implement Huffman codes efficiently, we keep the characters in a min-heap, where the priority of each character is its frequency. We can construct the code tree by keeping three arrays of indices, listing the left and right children and the parent of each node. The root of the tree is the node with index 2 n − 1.
BUILDHUFFMAN(f[1 .. n]):
  for i ← 1 to n
    L[i] ← 0; R[i] ← 0
    INSERT(i, f[i])
  for i ← n + 1 to 2n − 1
    x ← EXTRACTMIN( )
    y ← EXTRACTMIN( )
    f[i] ← f[x] + f[y]
    L[i] ← x; R[i] ← y
    P[x] ← i; P[y] ← i
    INSERT(i, f[i])
  P[2n − 1] ← 0
The algorithm performs O ( n ) min-heap operations. If we use a balanced binary tree as the heap, each operation requires O (log n ) time, so the total running time of BUILDHUFFMAN is O ( n log n ). Finally, here are simple algorithms to encode and decode messages:
HUFFMANENCODE(A[1 .. k]):
  m ← 1
  for i ← 1 to k
    HUFFMANENCODEONE(A[i])

HUFFMANENCODEONE(x):
  if x < 2n − 1
    HUFFMANENCODEONE(P[x])
    if x = L[P[x]]
      B[m] ← 0
    else
      B[m] ← 1
    m ← m + 1

HUFFMANDECODE(B[1 .. m]):
  k ← 1
  v ← 2n − 1
  for i ← 1 to m
    if B[i] = 0
      v ← L[v]
    else
      v ← R[v]
    if L[v] = 0
      A[k] ← v
      k ← k + 1
      v ← 2n − 1
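For readers who prefer running code, here is a minimal Python sketch of the same three routines. It is not a transcription of the array-based pseudocode: it assumes Python's `heapq` module as the min-heap, represents a leaf as a one-character string and an internal node as a `(left, right)` pair, and assumes the alphabet has at least two symbols. All function names are illustrative.

```python
import heapq
from itertools import count

def build_huffman(freq):
    """Build a code tree from a dict {symbol: frequency}."""
    tiebreak = count()   # unique counter, so the heap never has to compare two trees
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # the two least frequent subtrees
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))
    return heap[0][2]

def codewords(tree, prefix=""):
    """Map each symbol to its root-to-leaf path: 0 for left, 1 for right."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    codes = codewords(left, prefix + "0")
    codes.update(codewords(right, prefix + "1"))
    return codes

def encode(message, codes):
    return "".join(codes[ch] for ch in message)

def decode(bits, tree):
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):   # reached a leaf: emit it and restart at the root
            out.append(node)
            node = tree
    return "".join(out)

# Frequency table for the self-descriptive sentence above (letters only).
letters = "ACDEFGHILNORSTUVWXYZ"
counts = [3, 3, 2, 26, 5, 3, 8, 13, 2, 16, 9, 6, 27, 22, 2, 5, 8, 4, 5, 1]
freq = dict(zip(letters, counts))
tree = build_huffman(freq)
codes = codewords(tree)
total_bits = sum(freq[s] * len(codes[s]) for s in freq)
print(codes["E"], codes["Z"], total_bits)
```

Because ties are broken arbitrarily, the individual codewords may differ from any tree drawn by hand, but the total encoded length is the same for every Huffman tree built from the same frequencies.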
Many problems that can be correctly solved by greedy algorithms can be described in terms of abstract combinatorial objects called matroids. Matroids were first described in 1935 by the mathematician Hassler Whitney as a combinatorial generalization of linear independence of vectors.
called a basis if it is not a proper subset of another independent set. The exchange property implies that every basis of a matroid has the same cardinality. The rank of a subset X of the ground set is the size
(surprise, surprise). Finally, a dependent set is called a circuit if every proper subset is independent. Most of this terminology is justified by Whitney’s original example:
element could be added to G to get a larger independent set, the greedy algorithm would have added it. Thus, G is a basis. For purposes of deriving a contradiction, suppose there is an independent set H = {h_1, h_2, ..., h_ℓ} such that

$$\sum_{i=1}^{k} w(g_i) < \sum_{j=1}^{\ell} w(h_j).$$

Without loss of generality, we assume that H is a basis. The exchange property now implies that k = ℓ. Now suppose the elements of G and H are indexed in order of decreasing weight. Let i be the smallest index such that w(g_i) < w(h_i), and consider the independent sets

G_{i−1} = {g_1, g_2, ..., g_{i−1}} and H_i = {h_1, h_2, ..., h_{i−1}, h_i}.

By the exchange property, there is some element h_j ∈ H_i such that G_{i−1} ∪ {h_j} is an independent set. We have w(h_j) ≥ w(h_i) > w(g_i). Thus, the greedy algorithm considers and rejects the heavier element h_j before it considers the lighter element g_i. But this is impossible: the greedy algorithm accepts elements in decreasing order of weight. □
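The greedy procedure analyzed in this proof can be sketched generically in Python, given an independence oracle. The pseudocode itself is not part of this excerpt, so the function `greedy_basis`, its parameters, and the example set system below (at most one element per group, a simple partition matroid) are assumptions chosen only for illustration. Weights are assumed non-negative.

```python
def greedy_basis(elements, weight, is_independent):
    """Consider elements by decreasing weight; keep each one that
    leaves the current set independent."""
    basis = []
    for x in sorted(elements, key=weight, reverse=True):
        if is_independent(basis + [x]):
            basis.append(x)
    return basis

# Hypothetical example: elements are (group, value) pairs, and a set is
# independent when no two of its elements share a group.
items = [("mon", 4), ("mon", 7), ("tue", 3), ("wed", 5), ("tue", 6)]

def one_per_group(subset):
    groups = [g for g, _ in subset]
    return len(groups) == len(set(groups))

result = greedy_basis(items, weight=lambda x: x[1], is_independent=one_per_group)
print(result)
```

The oracle is called once per element, so the running time is one sort plus n independence tests; how expensive each test is depends entirely on the set system.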
We now immediately have a correct greedy optimization algorithm for any matroid. Returning to our examples:
The exchange condition for matroids turns out to be crucial for the success of this algorithm. A subset
With these weights, the greedy algorithm will consider and accept every element of Y , then consider and reject every element of X , and finally consider all the other elements. The algorithm returns a set with total weight m ( m + 2 ) = m^2 + 2 m. But the total weight of X is at least ( m + 1 )^2 = m^2 + 2 m + 1.
Recall the Applied Chaos scheduling problem considered earlier in this lecture. There is a natural subset system associated with this problem: A set of classes is independent if and only if no two classes overlap. (This is just the graph-theory notion of ‘independent set’!) This subset system is not a matroid, because there can be maximal independent sets of different sizes, which violates the exchange property. If we consider a weighted version of the class scheduling problem, say where each class is worth a different number of hours, Theorem 8 implies that the greedy algorithm will not always find the optimal schedule. (In fact, there’s an easy counterexample with only two classes!) Theorem 8 does not contradict the correctness of the greedy algorithm for the original unweighted problem, however; that problem uses a particularly lucky choice of weights (all equal).
Suppose you have n tasks to complete in n days; each task requires your attention for a full day. Each task comes with a deadline, the last day by which the job should be completed, and a penalty that you must pay if you do not complete the task by its assigned deadline. What order should you perform your tasks in to minimize the total penalty you must pay?

More formally, you are given an array D[1 .. n] of deadlines and an array P[1 .. n] of penalties. Each deadline D[i] is an integer between 1 and n, and each penalty P[i] is a non-negative real number. A schedule is a permutation of the integers {1, 2, ..., n}. The scheduling problem asks you to find a schedule π that minimizes the following cost:
$$\mathrm{cost}(\pi) := \sum_{i=1}^{n} P[i] \cdot [\pi(i) > D[i]].$$
This doesn’t look anything like a matroid optimization problem. For one thing, matroid optimization problems ask us to find an optimal set ; this problem asks us to find an optimal permutation. Surprisingly, however, this scheduling problem is actually a matroid optimization in disguise! For any schedule π , call tasks i such that π ( i ) > D [ i ] late , and all other tasks on time. The following trivial observation is the key to revealing the underlying matroid structure.
The cost of a schedule is determined by the subset of tasks that are on time.
Call a subset X of the tasks realistic if there is a schedule π in which every task in X is on time. We can precisely characterize the realistic subsets as follows. Let X ( t ) denote the subset of tasks in X whose deadline is on or before t : X ( t ) := { i ∈ X | D [ i ] ≤ t }.
In particular, X ( 0 ) = ∅ and X ( n ) = X.
Lemma 9. Let X ⊆ { 1 , 2 ,... , n } be an arbitrary subset of the n tasks. X is realistic if and only if | X ( t )| ≤ t for every integer t.
Proof: Let π be a schedule in which every task in X is on time. Let i_t be the t-th task in X to be completed. On the one hand, we have π(i_t) ≥ t, since otherwise, we could not have completed t − 1
If we use this subroutine, GREEDYSCHEDULE runs in O ( n^2 ) time. By using some appropriate data structures, the running time can be reduced to O ( n log n ); details are left as an exercise for the reader.
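The following Python sketch shows one way the quadratic-time approach could look. The original GREEDYSCHEDULE pseudocode and its subroutine are not part of this excerpt, so everything here is an assumption built from Lemma 9: `is_realistic` tests the condition |X(t)| ≤ t by counting deadlines, and `greedy_deadline_schedule` considers tasks by decreasing penalty and keeps each task whose addition leaves the on-time set realistic. Tasks are 0-indexed, deadlines are integers from 1 to n, and the returned list gives the task performed on day 1, day 2, and so on.

```python
def is_realistic(tasks, deadline, n):
    """Lemma 9 test: |X(t)| <= t for every t from 1 to n."""
    per_deadline = [0] * (n + 1)   # per_deadline[d] = chosen tasks with deadline d
    for i in tasks:
        per_deadline[deadline[i]] += 1
    seen = 0
    for t in range(1, n + 1):
        seen += per_deadline[t]    # seen == |X(t)|
        if seen > t:
            return False
    return True

def greedy_deadline_schedule(deadline, penalty):
    n = len(deadline)
    on_time = []
    for i in sorted(range(n), key=lambda i: penalty[i], reverse=True):
        if is_realistic(on_time + [i], deadline, n):
            on_time.append(i)
    kept = set(on_time)
    late = [i for i in range(n) if i not in kept]
    # On-time tasks go first in deadline order; late tasks fill the remaining days.
    order = sorted(on_time, key=lambda i: deadline[i]) + late
    return order, sum(penalty[i] for i in late)

deadline = [2, 1, 2, 3, 1]        # hypothetical deadlines
penalty = [60, 50, 40, 30, 20]    # hypothetical penalties
order, cost = greedy_deadline_schedule(deadline, penalty)
print(order, cost)
```

Each realism test takes O(n) time and is run once per task, which is where the O(n^2) bound comes from in this sketch.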
A set of intervals. The seven shaded intervals form a tiling path.
A set of intervals stabbed by four points (shown here as vertical segments)
A proper coloring of a set of intervals using five colors.
(b) Describe and analyze a greedy algorithm whose output is within 1 of optimal. That is, if m is the minimum number of rays required to hit every balloon, then your greedy algorithm must output either m or m + 1. (Of course, you must prove this fact.)
(c) Describe an algorithm that solves the minimum zap problem in O(n^2) time.
⋆(d) Describe an algorithm that solves the minimum zap problem in O(n log n) time.

Assume you have a subroutine INTERSECTS(r, c) that determines whether a ray r intersects a circle c in O(1) time. It’s not that hard to write this subroutine, but it’s not the interesting part of the problem.
© Copyright 2008 Jeff Erickson. Released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (http://creativecommons.org/licenses/by-nc-sa/3.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/ for the most recent revision.