




























































































These lecture notes cover the Spring 2016 course on Algorithms at Carnegie Mellon University. The notes cover the basics of algorithms, including how to design and analyze them. The course covers important techniques such as Dynamic Programming, Divide-and-Conquer, Hashing, and other Data Structures, Randomization, Network Flows, and Linear Programming. The notes also delve into Complexity Theory, focusing on the somewhat surprising notion of NP-completeness. The notes could be useful for university students studying computer science or a related field.
Typology: Lecture notes





























































































15-451/651: Design & Analysis of Algorithms
January 11, 2016
Lecture #1: Introduction, and Median Finding
last changed: January 11, 2016
The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that we do, and why we emphasize proving guarantees. We also go through an example of a problem that is easy to relate to, that of finding the median of a set of n elements. This is a problem for which there is a simple O(n log n) time algorithm, but we can do better, using randomization, and also a clever deterministic construction. These illustrate some of the ideas and tools we will be using (and building upon) in this course.
Material in this lecture:
This course is about the design and analysis of algorithms — how to design correct, efficient algorithms, and how to think clearly about analyzing correctness and running time.
What is an algorithm? At its most basic, an algorithm is a method for solving a computational problem. A recipe. Along with an algorithm comes a specification that says what the algorithm’s guarantees are. For example, we might be able to say that our algorithm indeed correctly solves the problem in question and runs in time at most f (n) on any input of size n. This course is about the whole package: the design of efficient algorithms, and proving that they meet desired specifications. For each of these parts, we will examine important techniques that have been developed, and with practice we will build up our ability to think clearly about the key issues that arise.
The main goal of this course is to provide the intellectual tools for designing and analyzing your own algorithms for problems you need to solve in the future. Some tools we will discuss are Dynamic Programming, Divide-and-Conquer, Hashing and other Data Structures, Randomization, Network Flows, and Linear Programming. Some analytical tools we will discuss and use are Recurrences, Probabilistic Analysis, Amortized Analysis, and Potential Functions.
There is also a dual to algorithm design: Complexity Theory. Complexity Theory looks at the intrinsic difficulty of computational problems — what kinds of specifications can we expect not to be able to achieve? In this course, we will delve a bit into complexity theory, focusing on the somewhat surprising notion of NP-completeness. We will additionally discuss some approaches for dealing with NP-complete problems, including the notion of approximation algorithms.
Another goal will be to discuss models that go beyond the traditional input-output model. In the traditional model we consider the algorithm to be given the entire input in advance and it just has to perform the computation and give the output. This model is great when it applies, but it is
One thing that makes algorithm design “Computer Science” is that solving a problem in the most obvious way from its definitions is often not the best way to get a solution. A simple example of this is median finding.
Recall the median. For a set of n elements, this is the element that is the (n/2)th smallest, i.e., it has n/2 elements larger than it.^1 Given an unsorted array, how quickly can one find the median element? The definition itself suggests only a brute-force method: we can enumerate over all elements, and for each check if it is the median. This gives a Θ(n^2) time algorithm. Or one can sort, and then read off the median, which takes O(n log n) time using MergeSort or HeapSort (deterministic) or QuickSort (randomized).
Can one do it more quickly than by sorting? In this lecture we describe two linear-time algorithms for this problem: one randomized and one deterministic. More generally, we solve the problem of finding the kth smallest out of an unsorted array of n elements.
Consider the problem of finding the kth smallest element in an unsorted array of size n. (Let’s say all elements are distinct to avoid the question of what we mean by the kth smallest when we have equalities). One way to solve this problem is to sort and then output the kth element. We can do this in time O(n log n) if we sort using MergeSort, QuickSort, or HeapSort. Is there something faster – a linear-time algorithm? The answer is yes. We will explore both a simple randomized solution and a more complicated deterministic one.
The idea for the randomized algorithm is to start with the Randomized QuickSort algorithm (choose a random element as “pivot”, partition the array into two sets Less and Greater consisting of those elements less than and greater than the pivot respectively, and then recursively sort Less and Greater). Then notice that there is a simple speedup we can make if we just need to find the kth smallest element. In particular, after the partitioning step we can tell which of Less or Greater has the item we are looking for, just by looking at their sizes. So, we only need to recursively examine one of them, not both. For instance, suppose we are looking for the 87th-smallest element in our array. If after choosing the pivot and partitioning we find that Less has 200 elements, then we just need to find the 87th smallest element in Less. On the other hand, if we find Less has 40 elements, then we just need to find the 87 − 40 − 1 = 46th smallest element in Greater. (And if Less has size exactly 86 then we can just return the pivot.) One might at first think that allowing the algorithm to recurse on only one subset rather than both would just cut down time by a factor of 2. However, since this is occurring recursively, it compounds the savings and we end up with Θ(n) rather than Θ(n log n) time. This algorithm is often called Randomized-Select, or QuickSelect.
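The description above can be sketched in a few lines of Python. This is only an illustrative sketch, not the notes' own pseudocode: the function name `quickselect` and the use of list comprehensions for partitioning are assumptions, and it relies on the elements being distinct, as the text assumes.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest (1-indexed) of a list of distinct items."""
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    current = list(items)
    while True:
        pivot = random.choice(current)
        # Partition around the pivot (n - 1 comparisons in the notes' model).
        less = [x for x in current if x < pivot]
        greater = [x for x in current if x > pivot]
        if k <= len(less):
            current = less                 # answer lies in Less
        elif k == len(less) + 1:
            return pivot                   # the pivot is the k-th smallest
        else:
            k -= len(less) + 1             # skip Less and the pivot
            current = greater              # answer lies in Greater
```

For example, `quickselect(data, (len(data) + 1) // 2)` returns one reasonable choice of median for an odd-length list.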
(^1) We are deliberately ignoring what happens if n is odd. You can (and indeed should, and will have to) make this precise when you code it up, but for now it will be easier not to worry about it, since the ideas we get here can all be made perfectly precise.
QuickSelect: Given array A of size n and integer 1 ≤ k ≤ n,
Theorem 1 The expected number of comparisons for QuickSelect is O(n).
Before giving a formal proof, let’s first get some intuition. If we split a candy bar at random into two pieces, then the expected size of the larger piece is 3/4 of the bar. If the size of the larger subarray after our partition was always 3/4 of the array, then we would have a recurrence T(n) ≤ (n − 1) + T(3n/4), which solves to T(n) < 4n. Now, this is not quite the case for our algorithm because 3n/4 is only the expected size of the larger piece. That is, if i is the size of the larger piece, our expected cost to go is really E[T(i)] rather than T(E[i]). However, because the answer is linear in n, the average of the T(i)’s turns out to be the same as T(average of the i’s). Let’s now see this a bit more formally.
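Both numbers in this intuition can be checked by a short calculation. The sketch below assumes the break point U is uniform on [0, 1] (the symbol U is introduced here only for illustration) and unrolls the idealized recurrence, dropping the −1 terms to get an upper bound.

```latex
% Expected size of the larger piece of a unit-length bar broken at U ~ Uniform[0,1]:
\mathbb{E}[\max(U, 1-U)] = 2\int_{1/2}^{1} u \, du = 2 \cdot \tfrac{1}{2}\left(1 - \tfrac{1}{4}\right) = \tfrac{3}{4}.

% Unrolling T(n) \le (n-1) + T(3n/4):
T(n) < n + \tfrac{3}{4} n + \left(\tfrac{3}{4}\right)^{2} n + \cdots
     = n \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} = 4n.
```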
Proof (Theorem 1): Let T(n, k) denote the expected time to find the kth smallest in an array of size n, and let T(n) = max_k T(n, k). We will show that T(n) < 4n.
First of all, it takes n − 1 comparisons to split the array into two pieces in Step 2. These pieces are equally likely to have size 0 and n − 1, or 1 and n − 2, or 2 and n − 3, and so on up to n − 1 and 0. The piece we recurse on will depend on k, but since we are only giving an upper bound, we can imagine that we always recurse on the larger piece. Therefore we have:
$$T(n) \le (n-1) + \frac{2}{n}\sum_{i=n/2}^{n-1} T(i) = (n-1) + \operatorname{avg}\,[T(n/2), \ldots, T(n-1)].$$
We can solve this using the “guess and check” method based on our intuition above. Assume inductively that T(i) ≤ 4i for i < n. Then,
$$T(n) \le (n-1) + \operatorname{avg}\,[4(n/2),\, 4(n/2+1),\, \ldots,\, 4(n-1)] \le (n-1) + 4(3n/4) < 4n,$$
and we have verified our guess.
What about a deterministic linear-time algorithm? For a long time it was thought this was impossible, and that there was no method faster than first sorting the array. In the process of trying to prove this formally, it was discovered that this thinking was incorrect, and in 1972 a deterministic
answer is linear in n? One way to do that is to consider the “stack of bricks” view of the recursion tree discussed in the notes for Recitation #1.
In particular, let’s build the recursion tree for the recurrence (1), making each node as wide as the quantity inside it:
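[Figure: recursion tree for recurrence (1), drawn as a stack of bricks. Top level: a single node cn (Total: cn). Second level: nodes cn/5 and 7cn/10 (Total: 9cn/10). Third level: Total: 81cn/100.]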
Notice that even if this stack-of-bricks continues downward forever, the total sum is at most
$$cn\left(1 + (9/10) + (9/10)^2 + (9/10)^3 + \cdots\right),$$
which is at most 10cn. This proves the theorem.
Notice that in our analysis of the recurrence (1) the key property we used was that n/5 + 7n/10 < n. More generally, we see here that if we have a problem of size n that we can solve by performing recursive calls on pieces whose total size is at most (1 − ε)n for some constant ε > 0 (plus some additional O(n) work), then the total time spent will be just linear in n. This gives us a nice extension to our “Master theorem” from the notes to Recitation #1.
Theorem 3 For constants c and a_1, ..., a_k such that a_1 + ... + a_k < 1, the recurrence
$$T(n) \le T(a_1 n) + T(a_2 n) + \cdots + T(a_k n) + cn$$
solves to T(n) = O(n).
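As a sanity check on Theorem 3, one can evaluate such a recurrence numerically. The sketch below makes assumptions the notes do not state: pieces are rounded down to integers, the base case is T(m) = c·m for m ≤ 1, and the helper name `solve_recurrence` is made up for illustration. With pieces 1/5 and 7/10 it should stay below the 10cn bound derived above.

```python
from functools import lru_cache

def solve_recurrence(n, pieces=((1, 5), (7, 10)), c=1):
    """Evaluate T(n) = c*n + sum of T(floor(a_i * n)), taken with equality.

    `pieces` lists each fraction a_i as a (numerator, denominator) pair.
    The base case T(m) = c*m for m <= 1 is an assumption of this sketch.
    """
    @lru_cache(maxsize=None)
    def T(m):
        if m <= 1:
            return c * m
        return c * m + sum(T(m * num // den) for num, den in pieces)
    return T(n)

for n in (10, 1_000, 100_000):
    # The ratio T(n) / (c*n) should stay below 1 / (1 - 9/10) = 10.
    print(n, solve_recurrence(n) / n)
```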
Exercise 1: Show that for constants c and a_1, ..., a_k such that a_1 + ... + a_k = 1 and each a_i < 1, the recurrence T(n) ≤ T(a_1 n) + T(a_2 n) + ... + T(a_k n) + cn solves to T(n) = O(n log n). Show that this is best possible by observing that T(n) = T(n/2) + T(n/2) + n solves to T(n) = Θ(n log n).
Exercise 2: What happens if we split the elements into n/3 groups of size 3? Or n/k groups of size k for larger odd values of k?
15-451/651: Design & Analysis of Algorithms
January 13, 2016
Lecture #2
last changed: January 13, 2016
In this lecture, we will examine some simple, concrete models of computation, each with a precise definition of what counts as a step, and try to get tight upper and lower bounds for a number of problems. Specific models and problems examined in this lecture include:
In this lecture, we will look at (worst-case) upper and lower bounds for a number of problems in several different concrete models. Each model will specify exactly what operations may be performed on the input, and how much they cost. Typically, each model will have some operations that cost 1 step (like performing a comparison, or swapping a pair of elements), some that are free, and some that are not allowed at all.
By an upper bound of f (n) for some problem, we mean that there exists an algorithm that takes at most f (n) steps on any input of size n. By a lower bound of g(n), we mean that for any algorithm there exists an input on which it takes at least g(n) steps. The reason for this terminology is that if we think of our goal as being to understand the “true complexity” of each problem, measured in terms of the best possible worst-case guarantee achievable by any algorithm, then an upper bound of f (n) and lower bound of g(n) means that the true complexity is somewhere between g(n) and f (n).
One natural model for examining problems like sorting is what is known as the comparison model.
Definition 1 In the comparison model, we have an input consisting of n items (typically in some initial order). An algorithm may compare two items (asking: is a_i > a_j?) at a cost of 1. Moving the items around is free. No other operations on the items are allowed (such as using them as indices, XORing them, etc).
For the problem of sorting in the comparison model, the input is an array a = [a_1, a_2, ..., a_n] and the output is a permutation of the input π(a) = [a_π(1), a_π(2), ..., a_π(n)] in which the elements are in increasing order. We begin this lecture by showing the following lower bound for comparison-based sorting.
Theorem 2 Any deterministic comparison-based sorting algorithm must perform at least lg(n!) comparisons to sort n elements in the worst case.^1 Specifically, for any deterministic comparison-
(^1) As is common in CS, we will use “lg” to mean “log_2”.
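To get a feel for the size of the lg(n!) bound in Theorem 2, it can be computed directly for small n. This is just an illustrative sketch; the function name is made up, and the ceiling is taken because the number of comparisons is a whole number.

```python
import math

def sorting_lower_bound(n):
    """Smallest integer that is at least lg(n!)."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (3, 4, 5, 10, 100):
    # Printed next to n * lg(n) for comparison.
    print(n, sorting_lower_bound(n), round(n * math.log2(n), 1))
```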
are considering the problem of sorting in the exchange model.
Definition 3 In the exchange model, an input consists of an array of n items, and the only operation allowed on the items is to swap a pair of them at a cost of 1 step. All other (planning) work is free: in particular, the items can be examined and compared to each other at no cost.
Question: how many exchanges are necessary (lower bound) and sufficient (upper bound) in the exchange model to sort an array of n items in the worst case?
Claim 4 (Upper bound) n − 1 exchanges are sufficient.
Proof: For this we just need to give an algorithm. For instance, consider the algorithm that in step 1 puts the smallest item in location 1, swapping it with whatever was originally there. Then in step 2 it swaps the second-smallest item with whatever is currently in location 2, and so on (if in step k, the kth-smallest item is already in the correct position then we just do a no-op). No step ever undoes any of the previous work, so after n − 1 steps, the first n − 1 items are in the correct position. This means the nth item must be in the correct position too.
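The algorithm in this proof can be sketched as follows. In the exchange model only the swaps are charged, so the sketch counts swaps and treats finding the smallest remaining item as free planning work. The function name `exchange_sort` is an assumption made for illustration.

```python
def exchange_sort(items):
    """Sort with at most n - 1 swaps; return (sorted list, swaps used)."""
    a = list(items)
    swaps = 0
    for k in range(len(a) - 1):
        # Planning work (free in the exchange model): locate the smallest
        # item among positions k, k+1, ..., n-1.
        smallest = min(range(k, len(a)), key=a.__getitem__)
        if smallest != k:                  # otherwise this step is a no-op
            a[k], a[smallest] = a[smallest], a[k]
            swaps += 1
    return a, swaps
```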
But are n − 1 exchanges necessary in the worst-case? If n is even, and no book is in its correct location, then n/2 exchanges are clearly necessary to “touch” all books. But can we show a better lower bound than that?
Claim 5 (Lower bound) In fact, n − 1 exchanges are necessary, in the worst case.
Proof: Here is how we can see it. Create a graph in which a directed edge (i, j) means that the book in location i must end up at location j. An example is given in Figure 1.
Figure 1: Graph for input [f c d e b a g]
Note that this is a special kind of directed graph: it is a permutation — a set of cycles. In particular, every book points to some location, perhaps its own location, and every location is pointed to by exactly one book. Now consider the following points:
Putting the above 3 points together, suppose we begin with an array consisting of a single cycle, such as [n, 1 , 2 , 3 , 4 ,... , n − 1]. Each operation at best increases the number of cycles by 1 and in the end we need to have n cycles. So, this input requires n − 1 operations.
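The cycle structure used in this argument is easy to compute. The sketch below (the function name and 0-indexed locations are assumptions) counts the cycles of the graph described above for an array of distinct items. By the argument in the text, an input with c cycles needs at least n − c exchanges.

```python
def count_cycles(arr):
    """Count cycles in the 'item at location i must go to location j' graph."""
    # target[item] = location the item occupies once the array is sorted.
    target = {item: idx for idx, item in enumerate(sorted(arr))}
    seen = [False] * len(arr)
    cycles = 0
    for start in range(len(arr)):
        if seen[start]:
            continue
        cycles += 1
        i = start
        while not seen[i]:                 # walk around the cycle
            seen[i] = True
            i = target[arr[i]]
    return cycles

books = list("fcdebag")                    # the input from Figure 1
print(len(books) - count_cycles(books))    # exchanges needed for this input
```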
How many comparisons are necessary and sufficient to find the maximum of n elements, in the comparison model of computation?
Claim 6 (Upper bound) n − 1 comparisons are sufficient to find the maximum of n elements.
Proof: Just scan left to right, keeping track of the largest element so far. This makes at most n − 1 comparisons.
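A minimal sketch of this scan, with an explicit counter so the n − 1 figure can be observed (the function name is made up for illustration, and the input is assumed non-empty):

```python
def find_max(items):
    """Return (maximum, comparisons used) by a left-to-right scan."""
    it = iter(items)
    best = next(it)                        # first element costs no comparison
    comparisons = 0
    for x in it:
        comparisons += 1                   # one comparison per later element
        if x > best:
            best = x
    return best, comparisons
```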
Now, let’s try for a lower bound. One simple lower bound is that since there are n possible answers for the location of the maximum element, our previous argument gives a lower bound of lg n. But clearly this is not at all tight. Also, we have to look at all the elements (else the one not looked at may be larger than all the ones we look at). But looking at all n elements could be done using n/2 comparisons; not tight either. In fact, we can give a better lower bound of n − 1.
Claim 7 (Lower bound) n − 1 comparisons are needed in the worst-case to find the maximum of n elements.
Proof: Suppose some algorithm A claims to find the maximum of n elements using fewer than n − 1 comparisons. Consider an arbitrary input of n distinct elements, and construct a graph in which we join two elements by an edge if they are compared by A. If fewer than n − 1 comparisons are made, then this graph must have at least two components. Suppose now that algorithm A outputs some element u as the maximum, where u is in some component C_1. In that case, pick a different component C_2 and add a large positive number (e.g., the value of u) to every element in C_2. This process does not change the result of any comparison made by A, so on this new set of elements, algorithm A would still output u. Yet this now ensures that u is not the maximum, so A must be incorrect.
Since the upper and lower bounds are equal, the bound of n − 1 is tight.
Note that this argument was different from the “information theoretic” bound we used for sorting. Here we showed that if the algorithm makes “too few” comparisons on some input In and outputs out, we can give another input In′ on which the algorithm would make the same comparisons and receive the same answers to them, and hence also output out, but out is the incorrect output for input In′.
A slightly different lower bound argument comes from showing that if an algorithm makes “too few” comparisons, then an adversary can fool it into giving the incorrect answer. Here is a little example.
must have been directly compared to the best, and lost.^2 This means there are only lg n possibilities for the second-highest number, and we can find the maximum of them making only lg(n) − 1 more comparisons.
At this point, we have a lower bound of n − 1 and an upper bound of n + lg(n) − 2, so they are nearly tight. It turns out that, in fact, the lower bound can be improved to exactly meet the upper bound.^3
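The n + lg(n) − 2 upper bound can be illustrated with a knockout tournament that remembers whom each player has beaten. This is a sketch under assumptions: the function name and data layout are made up, the items are distinct, there are at least two of them, and the exact count n + lg(n) − 2 is only claimed when n is a power of 2.

```python
def second_largest(items):
    """Return (second largest, comparisons used) via a knockout tournament."""
    comparisons = 0
    # Each player is (value, list of values it has beaten directly).
    players = [(x, []) for x in items]
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players) - 1, 2):
            (a, a_beaten), (b, b_beaten) = players[i], players[i + 1]
            comparisons += 1
            if a > b:
                a_beaten.append(b)
                next_round.append((a, a_beaten))
            else:
                b_beaten.append(a)
                next_round.append((b, b_beaten))
        if len(players) % 2 == 1:          # odd player out gets a bye
            next_round.append(players[-1])
        players = next_round
    _, beaten = players[0]
    # The runner-up must be among those who lost directly to the champion.
    runner_up = beaten[0]
    for x in beaten[1:]:
        comparisons += 1
        if x > runner_up:
            runner_up = x
    return runner_up, comparisons
```

With 8 items this uses 7 comparisons for the tournament plus 2 among the champion's 3 direct opponents, i.e. 8 + 3 − 2 = 9 in total.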
This material is optional; you may find it interesting.
To finish with something totally different, let’s look at the query complexity of determining if a graph is connected. Assume we are given the adjacency matrix G for some n-node graph. That is, G[i, j] = 1 if there is an edge between i and j, and G[i, j] = 0 otherwise. We consider a model in which we can query any element of the matrix G in 1 step. All other computation is free. That is, imagine the graph matrix has values written on little slips of paper, face down. In one step we can turn over any slip of paper. How many slips of paper do we need to turn over to tell if G is connected?
Claim 11 (Easy upper bound) n(n − 1)/ 2 queries are sufficient to determine if G is connected.
Proof: This just corresponds to querying every pair (i, j). Once we have done that, we know the entire graph and can just compute for free to see if it is connected.
Interestingly, it turns out the simple upper-bound of querying every edge is a lower bound too. Because of this, connectivity is called an “evasive” property of graphs.
Theorem 12 (Lower bound) n(n − 1)/ 2 queries are necessary to determine connectivity in the worst case.
Proof: Here is the strategy for the adversary: when the algorithm asks us to flip over a slip of paper, we return the answer 0 unless that would force the graph to be disconnected, in which case we answer 1. (It is not important to the argument, but we can figure this out by imagining that all un-turned slips of paper are 1 and seeing if that graph is connected.) Now, here is the key claim:
Claim: we maintain the invariant that for any un-asked pair (u, v), the graph revealed so far has no path from u to v. Proof of claim: If there were, consider the last edge (u′, v′) revealed on that path. We could have answered 0 for that and kept the same connectivity in the graph by having an edge (u, v). So, that contradicts the definition of our adversary strategy.
Now, to finish the proof: Suppose an algorithm halts without examining every pair. Consider some unasked pair (u, v). If the algorithm says “connected,” we reveal all-zeros for the remaining unasked edges and then there is no path from u to v (by the key claim) so the algorithm is wrong. If the algorithm says “disconnected,” we reveal all-ones for the remaining edges, and the algorithm is wrong by definition of our adversary strategy. So, the algorithm must ask for all edges.
(^2) Apparently the first person to have pointed this out was Charles Dodgson (better known as Lewis Carroll!), writing about the proper way to award prizes in lawn tennis tournaments.
(^3) First shown by Kislitsyn (1964).
In this lecture we discuss a useful form of analysis, called amortized analysis, for problems in which one must perform a series of operations, and our goal is to analyze the time per operation. The motivation for amortized analysis is that looking at the worst-case time per operation can be too pessimistic if the only way to produce an expensive operation is to “set it up” with a large number of cheap operations beforehand.
We also introduce the notion of a potential function which can be a useful aid to performing this type of analysis. A potential function is much like a bank account: if we can take our cheap operations (those whose cost is less than our bound) and put our savings from them in a bank account, use our savings to pay for expensive operations (those whose cost is greater than our bound), and somehow guarantee that our account will never go negative, then we will have proven an amortized bound for our procedure.
As in the previous lecture, in this lecture we will avoid use of asymptotic notation as much as possible, and focus instead on concrete cost models and bounds.
So far we have been looking at static problems where you are given an input (like an array of n objects) and the goal is to produce an output with some desired property (e.g., the same objects, but sorted). For the next few lectures, we’re going to turn to problems where we have a series of operations, and the goal is to analyze the time taken per operation. For example, rather than being given a set of n items up front, we might have a series of n insert, lookup, and remove requests to some database, and we want these operations to be efficient.
Today, we will talk about a useful kind of analysis, called amortized analysis, for problems of this sort. The definition of amortized cost is actually quite simple:
Definition 3.1 The amortized cost per operation for a sequence of n operations is the total cost of the operations divided by n.
For example, if we have 100 operations at cost 1, followed by one operation at cost 100, the
Here is another way to analyze the process of doubling the array in the above example. Say that every time we perform a push operation, we pay $1 to perform it, and we put $2 into a piggy bank. So, our out-of-pocket cost per push is $3. Any time we need to double the array, from size L to 2L, we pay for it using money in the bank. How do we know there will be enough money ($L) in the bank to pay for it? The reason is that after the last resizing, there were only L/2 elements in the array and so there must have been at least L/2 new pushes since then contributing $2 each. So, we can pay for everything by using an out-of-pocket cost of at most $3 per operation. Putting it another way, by spending $3 per operation, we were able to pay for all the operations plus possibly still have money left over in the bank. This means our amortized cost is at most 3.^1
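The piggy-bank argument can be simulated directly. The sketch below follows the accounting in the paragraph above ($1 paid and $2 banked per push, $L withdrawn when doubling from size L to 2L); the initial capacity of 1 and the function name are assumptions made for illustration.

```python
def simulate_pushes(n, initial_capacity=1):
    """Return (total actual cost, final bank balance, lowest bank balance)."""
    capacity, size = initial_capacity, 0
    bank = 0
    actual_cost = 0
    min_bank = 0
    for _ in range(n):
        if size == capacity:
            # Doubling from L to 2L costs L, paid for out of the bank.
            bank -= capacity
            actual_cost += capacity
            capacity *= 2
            min_bank = min(min_bank, bank)
        actual_cost += 1                   # $1 to perform the push itself
        bank += 2                          # $2 saved for future doublings
        size += 1
    return actual_cost, bank, min_bank

cost, bank, lowest = simulate_pushes(1000)
print(cost, bank, lowest)                  # cost + bank equals 3 * 1000
```

Since the bank never goes negative, the actual cost of n pushes is at most the $3n paid out of pocket.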
This “piggy bank” method is often very useful for performing amortized analysis. The piggy bank is also called a potential function, since it is like potential energy that you can use later. The potential function is a guarantee on the amount of money in the bank. In the case above, the potential is twice the number of elements in the array after the midpoint. Note that it is very important in this analysis to prove that the bank account doesn’t go negative. Otherwise, if the bank account can slowly drift off to negative infinity, the whole proof breaks down.
Definition 3.2 A potential function is a function of the state of a system, that generally should be non-negative and start at 0, and is used to smooth out analysis of some algorithm or process.
Observation: If the potential is non-negative and starts at 0, and at each step the actual cost of our algorithm plus the change in potential is at most c, then after n steps our total cost is at most cn. That is just the same thing we were saying about the piggy bank: our total cost for the n operations is just our total out of pocket cost minus the amount in the bank at the end.
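Written out, the Observation is a telescoping sum. In the sketch below the symbols are introduced only for illustration: c_i is the actual cost of step i and Φ_i is the potential after step i, with Φ_0 = 0 and Φ_n ≥ 0, and each step satisfies c_i + (Φ_i − Φ_{i−1}) ≤ c.

```latex
\sum_{i=1}^{n} c_i
  \;\le\; \sum_{i=1}^{n} \bigl( c - (\Phi_i - \Phi_{i-1}) \bigr)
  \;=\; cn - \Phi_n + \Phi_0
  \;=\; cn - \Phi_n
  \;\le\; cn .
```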
Sometimes one may need in an analysis to “seed” the bank account with some initial positive amount for everything to go through. In that case, the kind of statement one would show is that the total cost for n operations is at most cn plus the initial seed amount.
Recap: The motivation for amortized analysis is that a worst-case-per-operation analysis can give an overly pessimistic bound if the only way of having an expensive operation is to have a lot of cheap ones before it. Note that this is different from our usual notion of “average case analysis”: we are not making any assumptions about the inputs being chosen at random, we are just averaging over time.
Imagine we want to store a big binary counter in an array A. All the entries start at 0 and at each step we will be simply incrementing the counter. Let’s say our cost model is: whenever we increment the counter, we pay $1 for every bit we need to flip. (So, think of the counter as an
(^1) In fact, if you think about it, we can pay for pop operations using money from the bank too, and even have $1 left over. So as a more refined analysis, our amortized cost is $3 per push and $−1 per successful pop (a pop from a nonempty stack).
array of heavy stone tablets, each with a “0” on one side and a “1” on the other.) For instance, here is a trace of the first few operations and their cost:
A[m]  A[m-1]  ...  A[3]  A[2]  A[1]  A[0]    cost
 0      0     ...   0     0     0     0      $1
 0      0     ...   0     0     0     1      $2
 0      0     ...   0     0     1     0      $1
 0      0     ...   0     0     1     1      $3
 0      0     ...   0     1     0     0      $1
 0      0     ...   0     1     0     1      $2
In a sequence of n increments, the worst-case cost per increment is O(log n), since at worst we flip lg(n) + 1 bits. But, what is our amortized cost per increment? The answer is it is at most 2. Here are two proofs.
Proof 1: Every time you flip 0 → 1, pay the actual cost of $1, plus put $1 into a piggy bank. So the total amount spent is $2. In fact, think of each bit as having its own bank (so when you turn the stone tablet from 0 to 1, you put a $1 coin on top of it). Now, every time you flip a 1 → 0, use the money in the bank (or on top of the tablet) to pay for the flip. Clearly, by design, our bank account cannot go negative. The key point now is that even though different increments can have different numbers of 1 → 0 flips, each increment has exactly one 0 → 1 flip. So, we just pay $2 (amortized) per increment.
Equivalently, what we are doing in this proof is using a potential function that is equal to the number of 1-bits in the current count. Notice how the bank-account/potential-function allows us to smooth out our payments, making the cost easier to analyze.
Proof 2: Here is another way to analyze the amortized cost. First, how often do we flip A[0]? Answer: every time. How often do we flip A[1]? Answer: every other time. How often do we flip A[2]? Answer: every 4th time, and so on. So, the total cost spent on flipping A[0] is n, the total cost spent flipping A[1] is at most n/2, the total cost flipping A[2] is at most n/4, etc. Summing these up, the total cost spent flipping all the positions in our n increments is at most 2n.
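Both proofs can be checked with a small simulation. The sketch below stores the counter as a Python list with A[0] first and lets it grow as needed (a representation chosen here for illustration), and charges $1 per bit flipped.

```python
def increment(bits):
    """Increment the counter in place; return the number of bits flipped."""
    flips = 0
    i = 0
    while i < len(bits) and bits[i] == 1:  # the 1 -> 0 flips (carries)
        bits[i] = 0
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(0)                     # grow the counter by one bit
    bits[i] = 1                            # the single 0 -> 1 flip
    flips += 1
    return flips

counter = []
n = 1000
total = sum(increment(counter) for _ in range(n))
# Potential = number of 1-bits, so total cost + final potential is exactly 2n.
print(total, sum(counter), 2 * n)
```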
Imagine a version of the counter we just discussed in which it costs 2^k to flip the bit A[k]. (Suspend disbelief for now; we’ll see shortly why this is interesting to consider.) Now, in a sequence of n increments, a single increment could cost as much as n (actually 2n − 1), but the claim is the amortized cost is only O(log n) per increment. This is probably easiest to see by the method of “Proof 2” above: A[0] gets flipped every time for cost of $1 each (a total of $n). A[1] gets flipped
For instance, if we insert again, we just put the new item into A0 at cost 1. If we insert again, we merge the new array with A0 and put the result into A1 at a cost of 1 + 2.
Claim 3.1 The above data structure has amortized cost O(log n) per insert.
Proof: With the cost model defined above, it’s exactly the same as the binary counter with cost 2^k for counter k.
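The weighted counter that this proof refers to can also be simulated. The sketch below (function name made up for illustration) charges 2^k for each flip of bit k and compares the total over n increments with n(⌊lg n⌋ + 1), the bound obtained by charging at most n to each bit position as in “Proof 2”.

```python
def weighted_increment_cost(n):
    """Total cost of n increments from 0 when flipping bit k costs 2**k."""
    total = 0
    for count in range(n):
        k = 0
        while (count >> k) & 1:            # trailing 1-bits flip 1 -> 0
            total += 2 ** k
            k += 1
        total += 2 ** k                    # the single 0 -> 1 flip
    return total

for n in (1, 2, 4, 1000):
    # n.bit_length() equals floor(lg n) + 1 for n >= 1.
    print(n, weighted_increment_cost(n), n * n.bit_length())
```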
Notes on Amortization
D. Sleator