





601.433/633 Introduction to Algorithms
Lecturer: Michael Dinitz
Topic: Linear time selection/median
Date: 9/9/
We saw last lecture a way to sort in time O(n log n): Randomized Quicksort. There are also other sorting algorithms with similar time bounds, most notably Mergesort and Heapsort (you should all know both of these already). In this lecture we will discuss a related problem with some surprisingly efficient algorithms: median-finding, or more generally, selection.
The median problem is the following: given an unsorted array, find and return the median element. In other words, given an array of length n, find and return the (n/2)nd smallest element. The selection problem is only slightly more general: given an array of length n and a value k ≤ n, find and return the kth smallest element. From now on we’ll mostly talk about selection.
It is obvious that selection can be done in time O(n log n): we can sort the array (using, e.g., mergesort), and then return the kth smallest element. Can we do any better?
It turns out that the answer is yes! We can do selection in O(n) time, both randomized (worst-case expected time) and deterministic.
There are a few easy cases, which we can do to warm up. For example, suppose k = 1. Then we are trying to find the smallest element, which we can do by simply scanning the array in O(n) time and keeping track of the smallest. Similarly, if k = n a simple scan also suffices. In general, this strategy works whenever k = O(1) or k = n − O(1), since we can just keep track of the k smallest/largest elements we see while we do a scan.
This doesn’t work for k = n/2, though. If we kept track of the k smallest elements, then when considering a new element in the scan we would have to figure out its place in the smallest k, which takes time Θ(log k) = Θ(log n) (upper bound via binary search, lower bound something we’ll see next week). So the total time would be Θ(n log k) = Θ(n log n).
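To make the scan-and-track strategy concrete, here is a minimal sketch in Python. It swaps in a max-heap of size k (via the standard `heapq` module) for the sorted structure described above; the function name `kth_smallest_by_scan` is made up for illustration. Each element costs O(log k) in the worst case, so the scan is O(n) for constant k but O(n log n) for k = n/2.

```python
import heapq

def kth_smallest_by_scan(values, k):
    """Return the k-th smallest (1-indexed) element using one scan.

    Keeps the k smallest elements seen so far in a max-heap
    (stored as negated values, since heapq is a min-heap).
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    heap = []  # -heap[0] is the largest of the k smallest seen so far
    for x in values:
        if len(heap) < k:
            heapq.heappush(heap, -x)      # O(log k)
        elif x < -heap[0]:
            heapq.heapreplace(heap, -x)   # evict the current largest, O(log k)
    return -heap[0]
```

For example, `kth_smallest_by_scan([5, 1, 4, 2, 3], 1)` is just the minimum, found with a heap of size 1.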
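The idea here is to use randomized quicksort, but instead of recursing on both sides we only recurse on the side which has the desired element. Slightly more formally, suppose we are given an array A of length n (with distinct elements) and an integer k ≤ n. Then Randomized Quickselect does the following:
1. Pick a pivot element p uniformly at random from A.
2. Split A into L (the elements less than p) and G (the elements greater than p) by comparing each element to p.
3. (a) If |L| = k − 1 then return p. (b) If |L| > k − 1 then return Quickselect(L, k). (c) If |L| < k − 1 then return Quickselect(G, k − |L| − 1).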
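The cases above can be sketched in Python as follows. This is an illustrative sketch rather than the lecture's own code: it assumes the elements are distinct, treats k as 1-indexed, and builds new lists for L and G instead of partitioning in place.

```python
import random

def quickselect(a, k):
    """Return the k-th smallest (1-indexed) element of a.

    Assumes the elements of a are distinct.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= len(a)")
    pivot = random.choice(a)                 # uniformly random pivot
    less = [x for x in a if x < pivot]       # L
    greater = [x for x in a if x > pivot]    # G
    if len(less) == k - 1:
        return pivot                         # exactly k - 1 elements are smaller
    if len(less) > k - 1:
        return quickselect(less, k)          # answer lies in L
    return quickselect(greater, k - len(less) - 1)  # answer lies in G
```

Calling `quickselect(a, (len(a) + 1) // 2)` returns a median of `a`. The recursion only ever follows one side, which is what separates it from quicksort.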
Correctness is easy to argue by induction: on every call to Quickselect(X, a), the original element we were looking for (the k’th smallest of A) must be the a’th smallest of X (do at home!). To argue running time, first note that the same intuition from quicksort continues to hold. We expect that our pivot splits the array approximately in half. This means that after O(log n) iterations we will find the element we are looking for. This might seem like it would give a bound of O(n log n), but in each iteration the number of comparisons we make also goes down by a factor of (approximately) 2, so the total number of comparisons is a geometric sum n + n/2 + n/4 + ..., which is only O(n).
Let’s make this a little more formal. Let T(n) be the expected running time of Quickselect on an array of length n. As with quicksort, splitting the array around a pivot takes n − 1 comparisons. Each possible split is equally likely, i.e. |L| is uniformly distributed between 0 and n − 1 (and the same holds for |G|). Whether we recurse on L or on G depends on k and on the split, but since we are trying to provide an upper bound we can assume that we recurse on whichever has more elements: T(n) ≤ T(n + 1) for all n, so this assumption can only make the algorithm take longer.
Thus we can write the following recurrence relation:
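\[
\begin{aligned}
T(n) &\le (n-1) + \sum_{i=0}^{n-1} \frac{1}{n} \max\bigl(T(i),\, T(n-i-1)\bigr) \\
&\le (n-1) + \sum_{i=0}^{n/2-1} \frac{1}{n}\, T(n-i-1) + \sum_{i=n/2}^{n-1} \frac{1}{n}\, T(i) \\
&= (n-1) + \frac{2}{n} \sum_{i=n/2}^{n-1} T(i)
\end{aligned}
\]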
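Now let’s use our guess-and-check method, with the guess T(n) ≤ 4n.
\[
\begin{aligned}
T(n) &\le (n-1) + \frac{2}{n} \sum_{i=n/2}^{n-1} 4i = (n-1) + 4 \cdot \frac{2}{n} \sum_{i=n/2}^{n-1} i \\
&= (n-1) + 4 \cdot \frac{2}{n} \left( \sum_{i=1}^{n-1} i - \sum_{i=1}^{n/2-1} i \right) \\
&= (n-1) + 4 \cdot \frac{2}{n} \left( \frac{n(n-1)}{2} - \frac{(n/2)(n/2-1)}{2} \right) \\
&\le (n-1) + 4 \cdot \left( (n-1) - \frac{n/2-1}{2} \right) \\
&\le (n-1) + 4 \cdot \frac{3n}{4} \\
&\le 4n.
\end{aligned}
\]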
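As a sanity check on the guess, the recurrence can be evaluated numerically. The sketch below takes the recurrence with equality, T(n) = (n − 1) + (1/n) Σ max(T(i), T(n − i − 1)), and assumes base cases T(0) = T(1) = 0; these base cases and the function name `expected_comparison_bound` are assumptions made for illustration.

```python
def expected_comparison_bound(max_n):
    """Tabulate T(n) = (n-1) + (1/n) * sum_{i=0}^{n-1} max(T(i), T(n-i-1)).

    Base cases T(0) = T(1) = 0 are an assumption of this sketch.
    Returns a list t with t[n] = T(n) for 0 <= n <= max_n.
    """
    t = [0.0] * (max_n + 1)
    for n in range(2, max_n + 1):
        total = sum(max(t[i], t[n - i - 1]) for i in range(n))
        t[n] = (n - 1) + total / n
    return t

if __name__ == "__main__":
    t = expected_comparison_bound(200)
    for n in (10, 50, 100, 200):
        print(n, t[n], 4 * n)  # T(n) next to the claimed bound 4n
```

The table takes O(max_n²) time to build, so it is meant for small values of n only.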
What if we want a deterministic algorithm? Somewhat amazingly, this turns out to be possible. The basic idea is to deterministically find a pivot that will result in a more-or-less even split, and
step 4 takes time at most T(7n/10) (by Lemma 4.3.1). So the total running time is
T(n) ≤ T(7n/10) + T(n/5) + cn.
It’s a good exercise to draw out the recursion tree to see what’s going on, but we can also solve by guess-and-check. We guess that T(n) ≤ 10cn. When we check this, we get that
T(n) ≤ 10c(7n/10) + 10c(n/5) + cn = 9cn + cn = 10cn.
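The recursion-tree view gives the same constant. As a sketch: a subproblem of size m spawns children of total size at most 7m/10 + m/5 = 9m/10, so the subproblem sizes at depth j sum to at most (9/10)^j n, and the non-recursive work at that depth is at most c(9/10)^j n. Summing over all depths:

```latex
\[
T(n) \;\le\; \sum_{j=0}^{\infty} c \left(\frac{9}{10}\right)^{j} n
\;=\; cn \cdot \frac{1}{1 - 9/10}
\;=\; 10\,cn .
\]
```

The work shrinks geometrically from level to level because 7/10 + 1/5 < 1, which is why the total is linear rather than O(n log n).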
We can now use deterministic selection to get a deterministic version of Quicksort which uses only O(n log n) comparisons in the worst case (recall that traditional Quicksort uses Θ(n²) comparisons in the worst case, while randomized Quicksort uses O(n log n) in expectation). The algorithm is simple: when deciding on a pivot, use the deterministic linear-time selection algorithm to find the median, and then use that as the pivot. This splits the input in half, so the total number of comparisons is
T(n) = 2T(n/2) + cn = O(n log n),
where the cn term is the number of comparisons used for selection plus the number used to split the array on the pivot.
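A minimal Python sketch of this idea follows. The selection routine is the standard median-of-medians scheme with groups of 5, which is consistent with the recurrence T(n) ≤ T(7n/10) + T(n/5) + cn but is an assumption here, not necessarily the exact steps from the lecture. The names `select` and `median_pivot_quicksort` are made up for illustration, and the elements are assumed to be distinct.

```python
def select(a, k):
    """Deterministically return the k-th smallest (1-indexed) element of a.

    Median-of-medians with groups of 5; assumes distinct elements.
    """
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # Median of each group of (at most) 5 elements.
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively pick the median of the medians as the pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    if len(less) == k - 1:
        return pivot
    if len(less) > k - 1:
        return select(less, k)
    return select(greater, k - len(less) - 1)

def median_pivot_quicksort(a):
    """Quicksort that always pivots on the exact median found by select."""
    if len(a) <= 1:
        return list(a)
    pivot = select(a, (len(a) + 1) // 2)
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    return median_pivot_quicksort(less) + [pivot] + median_pivot_quicksort(greater)
```

Because the pivot is always the median, both recursive calls receive at most half of the input, matching the recurrence T(n) = 2T(n/2) + cn. The constant c is large, so this is mainly of theoretical interest.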