Algorithms
Linear-Time Sorting Continued
Medians and Order Statistics
Review: Comparison Sorts
● Comparison sorts: Ω(n lg n) comparisons in the worst case, so O(n lg n) is the best possible
■ Model sort with decision tree
■ Path down tree = execution trace of algorithm
■ Leaves of tree = possible permutations of input
■ Tree must have at least n! leaves, so its height is Ω(n lg n)
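The step from n! leaves to the height bound can be sketched as follows. A binary tree of height h has at most 2^h leaves, and the largest n/2 factors of n! are each at least n/2 (taking n even for simplicity):

```latex
\[
2^h \ge n!
\;\Longrightarrow\;
h \ge \lg(n!) \ge \lg\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
= \tfrac{n}{2}\,\lg\tfrac{n}{2}
= \Omega(n \lg n)
\]
```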
Review: Counting Sort
CountingSort(A, B, k)
    for i=1 to k
        C[i] = 0;
    for j=1 to n
        C[A[j]] += 1;
    for i=2 to k
        C[i] = C[i] + C[i-1];
    for j=n downto 1
        B[C[A[j]]] = A[j];
        C[A[j]] -= 1;
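A minimal Python sketch of the pseudocode above, assuming the keys are integers in 1..k. The function name `counting_sort` and the choice to return a new list (instead of filling a caller-supplied B) are illustrative assumptions, and Python's 0-based lists require a small index shift.

```python
def counting_sort(a, k):
    """Stable counting sort of integers in the range 1..k; returns a new list."""
    count = [0] * (k + 1)            # count[0] is unused, mirroring C[1..k]
    for x in a:
        count[x] += 1                # count[x] = number of occurrences of x
    for i in range(2, k + 1):
        count[i] += count[i - 1]     # count[i] = number of elements <= i
    out = [None] * len(a)
    for x in reversed(a):            # right-to-left keeps equal keys in input order
        out[count[x] - 1] = x        # -1 converts the 1-based position to 0-based
        count[x] -= 1
    return out
```

For example, `counting_sort([3, 1, 2, 3, 1], 3)` places the two 1s first, then the 2, then the two 3s.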
Review: Radix Sort
● How did IBM get rich originally?
● Answer: punched card readers for census
tabulation in the early 1900s.
■ In particular, a card sorter that could sort cards
into different bins
○ Each column can be punched in 12 places
○ Decimal digits use 10 places
■ Problem: only one column can be sorted on at a
time
Radix Sort
● Can we prove it will work?
● Sketch of an inductive argument (induction on
the number of passes):
■ Assume lower-order digits {j : j < i} are sorted
■ Show that sorting next digit i leaves array correctly
sorted
○ If two digits at position i are different, ordering numbers by that digit is correct (lower-order digits irrelevant)
○ If they are the same, numbers are already sorted on the
lower-order digits. Since we use a stable sort, the
numbers stay in the right order
Radix Sort
● What sort will we use to sort on digits?
● Counting sort is obvious choice:
■ Sort n numbers on digits that range from 1.. k
■ Time: O( n + k )
● Each of the d passes over the n numbers takes
time O(n+k), so total time is O(dn+dk)
■ When d is constant and k= O( n ), takes O( n ) time
● How many bits in a computer word?
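A hedged sketch of least-significant-digit radix sort built on a stable counting sort, assuming non-negative integer keys. The names `counting_sort_by_digit` and `radix_sort`, and the `base` parameter, are illustrative and not from the slides.

```python
def counting_sort_by_digit(a, exp, base):
    """Stable counting sort of a by the digit (x // exp) % base."""
    count = [0] * base
    for x in a:
        count[(x // exp) % base] += 1
    for d in range(1, base):
        count[d] += count[d - 1]     # count[d] = number of keys with digit <= d
    out = [0] * len(a)
    for x in reversed(a):            # right-to-left pass preserves stability
        d = (x // exp) % base
        count[d] -= 1
        out[count[d]] = x
    return out


def radix_sort(a, base=10):
    """LSD radix sort of non-negative integers; returns a new list."""
    if not a:
        return []
    out = list(a)
    largest = max(out)
    exp = 1
    while largest // exp > 0:        # one pass per digit of the largest key
        out = counting_sort_by_digit(out, exp, base)
        exp *= base
    return out
```

On the word-size question: with `base=256`, a 32-bit key has d = 4 digits, so the sort makes four passes with k = 256.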
Radix Sort
● In general, radix sort based on counting sort is
■ Fast
■ Asymptotically fast (i.e., O(n) when d is constant and k = O(n))
■ Simple to code
■ A good choice
● To think about: Can radix sort be used on
floating-point numbers?
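One possible answer, offered as a sketch and not as the lecture's own solution: IEEE 754 doubles can be mapped to 64-bit unsigned keys whose integer order matches numeric order, and those keys can then be radix sorted byte by byte. The helper names `float_key` and `radix_sort_floats` are hypothetical, NaN inputs are assumed absent, and each pass uses stable bucket lists instead of counting sort to keep the code short.

```python
import struct


def float_key(x):
    """Map a float to a 64-bit unsigned int that sorts in the same order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits >> 63:                        # negative: flip every bit
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)               # non-negative: set the sign bit


def radix_sort_floats(values):
    """LSD radix sort of floats (no NaNs), 8 passes of 8-bit digits."""
    items = [(float_key(v), v) for v in values]
    for shift in range(0, 64, 8):
        buckets = [[] for _ in range(256)]
        for item in items:
            buckets[(item[0] >> shift) & 0xFF].append(item)   # appending is stable
        items = [item for bucket in buckets for item in bucket]
    return [v for _, v in items]
```

Note that this key orders -0.0 before 0.0, whereas ordinary comparison treats them as equal.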
Summary: Radix Sort
● Radix sort:
■ Assumption: input has d digits ranging from 0 to k
■ Basic idea:
○ Sort elements by digit starting with least significant
○ Use a stable sort (like counting sort) for each stage
■ Each of the d passes over the n numbers takes time
O(n+k), so total time is O(dn+dk)
○ When d is constant and k= O( n ), takes O( n ) time
■ Fast! Stable! Simple!
■ Doesn’t sort in place
Order Statistics
● The ith order statistic in a set of n elements is
the ith smallest element
● The minimum is thus the 1st order statistic
● The maximum is (duh) the nth order statistic
● The median is the (n/2)th order statistic
■ If n is even, there are 2 medians
● How can we calculate order statistics?
● What is the running time?
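The most direct answer to both questions is to sort and index, which costs O(n lg n) with a comparison sort. A minimal sketch, where the function name `order_statistic_by_sorting` is an illustrative assumption:

```python
def order_statistic_by_sorting(a, i):
    """Return the i-th smallest element of a (1-based i) by sorting a copy."""
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(a)")
    return sorted(a)[i - 1]
```

With `i = 1` this returns the minimum and with `i = len(a)` the maximum.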
Order Statistics
● How many comparisons are needed to find the
minimum element in a set? The maximum?
● Can we find the minimum and maximum together for
less than twice the cost of finding one?
● Yes:
■ Walk through elements by pairs
○ Compare each element in pair to the other
○ Compare the largest to maximum, smallest to minimum
■ Total cost: 3 comparisons per 2 elements =
about 3n/2 comparisons, versus about 2n for two separate scans
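The pairwise walk can be sketched as below, assuming a non-empty list. The function name `min_max` and the returned comparison counter are illustrative additions to make the 3-per-pair cost visible.

```python
def min_max(a):
    """Return (minimum, maximum, comparisons) for a non-empty list."""
    n = len(a)
    comparisons = 0
    if n % 2:                            # odd length: seed both with the first element
        lo = hi = a[0]
        start = 1
    else:                                # even length: seed with the first pair
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for j in range(start, n, 2):
        x, y = a[j], a[j + 1]
        comparisons += 3                 # one within the pair, one vs lo, one vs hi
        small, large = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

This uses 3n/2 - 2 comparisons when n is even and 3(n - 1)/2 when n is odd.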
Randomized Selection
● Key idea: use partition() from quicksort
■ But, only need to examine one subarray
■ This saving shows up in the running time: O(n) on average
● We will again use a slightly different partition
than the book:
q = RandomizedPartition(A, p, r)
[Figure: A[p..r] after partitioning; elements ≤ A[q] lie to the left of index q, elements ≥ A[q] to the right]
Randomized Selection
RandomizedSelect(A, p, r, i)
    if (p == r) then return A[p];
    q = RandomizedPartition(A, p, r)
    k = q - p + 1;
    if (i == k) then return A[q];    // not in book
    if (i < k) then
        return RandomizedSelect(A, p, q-1, i);
    else
        return RandomizedSelect(A, q+1, r, i-k);
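[Figure: A[p..r] partitioned around the pivot A[q]; the k = q - p + 1 elements of A[p..q] are ≤ A[q], and the elements of A[q+1..r] are ≥ A[q]]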
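A runnable Python sketch of the selection pseudocode above, using 0-based indices p and r and a 1-based rank i. The slides do not spell out their partition routine, so `randomized_partition` here is an assumption: a Lomuto-style partition around a randomly chosen pivot that returns the pivot's final index.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's final index."""
    s = random.randint(p, r)             # inclusive on both ends
    a[s], a[r] = a[r], a[s]              # move the pivot to the end
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]      # put the pivot between the two sides
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a in place."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                        # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

A call such as `randomized_select(data, 0, len(data) - 1, (len(data) + 1) // 2)` returns the lower median; pass a copy if the original order must be preserved.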
Randomized Selection
● Average case
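■ For upper bound, assume ith element always falls
in larger side of partition:
\[
\begin{aligned}
T(n) &\le \frac{1}{n}\sum_{k=0}^{n-1} T\bigl(\max(k,\, n-k-1)\bigr) + \Theta(n) \\
     &\le \frac{2}{n}\sum_{k=n/2}^{n-1} T(k) + \Theta(n) && \text{What happened here?}
\end{aligned}
\]
■ Let’s show that T(n) = O(n) by substitution
Randomized Selection
● Assume T(n) ≤ cn for sufficiently large c:
\[
\begin{aligned}
T(n) &\le \frac{2}{n}\sum_{k=n/2}^{n-1} T(k) + \Theta(n) && \text{The recurrence we started with} \\
     &\le \frac{2}{n}\sum_{k=n/2}^{n-1} ck + \Theta(n) && \text{Substitute } T(n) \le cn \text{ for } T(k) \\
     &= \frac{2c}{n}\left(\sum_{k=1}^{n-1} k - \sum_{k=1}^{n/2-1} k\right) + \Theta(n) && \text{“Split” the recurrence} \\
     &= \frac{2c}{n}\left(\frac{1}{2}(n-1)\,n - \frac{1}{2}\left(\frac{n}{2}-1\right)\frac{n}{2}\right) + \Theta(n) && \text{Expand arithmetic series} \\
     &= c(n-1) - \frac{c}{2}\left(\frac{n}{2}-1\right) + \Theta(n) && \text{Multiply it out}
\end{aligned}
\]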
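The substitution can be finished in a few more steps. This is a sketch that starts from the multiplied-out form c(n-1) - (c/2)(n/2 - 1) + Θ(n) and assumes c is chosen large enough that cn/4 + c/2 dominates the Θ(n) term:

```latex
\[
\begin{aligned}
T(n) &\le c(n-1) - \frac{c}{2}\left(\frac{n}{2}-1\right) + \Theta(n) \\
     &= cn - c - \frac{cn}{4} + \frac{c}{2} + \Theta(n) \\
     &= cn - \left(\frac{cn}{4} + \frac{c}{2} - \Theta(n)\right) \\
     &\le cn \qquad \text{for sufficiently large } c
\end{aligned}
\]
```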