Algorithms
Medians and Order Statistics
Structures for Dynamic Sets
Homework 3
● On the web shortly…
■ Due Wednesday at the beginning of class (test)
Review: Bucket Sort
● Bucket sort
■ Assumption: input is n reals from [0, 1)
■ Basic idea:
○ Create n linked lists (buckets) to divide interval [0,1) into subintervals of size 1/n
○ Add each input element to appropriate bucket and sort buckets with insertion sort
■ Uniform input distribution ⇒ expected O(1) bucket size
○ Therefore the expected total time is O(n)
■ These ideas will return when we study hash tables
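The bucket sort idea above can be sketched in Python. This is a minimal illustration, assuming the input is a list of floats in [0, 1); the names `bucket_sort` and `insertion_sort` are chosen here for illustration, and Python lists stand in for the linked lists on the slide.

```python
def insertion_sort(items):
    """Sort a list in place with insertion sort."""
    for j in range(1, len(items)):
        key = items[j]
        i = j - 1
        # Shift larger elements one slot right to make room for key.
        while i >= 0 and items[i] > key:
            items[i + 1] = items[i]
            i -= 1
        items[i + 1] = key


def bucket_sort(values):
    """Return a sorted copy of values, assumed to be floats in [0, 1)."""
    n = len(values)
    if n == 0:
        return []
    # One bucket per input element; bucket b covers [b/n, (b+1)/n).
    buckets = [[] for _ in range(n)]
    for v in values:
        # The min() clamp is a guard against floating-point rounding.
        buckets[min(int(v * n), n - 1)].append(v)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)
    return result
```

With uniformly distributed input each bucket is expected to hold O(1) elements, so the insertion sorts are cheap; on skewed input a single bucket can hold most of the elements and the running time degrades toward O(n^2).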
Review: Order Statistics
● The i th order statistic in a set of n elements is
the i th smallest element
● The minimum is thus the 1st order statistic
● The maximum is (duh) the n th order statistic
● The median is the (n/2)th order statistic
■ If n is even, there are 2 medians
● Could calculate order statistics by sorting
■ Time: O(n lg n) w/ comparison sort
■ We can do better
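As a baseline, the sort-based approach looks like the following sketch. The function names are hypothetical, ranks are 1-based as on the slide, and the lower/upper median positions ⌊(n+1)/2⌋ and ⌈(n+1)/2⌉ are the usual convention, assumed here.

```python
def order_statistic_by_sorting(values, i):
    """Return the i-th smallest element (1-based) in O(n lg n) time."""
    if not 1 <= i <= len(values):
        raise ValueError("i must be between 1 and len(values)")
    return sorted(values)[i - 1]


def medians_by_sorting(values):
    """Return (lower median, upper median); they coincide when n is odd."""
    n = len(values)
    lower = order_statistic_by_sorting(values, (n + 1) // 2)
    upper = order_statistic_by_sorting(values, (n + 2) // 2)
    return lower, upper
```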
Review: Randomized Selection
● Key idea: use partition() from quicksort
■ But, only need to examine one subarray
■ This savings shows up in running time: O(n)
[Figure: array A[p..r] partitioned at index q; elements left of q are ≤ A[q], elements right of q are ≥ A[q]]
Review: Randomized Selection
RandomizedSelect(A, p, r, i)
    if (p == r) then return A[p];
    q = RandomizedPartition(A, p, r)
    k = q - p + 1;
    if (i == k) then return A[q];   // not in book
    if (i < k) then
        return RandomizedSelect(A, p, q-1, i);
    else
        return RandomizedSelect(A, q+1, r, i-k);
[Figure: A[p..r] after partitioning; the k = q - p + 1 elements A[p..q] are ≤ A[q], and A[q+1..r] are ≥ A[q]]
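A runnable Python version of the pseudocode might look like this. It is a sketch: the slides do not show RandomizedPartition, so the Lomuto-style partition with a random pivot below is an assumption, and indices are 0-based inclusive while the rank i stays 1-based.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] around a random pivot; return the pivot's index."""
    pivot_index = random.randint(p, r)  # inclusive on both ends
    a[pivot_index], a[r] = a[r], a[pivot_index]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1  # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

Note that the list is rearranged in place, so pass a copy if the original order matters.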
Worst-Case Linear-Time Selection
● Randomized algorithm works well in practice
● What follows is a worst-case linear time
algorithm, really of theoretical interest only
● Basic idea:
■ Generate a good partitioning element
■ Call this element x
Worst-Case Linear-Time Selection
● The algorithm in words:
1. Divide n elements into groups of 5
2. Find median of each group (How? How long?)
3. Use Select() recursively to find median x of the n/5 medians
4. Partition the n elements around x. Let k = rank(x)
5. if (i == k) then return x
   if (i < k) then use Select() recursively to find ith smallest element in first partition
   else (i > k) use Select() recursively to find (i-k)th smallest element in last partition
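The steps above can be sketched in Python as follows. This is an illustration rather than the slides' exact procedure: it partitions into three new lists (smaller, equal, larger) so that duplicate keys are handled, sorts each group of 5 with the built-in `sorted` (constant work per group), and falls back to sorting when at most 5 elements remain. Those choices are assumptions made here.

```python
def select(items, i):
    """Return the i-th smallest (1-based) element of items."""
    if len(items) <= 5:
        return sorted(items)[i - 1]

    # Steps 1-2: split into groups of 5 and take each group's (lower) median.
    groups = [items[j:j + 5] for j in range(0, len(items), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # Step 3: recursively find the median x of the medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 4: partition around x (three-way, to cope with duplicates).
    smaller = [v for v in items if v < x]
    equal = [v for v in items if v == x]
    larger = [v for v in items if v > x]

    # Step 5: x has ranks len(smaller)+1 .. len(smaller)+len(equal).
    if i <= len(smaller):
        return select(smaller, i)
    if i <= len(smaller) + len(equal):
        return x
    return select(larger, i - len(smaller) - len(equal))
```

Each recursive call works on a strictly shorter list, because `equal` always contains at least x itself.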
Worst-Case Linear-Time Selection
● Thus after partitioning around x , step 5 will
call Select() on at most 3 n /4 elements
● The recurrence is therefore:
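\begin{align*}
T(n) &\le T(\lfloor n/5 \rfloor) + T(3n/4) + \Theta(n) \\
     &\le T(n/5) + T(3n/4) + \Theta(n) && \lfloor n/5 \rfloor \le n/5 \\
     &\le cn/5 + 3cn/4 + \Theta(n) && \text{Substitute } T(n) = cn \\
     &= 19cn/20 + \Theta(n) && \text{Combine fractions} \\
     &= cn - (cn/20 - \Theta(n)) && \text{Express in desired form} \\
     &\le cn \quad \text{if } c \text{ is big enough} && \text{What we set out to prove}
\end{align*}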
Worst-Case Linear-Time Selection
● Intuitively:
■ Work at each level is a constant fraction (19/20)
smaller
○ Geometric progression!
■ Thus the O(n) work at the root dominates
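The geometric progression can be written out explicitly. As a rough sketch, assume the work at the root is at most an for some constant a (a symbol introduced here for illustration) and that each level does at most 19/20 of the work of the level above it. Summing over all levels then gives:

```latex
\[
\sum_{k=0}^{\infty} \left(\frac{19}{20}\right)^{k} a n
  \;=\; \frac{a n}{1 - \frac{19}{20}}
  \;=\; 20\,a n
  \;=\; O(n)
\]
```

So the total is within a constant factor (here 20) of the work done at the root.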
Linear-Time Median Selection
● Worst-case O(n lg n) quicksort
■ Find median x and partition around it
■ Recursively quicksort two halves
■ T(n) = 2T(n/2) + O(n) = O(n lg n)
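A sketch of this median-pivot quicksort in Python follows. The helper `select` is a compact median-of-medians stand-in written for this example, and the three-way split into new lists is an assumption made to keep duplicates simple; an in-place partition would follow the same recurrence.

```python
def select(items, i):
    """Return the i-th smallest (1-based) element via median of medians."""
    if len(items) <= 5:
        return sorted(items)[i - 1]
    groups = [items[j:j + 5] for j in range(0, len(items), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    x = select(medians, (len(medians) + 1) // 2)
    smaller = [v for v in items if v < x]
    equal = [v for v in items if v == x]
    larger = [v for v in items if v > x]
    if i <= len(smaller):
        return select(smaller, i)
    if i <= len(smaller) + len(equal):
        return x
    return select(larger, i - len(smaller) - len(equal))


def median_quicksort(items):
    """Return a sorted copy of items, always pivoting on the median."""
    if len(items) <= 1:
        return list(items)
    x = select(items, (len(items) + 1) // 2)  # lower median as the pivot
    smaller = [v for v in items if v < x]
    equal = [v for v in items if v == x]
    larger = [v for v in items if v > x]
    # Each side holds at most about half the elements: T(n) = 2T(n/2) + O(n).
    return median_quicksort(smaller) + equal + median_quicksort(larger)
```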
Structures…
● Done with sorting and order statistics for now
● Ahead of schedule, so…
● Next part of class will focus on data structures
● We will get a couple in before the first exam
■ Yes, these will be on this exam
Binary Search Trees
● Binary Search Trees (BSTs) are an important
data structure for dynamic sets
● In addition to satellite data, elements have:
■ key : an identifying field inducing a total ordering
■ left : pointer to a left child (may be NULL)
■ right : pointer to a right child (may be NULL)
■ p : pointer to a parent node (NULL for root)
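The node layout described above could be written in Python roughly as follows. The field names `key`, `left`, `right`, and `p` come from the slide; the `data` field (for satellite data) and the `attach` helper are hypothetical names added for illustration, and `None` plays the role of NULL.

```python
class Node:
    """One element of a binary search tree."""

    def __init__(self, key, data=None):
        self.key = key      # identifying field; keys must be totally ordered
        self.data = data    # satellite data (hypothetical field name)
        self.left = None    # left child, or None
        self.right = None   # right child, or None
        self.p = None       # parent, or None for the root


def attach(parent, child, side):
    """Hypothetical helper: link child under parent as 'left' or 'right'."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    setattr(parent, side, child)
    child.p = parent
```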
Binary Search Trees
● BST property:
key[left(x)] ≤ key[x] ≤ key[right(x)]
● Example:
        F
      /   \
     B     H
    / \     \
   A   D     K
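The example tree can be rebuilt in code to check the BST property. This sketch assumes the shape suggested by the slide (F at the root, B and H as its children, A and D under B, K to the right of H) and uses a standard iterative insertion, which the slides have not introduced at this point; `insert` and `inorder` are names chosen for this example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None


def insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    node = Node(key)
    parent, cur = None, root
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.p = parent
    if parent is None:
        return node          # the tree was empty
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root


def inorder(node):
    """Return the keys of the subtree rooted at node in sorted order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


root = None
for key in "FBHADK":
    root = insert(root, key)

print(inorder(root))  # ['A', 'B', 'D', 'F', 'H', 'K']
```

An in-order walk visits the left subtree, then the node, then the right subtree, so on any tree satisfying the BST property it yields the keys in sorted order.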