































● Recall from last time: the selection problem is to find the kth largest element in an unsorted array.
● Can solve in O(n log n) time by sorting and taking the kth largest element.
● Can solve in O(n) time (with a large constant factor) using the "median-of-medians" algorithm.
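The sorting approach mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the function name `kth_largest_by_sorting` and the convention that k = 1 means the maximum are assumptions made for the example.

```python
def kth_largest_by_sorting(values, k):
    """Return the k-th largest element (k = 1 is the maximum).

    Sorting dominates the cost, so this runs in O(n log n) time.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Sort in descending order, then index the k-th entry (1-based).
    return sorted(values, reverse=True)[k - 1]


if __name__ == "__main__":
    print(kth_largest_by_sorting([5, 1, 9, 3, 7], 2))
```

This serves as a simple baseline against which faster selection algorithms can be checked.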
Array Size    Sorting    Median of Medians
10000000      0.92       0.37
20000000      1.9        0.74
30000000      2.9        1.05
40000000      3.94       1.43
50000000      5.01       1.83
60000000      6.06       2.12
70000000      7.16       2.54
80000000      8.26       2.89
90000000      9.3        3.2
● Silly question: What happens if you pick pivots completely at random?
● Intuitively, this gives a reasonably good probability of picking a good pivot.
● This algorithm is called quickselect.
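A minimal sketch of quickselect with uniformly random pivots might look like the following. It is not the lecture's implementation: the function name, the `rng` parameter, the k = 1 convention for the maximum, and the choice to partition into new lists (rather than in place) are all assumptions made to keep the example short.

```python
import random


def quickselect(values, k, rng=random):
    """Return the k-th largest element of values (k = 1 is the maximum)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        # Pick the pivot uniformly at random from the remaining elements.
        pivot = rng.choice(items)
        larger = [x for x in items if x > pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(larger):
            # The answer lies among the elements larger than the pivot.
            items = larger
        elif k <= len(larger) + equal_count:
            # The pivot itself is the k-th largest element.
            return pivot
        else:
            # Discard the larger elements and the pivots, then adjust k.
            k -= len(larger) + equal_count
            items = [x for x in items if x < pivot]


if __name__ == "__main__":
    print(quickselect([5, 1, 9, 3, 7], 2))
```

A loop is used instead of recursion because each step continues into at most one of the two sides of the partition.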
● When analyzing a randomized algorithm, we typically are interested in learning the following:
    ○ What is the average-case runtime of the function?
    ○ How likely are we to achieve that average-case runtime?
● We'll answer these questions in a few minutes.
● For now, let's start off with a simpler question...
● Let $\mathcal{E}_k$ be the event that we pick the largest or smallest element of the array when there are k elements left.
● Let event $\mathcal{E}$ correspond to the worst-case runtime of quickselect occurring.
● We can then define $\mathcal{E}$ as the event
      $\mathcal{E} = \bigcap_{i=1}^{n} \mathcal{E}_i$
● Question: What is $P(\mathcal{E})$?
● We have
      $P(\mathcal{E}) = P\left(\bigcap_{i=1}^{n} \mathcal{E}_i\right)$
● Since all $\mathcal{E}_i$'s are independent (we make independent random choices at each level), this simplifies to
      $P(\mathcal{E}) = \prod_{i=1}^{n} P(\mathcal{E}_i)$
● If i > 1, then $P(\mathcal{E}_i) = 2/i$. $P(\mathcal{E}_1) = 1$. Thus
      $P(\mathcal{E}) = \prod_{i=1}^{n} P(\mathcal{E}_i) = \prod_{i=2}^{n} \frac{2}{i} = \frac{2^{n-1}}{n!}$
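To get a feel for how quickly this probability shrinks, the closed form can be evaluated exactly for small n. The sketch below is illustrative only; the function names are hypothetical, and exact fractions are used so that the closed form can be compared with the product it came from.

```python
from fractions import Fraction
from math import factorial


def worst_case_probability(n):
    """Closed form 2^(n-1) / n! as an exact fraction."""
    return Fraction(2 ** (n - 1), factorial(n))


def worst_case_probability_product(n):
    """The same quantity as the product of 2/i for i = 2..n."""
    result = Fraction(1)
    for i in range(2, n + 1):
        result *= Fraction(2, i)
    return result


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10, 20):
        p = worst_case_probability(n)
        print(n, p, float(p))
```

For n = 10 the value is already 2/14175, and the factorial in the denominator makes it fall off faster than any exponential as n grows.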
● We know that the probability of getting a worst-case runtime is vanishingly small.
● But how does the algorithm do on average? Is it Θ(n)? Θ(n log n)? Something else?
● Totally reasonable thing to do: try running it and see what happens!
Array Size    Sorting    Median of Medians    Quickselect
10000000      0.92       0.37                 0.
20000000      1.9        0.74                 0.
30000000      2.9        1.05                 0.
40000000      3.94       1.43                 0.
50000000      5.01       1.83                 0.
60000000      6.06       2.12                 0.
70000000      7.16       2.54                 0.
80000000      8.26       2.89                 1.
90000000      9.3        3.2                  0.
● Because quickselect makes at most one recursive call, we can think of the algorithm as a chain of recursive calls.
● Accounting trick: group multiple calls together into one "phase" of the algorithm.
● The sum of the work done by all calls is equal to the sum of the work done by all phases.
● Goal: Pick phases intelligently to simplify analysis.
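The chain of calls can be made visible by recording the subarray size at each step. The following is a sketch under assumptions: it uses a list-based random-pivot quickselect written for this example (k = 1 is the maximum), and the function name `quickselect_call_sizes` is hypothetical.

```python
import random


def quickselect_call_sizes(values, k, rng=random):
    """Run random-pivot quickselect for the k-th largest element.

    Returns (answer, sizes), where sizes[j] is the number of elements
    examined by the j-th call in the chain.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    sizes = []
    while True:
        sizes.append(len(items))
        pivot = rng.choice(items)
        larger = [x for x in items if x > pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(larger):
            items = larger
        elif k <= len(larger) + equal_count:
            return pivot, sizes
        else:
            k -= len(larger) + equal_count
            items = [x for x in items if x < pivot]


if __name__ == "__main__":
    answer, sizes = quickselect_call_sizes(range(1000), 10, random.Random(1))
    # Each call does work proportional to its size, so sum(sizes)
    # is a proxy for the total work done along the chain.
    print(answer, sizes, sum(sizes))
```

Grouping consecutive entries of `sizes` is exactly the accounting trick described above: however the entries are grouped into phases, their total is unchanged.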
● Let's define one "phase" of the algorithm to be when the algorithm decreases the size of the input array to 75% of the original size or less.
● Why 75%?
    ○ If the array shrinks by any constant factor from phase to phase and only does linear work per phase, total work done is linear.
    ○ The number 75% has a nice intuition...
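The claim that constant-factor shrinkage gives linear total work is a geometric-series argument. As a sketch, assume for the moment that phase k does at most $c\,n\,(3/4)^k$ work for some unspecified constant c (an idealization: it ignores how many calls each phase takes). Then:

```latex
\sum_{k=0}^{\infty} c\,n\left(\tfrac{3}{4}\right)^{k}
  = c\,n \cdot \frac{1}{1 - \tfrac{3}{4}}
  = 4\,c\,n
  = O(n)
```

Any shrink factor r with 0 < r < 1 works the same way and gives a total of at most $c\,n / (1 - r)$; the value 3/4 is one convenient choice.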
● Number the phases 0, 1, 2, …
● In phase k, the array size is at most $n (3/4)^k$.
    ○ Last phase numbered at most $\lceil \log_{4/3} n \rceil$.
● Let $X_k$ be a random variable equal to the number of recursive calls in phase k.
● Work done in phase k is at most
      $X_k \cdot c\,n \left(\frac{3}{4}\right)^k$ (for some constant c)
● Let W be a random variable denoting the total work done. Then
      $W \le \sum_{k=0}^{\lceil \log_{4/3} n \rceil} \left( X_k \cdot c\,n \left(\frac{3}{4}\right)^k \right) = c\,n \sum_{k=0}^{\lceil \log_{4/3} n \rceil} \left( X_k \left(\frac{3}{4}\right)^k \right)$
● Our goal is to determine the expected runtime for quickselect on an array of size n.
● This is E[W], the expected value of W.
● This is given by
      $E[W] \le E\left[ c\,n \sum_{k=0}^{\lceil \log_{4/3} n \rceil} X_k \left(\frac{3}{4}\right)^k \right]$
      $= c\,n \cdot E\left[ \sum_{k=0}^{\lceil \log_{4/3} n \rceil} X_k \left(\frac{3}{4}\right)^k \right]$
      $= c\,n \cdot \sum_{k=0}^{\lceil \log_{4/3} n \rceil} E\left[ X_k \left(\frac{3}{4}\right)^k \right]$
      $= c\,n \cdot \sum_{k=0}^{\lceil \log_{4/3} n \rceil} E[X_k] \left(\frac{3}{4}\right)^k$
● By definition:
      $E[X_k] = \sum_{i=0}^{\infty} i \cdot P(X_k = i)$
● Recall: $X_k$ is the number of calls within phase k.
● Equivalently: the number of calls before a pivot is chosen in the middle 50% of the elements.
● Can we determine this explicitly?
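Before working this out by hand, the "middle 50%" description can be explored empirically. The sketch below is a simplified model rather than the lecture's analysis: it holds the array size n fixed within a phase (in the real algorithm the array may shrink between calls), draws pivot ranks uniformly at random, and counts draws until one lands in the middle half. The function names and the fixed seed are assumptions for the example.

```python
import random


def calls_until_middle_pivot(n, rng):
    """Count random pivot picks until one has rank in the middle 50%."""
    # Ranks low .. high-1 form the middle half of 0 .. n-1.
    low, high = n // 4, n - n // 4
    calls = 0
    while True:
        calls += 1
        rank = rng.randrange(n)  # uniformly random pivot rank
        if low <= rank < high:
            return calls


def average_calls(n, trials, seed=0):
    """Average the count over many independent simulated phases."""
    rng = random.Random(seed)
    total = sum(calls_until_middle_pivot(n, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(average_calls(1000, 20000))
```

Under this model each pick succeeds with probability 1/2 when n is divisible by 4, so the average should come out close to a small constant that does not depend on n.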