The Selection Problem, Exercises of Elementary Mathematics



2022/2023

Uploaded on 02/28/2023

mjforever
The Selection Problem
โ—Recall from last time: the selection
problem is to find the kth largest element
in an unsorted array.
โ—Can solve in O(n log n) time by sorting
and taking the kth largest element.
โ—Can solve in O(n) time (with a large
constant factor) using the
โ€œmedian-of-mediansโ€ algorithm.


Comparison of Selection Algorithms

Array Size    Sorting    Median of Medians
10000000      0.92       0.37
20000000      1.9        0.74
30000000      2.9        1.05
40000000      3.94       1.43
50000000      5.01       1.83
60000000      6.06       2.12
70000000      7.16       2.54
80000000      8.26       2.89
90000000      9.3        3.2

Randomized Selection

โ— (^) Silly question: What happens if you pick pivots completely at random? โ— (^) Intuitively, gives reasonably good probability of picking a good pivot. โ— (^) This algorithm is called quickselect.

Analyzing Quickselect

โ— (^) When analyzing a randomized algorithm, we typically are interested in learning the following: โ— (^) What is the average-case runtime of the function? โ— (^) How likely are we to achieve that average-case runtime? โ— (^) We'll answer these questions in a few minutes. โ— (^) For now, let's start off with a simpler question...

Triggering the Worst Case

โ— (^) Let ฦโ‚– be the event that we pick the largest or smallest element of the array when there are k elements left. โ— (^) Let event ฦ correspond to the worst-case runtime of quickselect occurring. โ— (^) We can then define ฦ as the event โ— (^) Question: What is P ( ฦ )? ฦ = (^) โˆฉ i = 1 n ฦi

Triggering the Worst Case

โ— (^) We have โ— (^) Since all ฦ i 's are independent (we make independent random choices at each level), this simplifies to โ— (^) If i > 1, then P ( ฦ i ) = 2 / i. P ( ฦ 1 ) = 1. Thus P ( ฦ ) = P ( โˆฉ i = 1 n ฦ i

P ( ฦ ) = (^) โˆ i = 1 n P ( ฦ i ) = (^) โˆ i = 2 n 2 i

n โˆ’ 1 n! P ( ฦ ) = P ( โˆฉ i = 1 n ฦ i ) = (^) โˆ i = 1 n P ( ฦ i

On Average

โ— (^) We know that the probability of getting a worst-case runtime is vanishingly small. โ— (^) But how does the algorithm do on average? Is it ฮ˜( n )? ฮ˜( n log n )? Something else? โ— (^) Totally reasonable thing to do: try running it and see what happens!

Comparison of Selection Algorithms

Array Size    Sorting    Median of Medians    Quickselect
10000000      0.92       0.37                 0.
20000000      1.9        0.74                 0.
30000000      2.9        1.05                 0.
40000000      3.94       1.43                 0.
50000000      5.01       1.83                 0.
60000000      6.06       2.12                 0.
70000000      7.16       2.54                 0.
80000000      8.26       2.89                 1.
90000000      9.3        3.2                  0.

An Accounting Trick

โ— (^) Because quickselect makes at most one recursive call, we can think of the algorithm as a chain of recursive calls: โ— (^) Accounting trick: group multiple calls together into one โ€œphaseโ€ of the algorithm. โ— (^) The sum of the work done by all calls is equal to the sum of the work done by all phases. โ— (^) Goal: Pick phases intelligently to simplify analysis.

Picking Phases

โ— (^) Let's define one โ€œphaseโ€ of the algorithm to be when the algorithm decreases the size of the input array to 75% of the original size or less. โ— (^) Why 75%? โ— (^) If array shrinks by any constant factor from phase to phase and only does linear work per phase, total work done is linear. โ— (^) The number 75% has a nice intuition...

Analyzing the Runtime

โ— (^) Number the phases 0, 1, 2, โ€ฆ โ— (^) In phase k , the array size is at most n (3 / 4) k. โ— (^) Last phase numbered at most โŒˆlog 4/ n โŒ‰. โ— (^) Let Xโ‚– be a random variable equal to the number of recursive calls in phase k. โ— (^) Work done in phase k is at most โ— (^) Let W be a random variable denoting the total work done. Then Xk โ‹… c n ( 3 4 ) k (for some constant c) W โ‰ค (^) โˆ‘ k = 0 โŒˆlog 4 / 3 n โŒ‰ ( X k โ‹… c n ( 3 4 ) k ) = c n (^) โˆ‘ k = 0 โŒˆlog 4 / 3 n โŒ‰ ( X k ( 3 4 ) k )

The Average-Case Analysis

โ— (^) Our goal is to determine the expected runtime for quickselect on an array of size n. โ— (^) This is E[ W ], the expected value of W. โ— (^) This is given by E[ W ] โ‰ค E

[

c n (^) โˆ‘ k = 0 โŒˆ log 4 / 3 n โŒ‰

X

k

k

)]

Simplifying Our Expression

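$$\begin{aligned}
E[W] &\le E\left[c\,n \sum_{k=0}^{\lceil \log_{4/3} n \rceil} X_k \left(\tfrac{3}{4}\right)^k\right] \\
&= c\,n \cdot E\left[\sum_{k=0}^{\lceil \log_{4/3} n \rceil} X_k \left(\tfrac{3}{4}\right)^k\right] \\
&= c\,n \cdot \sum_{k=0}^{\lceil \log_{4/3} n \rceil} E\left[X_k \left(\tfrac{3}{4}\right)^k\right] \\
&= c\,n \cdot \sum_{k=0}^{\lceil \log_{4/3} n \rceil} E[X_k] \left(\tfrac{3}{4}\right)^k
\end{aligned}$$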

E[Xₖ]

● Recall: $X_k$ is the number of calls within phase k.
● By definition, $E[X_k] = \sum_{i=0}^{\infty} i \cdot P(X_k = i)$.
● Put differently, $X_k$ is at most the number of calls made until a pivot is chosen from the middle 50% of the elements, since any such pivot shrinks the array to at most 75% of its current size and therefore ends the phase.
● Can we determine this explicitly?
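A numerical sanity check, not the slides' own argument: assume, as a simplification, that each call in phase k independently ends the phase with probability p (p = 1/2 models landing in the middle 50%). Then the count of calls is geometric on {1, 2, …}, with $P(X = i) = (1-p)^{i-1} p$, and the defining sum converges to $1/p$. The helper `expected_calls_partial` is hypothetical and uses exact fractions; for p = 1/2 the partial sums equal $2 - (N+2)/2^N$.

```python
from fractions import Fraction


def expected_calls_partial(p, terms):
    """Partial sum of sum_{i>=1} i * P(X = i) for X ~ Geometric(p),
    where P(X = i) = (1 - p)**(i - 1) * p."""
    total = Fraction(0)
    for i in range(1, terms + 1):
        total += i * (1 - p) ** (i - 1) * p
    return total


if __name__ == "__main__":
    half = Fraction(1, 2)
    for terms in (1, 2, 3, 10, 40):
        # Partial sums approach 1 / p = 2 from below.
        print(terms, float(expected_calls_partial(half, terms)))
```

Under this model $E[X_k] \le 2$, and plugging that into the bound on E[W] gives $E[W] \le 2cn \sum_{k \ge 0} (3/4)^k = 8cn$.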