Analysis of QuickSort Algorithm: Time Complexity and Probability Bounds, Study notes of Algorithms and Programming

An analysis of the quicksort algorithm, discussing its time complexity, high probability bounds, and the importance of choosing a pivot. It covers the expected running time, the probability of the depth of the recursion tree being more than expected, and the use of a random permutation of keys for priority comparison.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

koofers-user-0vj

Chapter I

Quick-Sort, Treaps, Skip Lists, etc

We revisit quicksort and study two related search data structures: treaps and skip lists. We also consider the selection problem: finding the k-th smallest element. The main common ingredient is the use of randomization. We will encounter many other applications of randomization in the course.

I.1 Quick Sort

QuickSort uses a divide and conquer approach. Unlike in MergeSort, here the work is in dividing and solving recursively; putting the result together is trivial.

QuickSort (A[1 . . . n]):
    if (n > 1)
        Choose a pivot element A[p]
        k ← Partition(A, p)
        QuickSort (A[1 . . . k − 1])
        QuickSort (A[k + 1 . . . n])

The partition induced by the pivot simply involves comparing the pivot to each of the other items. It can be implemented “in place” (without using extra space):

Partition (A[1 . . . n], p):
    swap A[n] ↔ A[p]
    i ← 1; j ← n
    while (i < j)
        while (i < j and A[i] ≤ A[n]) i ← i + 1
        while (i < j and A[j] ≥ A[n]) j ← j − 1
        if (i < j) swap A[i] ↔ A[j]
    swap A[i] ↔ A[n]
    return i
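The two routines above can be transcribed almost line by line. The following is a minimal Python sketch, assuming 0-based indexing, inclusive bounds `lo` and `hi`, and a uniformly random pivot; the function names and the use of `random.randint` are choices made here for illustration, not part of the notes.

```python
import random

def partition(a, lo, hi, p):
    """Partition a[lo..hi] (inclusive) around the pivot a[p]; return the pivot's final index."""
    a[hi], a[p] = a[p], a[hi]            # park the pivot at the right end
    i, j = lo, hi
    while i < j:
        while i < j and a[i] <= a[hi]:   # keys that may stay on the left
            i += 1
        while i < j and a[j] >= a[hi]:   # keys that may stay on the right
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]      # exchange a misplaced pair
    a[i], a[hi] = a[hi], a[i]            # move the pivot between the two parts
    return i

def quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place; the pivot index is chosen uniformly at random (an assumption)."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = random.randint(lo, hi)       # randint includes both endpoints
        k = partition(a, lo, hi, p)
        quicksort(a, lo, k - 1)
        quicksort(a, k + 1, hi)
```

For example, after `data = [5, 3, 8, 1]; quicksort(data)` the list `data` is sorted in place. The recursion depth equals the depth of the recursion tree, so very unbalanced splits on large inputs can exceed Python's default recursion limit.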

There are many ways to implement Partition.^1 They may differ in how they handle other keys that are equal to the pivot. This is important for the claim that we will make about the expected running time when the pivot is chosen randomly. If Partition, for example, puts all the copies of the pivot in the same half, then the expected running time for an input where all the keys are equal will be Θ(n^2), as will follow from the analysis below. This problem can be removed by either (i) excluding the copies of the pivot from both halves, or (ii) enforcing non-equal keys by using their initial index in the array to differentiate them when they are equal. Below, then, we will assume that all keys are different.

(^1) There are two different ones in CLRS (one in the text and one in the exercises), and still another in Jeff’s notes. In CLRS, the correctness is verified carefully. You may want to check that out.

Let us concentrate on the total number of comparisons performed as they dominate the total running time. Ideally, the pivot is close in rank (the position in the sorted order) to the median at every level of the recursion, and then the total number of comparisons is Θ(n log n): the computation tree has O(log n) levels, and in each at most n comparisons are performed. If the imbalance is extreme, say the subproblems have sizes 0 and n − 1 at every level, then the total number of comparisons is Θ(n^2 ).

Figure I.1: QuickSort recursion tree (balanced splits versus extreme splits of sizes 0 and n − 1)

If each of the two resulting subproblems has size at least αn, where α is a positive constant fraction, then the depth is at most log_{1/(1−α)} n, which is still O(log n), and the total number of comparisons is O(n log n). For example, this is the case if both subproblems have size at least n/4 (α = 1/4).

Figure I.2: QuickSort: A good split (subproblem sizes n; n/4, 3n/4; n/16, 3n/16, 3n/16, 9n/16; n/64, . . . , 27n/64)

I.2 Randomized QuickSort

It is possible to choose the pivot carefully and “efficiently” so that each subproblem has size at least a fraction of the problem size, say, by using a (deterministic) linear time selection algorithm. But that algorithm is complicated and the multiplicative constants involved are large. A simple, fast and practical approach is to use randomization. Here the pivot is

with T(1) = 0. This can be simplified to

\[ T(n) = (n - 1) + \frac{2}{n} \sum_{k=1}^{n-1} T(k). \]

A possible approach is simply to verify using induction on n that T (n) ≤ Cn log n for some constant C > 0. The base case n = 1 is clear since T (1) = 0. Then

\[
\begin{aligned}
T(n) &\le (n - 1) + \frac{2C}{n} \sum_{k=1}^{n-1} k \log k \\
     &\le (n - 1) + \frac{2C}{n} \left( \frac{n^2 \log n}{2} - \frac{n^2 \log e}{4} \right) \\
     &= C n \log n + \left( (n - 1) - \frac{C \log e}{2}\, n \right) \le C n \log n,
\end{aligned}
\]

if C ≥ 2/log e (we have used the bound Σ_{k=1}^{n−1} k log k ≤ (n^2 log n)/2 − (n^2 log e)/4, which can be obtained by upper bounding the sum by an integral).

Analysis II: Recurrence Equation 2 – Exact Solution

With some manipulation, we can solve the recurrence equation above exactly. First, multiplying by n and subtracting the same expression for n − 1, we obtain

nT (n) − (n − 1)T (n − 1) = (n − 1)n − (n − 2)(n − 1) + 2T (n − 1),

and so

\[ T(n) = 2 - \frac{2}{n} + \frac{n+1}{n}\, T(n-1) = 4 - 2\,\frac{n+1}{n} + \frac{n+1}{n}\, T(n-1). \]

We now substitute t(n) = T (n)/(n + 1) and obtain

\[ t(n) = \frac{4}{n+1} - \frac{2}{n} + t(n-1), \]

with t(0) = 0. So

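\[
\begin{aligned}
t(n) &= 4 \sum_{i=1}^{n} \frac{1}{i+1} - 2 \sum_{i=1}^{n} \frac{1}{i} = 4 (H_{n+1} - 1) - 2 H_n \\
     &= 2 H_n - 4 + \frac{4}{n+1},
\end{aligned}
\]

where H_n = 1 + 1/2 + · · · + 1/n is the n-th harmonic number. Finally, T(n) = (n + 1) t(n) = 2(n + 1)H_n − 4(n + 1) + 4.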
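The closed form can be checked against the recurrence T(n) = (n − 1) + (2/n) Σ_{k=1}^{n−1} T(k) directly. The sketch below does this with exact rational arithmetic; the function names are hypothetical and the check is only a sanity test for small n, not a proof.

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def comparisons_by_recurrence(n_max):
    """Return T[0..n_max] with T(1) = 0 and T(n) = (n - 1) + (2/n) * (T(1) + ... + T(n-1))."""
    T = [Fraction(0)] * (n_max + 1)   # T[0] is padding; T[1] = 0
    prefix = Fraction(0)              # running sum T(1) + ... + T(n-1)
    for n in range(2, n_max + 1):
        prefix += T[n - 1]
        T[n] = (n - 1) + Fraction(2, n) * prefix
    return T

def comparisons_closed_form(n):
    """2(n+1)H_n - 4(n+1) + 4, the exact solution derived above."""
    return 2 * (n + 1) * harmonic(n) - 4 * (n + 1) + 4
```

Both give, for instance, T(2) = 1 and T(3) = 8/3.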

Analysis III: Indicator R.V.’s

Let Xij be a 0/1 random variable indicating whether xi as pivot is compared to xj. Then the expected number of comparisons performed by quicksort is

\[ E\Big[ \sum_{i \ne j} X_{ij} \Big] = \sum_{i \ne j} E[X_{ij}] = \sum_{i \ne j} \mathrm{Prob}\{X_{ij} = 1\}, \]

where we have used linearity of expectation and E[X] = Prob{X = 1} for an indicator random variable (why?). What is Prob{X_ij = 1} equal to? Let’s consider i < j.

Observation. xi is a pivot and is compared to xj iff xi is chosen as pivot before any of the items xi+1,... , xj.

Since any of the items x_i, . . . , x_j is equally likely to be chosen first as pivot,

\[ \mathrm{Prob}\{X_{ij} = 1\} = \frac{1}{(j - i) + 1}. \]

Similarly for i > j, and so we can write for i ≠ j

\[ \mathrm{Prob}\{X_{ij} = 1\} = \frac{1}{|j - i| + 1}. \]

Finally, the expected number of comparisons performed is

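\[
\begin{aligned}
E\Big[ \sum_{i \ne j} X_{ij} \Big]
 &= \sum_{i \ne j} \frac{1}{|j - i| + 1}
  = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{(j - i) + 1}
  = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \\
 &= \sum_{i=1}^{n-1} 2 (H_{n-i+1} - 1)
  = \sum_{i=1}^{n-1} 2 H_{n-i+1} - 2(n - 1)
  = \sum_{i=1}^{n-1} 2 H_{i+1} - 2(n - 1) \\
 &\le \sum_{i=1}^{n-1} 2 (\ln(i + 1) + 1) - 2(n - 1)
  = \sum_{i=1}^{n-1} 2 \ln(i + 1) \\
 &\le 2 n \ln n,
\end{aligned}
\]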

where we have used Hk ≤ ln k + 1 (see [CLRS, p.1067]). The exact result should be equal to the result from Analysis II. Is it? Verify.
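One way to carry out the suggested verification numerically is to sum the probabilities 1/(|j − i| + 1) exactly and compare with 2(n + 1)H_n − 4n, which is the closed form of Analysis II after simplification. The following sketch does so for small n; the function names are hypothetical.

```python
from fractions import Fraction
import math

def comparisons_by_indicators(n):
    """Sum of 1/(|j - i| + 1) over all ordered pairs i != j with 1 <= i, j <= n."""
    total = Fraction(0)
    for d in range(1, n):                        # d = |j - i|
        total += Fraction(2 * (n - d), d + 1)    # 2(n - d) ordered pairs at distance d
    return total

def comparisons_closed_form(n):
    """2(n+1)H_n - 4n, the result of Analysis II after simplification."""
    h_n = sum(Fraction(1, i) for i in range(1, n + 1))
    return 2 * (n + 1) * h_n - 4 * n

def upper_bound(n):
    """The bound 2 n ln n obtained at the end of Analysis III."""
    return 2 * n * math.log(n)
```

For n = 3 both exact expressions give 8/3, while the bound 2n ln n is about 6.59.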

Theorem 1. Let X be a non-negative random variable and μX = E[X] be its expected value. For any t > 0 ,

\[ \mathrm{Prob}\{X \ge t\} \le \frac{\mu_X}{t}. \]

Proof.

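\[
\begin{aligned}
E[X] &= \sum_{x} x \cdot \mathrm{Prob}\{X = x\} \\
     &= \sum_{x < t} x \cdot \mathrm{Prob}\{X = x\} + \sum_{x \ge t} x \cdot \mathrm{Prob}\{X = x\} \\
     &\ge t \cdot \sum_{x \ge t} \mathrm{Prob}\{X = x\} \\
     &= t \cdot \mathrm{Prob}\{X \ge t\},
\end{aligned}
\]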

and so the claimed bound follows.

So, applied to our analysis of QuickSort, the fact that the expected running time is at most Cn log n (for a constant C that results from the analysis) implies that for any t > 0, the probability that the running time is greater than t · Cn log n is at most 1/t. This is not a very strong claim. A more careful analysis below shows the following stronger claim.
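To see how loose the 1/t bound is, one can compute the exact distribution of the number of comparisons for small n and compare its tail with the bound of Theorem 1. The sketch below assumes a uniformly random pivot, n − 1 comparisons per partition step, and independent subproblems; the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def comparison_distributions(n_max):
    """dist[n][c] = Prob{randomized QuickSort on n distinct keys performs exactly c comparisons}."""
    dist = [{0: Fraction(1)}, {0: Fraction(1)}]   # sizes 0 and 1 need no comparisons
    for n in range(2, n_max + 1):
        d = defaultdict(Fraction)
        for r in range(1, n + 1):                 # rank of the pivot, each with probability 1/n
            for c1, p1 in dist[r - 1].items():
                for c2, p2 in dist[n - r].items():
                    d[(n - 1) + c1 + c2] += p1 * p2 / n
        dist.append(dict(d))
    return dist

def mean(d):
    """Expected value of a distribution given as {value: probability}."""
    return sum(x * p for x, p in d.items())

def tail(d, t):
    """Prob{X >= t} for a distribution given as {value: probability}."""
    return sum(p for x, p in d.items() if x >= t)
```

For n = 10 the mean is about 24.4 while the maximum possible count is 45, so `tail(dist[10], 2 * mean(dist[10]))` is 0, far below the bound 1/2.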

Theorem 2. For any constant c > 0, there is a constant C(c) such that the probability that the depth of QuickSort’s recursion tree is more than C(c) log n is at most 1/n^c. For any constant c′ > 0, there is a constant C′(c′) such that the probability that the running time of QuickSort is more than C′(c′) n log n is at most 1/n^{c′}.

Proof. Let us say that a split is good if both of the resulting subproblems have size at most 3n/4; a split is bad otherwise. A path from the root in the recursion tree cannot have more than log_{4/3} n good splits because by then the size of the resulting subproblem is at most 1. A good split happens if the pivot has rank between n/4 and 3n/4, so the probability of a good split is 1/2, and the probability of a bad split is also 1/2. We want to verify that the probability of fewer than log_{4/3} n good splits in a sequence of T = 4c log_{4/3} n, c ≥ 1, splits is very small. Consider a sequence of T splits. The probability P(k) that at most k splits are good is exactly given by

\[ P(k) = \frac{1}{2^T} \sum_{i=0}^{k} \binom{T}{i}. \]

Consider now k = λT and use the following approximation for the binomial coefficient (see CLRS, equation following (C.6)):

\[ \binom{T}{\lambda T} \le \left( \left( \frac{1}{\lambda} \right)^{\lambda} \left( \frac{1}{1 - \lambda} \right)^{1 - \lambda} \right)^{T} \]

to obtain

\[ P(\lambda T) \le \lambda T \cdot \frac{1}{2^T} \left( \left( \frac{1}{\lambda} \right)^{\lambda} \left( \frac{1}{1 - \lambda} \right)^{1 - \lambda} \right)^{T}. \]

Setting λ = 1/4, we have

\[ P(T/4) \le \frac{T}{4} \left( \frac{2}{3^{3/4}} \right)^{T} = \frac{T}{4} \cdot (0.87738\cdots)^{T}. \]

For T ≥ T_0, T ≤ A · B^T with T_0 = 1, A = 41, B = 41/40 (there is a lot of flexibility in choosing T_0, A, B). So, for sufficiently large T,

\[ P(T/4) \le \frac{41}{4} \left( \frac{41}{40} \cdot \frac{2}{3^{3/4}} \right)^{T} \le \frac{41}{4} \left( \frac{9}{10} \right)^{T}. \]

Consider a binary tree τ of depth T. We see the leaves of τ as representing the potential outcomes of computation paths in the execution tree of QuickSort (though not all of these may be reached for a given computation). For i = 1, . . . , n, let ℓ_i be the leaf of τ that corresponds to the subproblem containing the i-th smallest key. If in the path to ℓ_i there are at least log_{4/3} n good splits, then ℓ_i is actually not reached in the computation; therefore, the probability that ℓ_i is reached is bounded by (note c ≥ 1)

\[
\begin{aligned}
\mathrm{Prob}\{\ell_i \text{ is reached}\}
 &\le \mathrm{Prob}\{\text{\# good splits} < \log_{4/3} n\}
  \le \mathrm{Prob}\{\text{\# good splits} < c \log_{4/3} n\} \\
 &\le \frac{41}{4} \left( \frac{9}{10} \right)^{4c \log_{4/3} n}
  = \frac{41}{4} \cdot n^{-4c \log_{4/3}(10/9)} \\
 &= \frac{41}{4} \cdot n^{-1.46496\cdots c} \\
 &\le n^{-(c+1)}
\end{aligned}
\]

for n ≥ 20, c ≥ 4 (so that n^{−0.46496···c} ≤ (4/41) · n^{−1}). (For c < 4, note that 1/n^4 ≤ 1/n^c, so we can apply the result for c = 4.)

So, the probability that any ℓ_i is reached, and hence that the depth of the recursion tree is larger than 8c log_{4/3} n, is at most n · n^{−(c+1)} = n^{−c}. So C(c) = 8c/ log(4/3) for c ≥ 4 and C(c) = 32/ log(4/3) for c < 4.

Finally, since in every level of the tree the amount of work (time) is at most n, we can conclude that the running time is more than C′(c′) n log n with probability at most 1/n^{c′}, where C′(·) = C(·).
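The numerical constants used in the proof can be sanity-checked directly. The sketch below computes the exact binomial tail P(k) and compares it with the bound (41/4)(9/10)^T; it assumes, as the proof does, that the T splits are independent and each is good with probability 1/2. The function names are hypothetical.

```python
from math import comb, log

def prob_at_most_good(T, k):
    """P(k): probability that at most k of T independent fair splits are good."""
    return sum(comb(T, i) for i in range(k + 1)) / 2 ** T

def proof_bound(T):
    """The bound (41/4) * (9/10)^T on P(T/4) used in the proof."""
    return (41 / 4) * 0.9 ** T

def reach_probability_bound(n, c):
    """(41/4) * n^(-4c * log_{4/3}(10/9)), the bound on the probability that a leaf is reached."""
    return (41 / 4) * n ** (-4 * c * log(10 / 9) / log(4 / 3))
```

For example, `prob_at_most_good(4, 1)` is 5/16 = 0.3125, well below `proof_bound(4)`, and `reach_probability_bound(20, 4)` is below 20^{-5}, matching the condition n ≥ 20, c ≥ 4.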

Analysis VI: Easy and illuminating... but a bit lacking

Let us say that a pivot is good if both resulting subproblems have size at least n/4; otherwise a pivot is bad. The probability of a good pivot is 1/2 and the probability of a bad pivot is also 1/2. If the pivot is good, in the worst case the split is n/4 and 3n/4. On the other hand, if the pivot is bad, the number of comparisons is no worse than if the algorithm were started again (with n items). Thus

\[ T(n) \le \frac{1}{2} \cdot \big( T(n/4) + T(3n/4) \big) + \frac{1}{2} \cdot T(n) + (n - 1) \]

and so T (n) ≤ T (n/4) + T (3n/4) + 2(n − 1).

The time required by these operations depends on the depth in the tree of the query node, or its predecessor or successor. For arbitrary priorities, this depth can be very large (even Ω(n)). But from our discussion on QuickSort, when priorities are random, we expect (our expectation) that the expected (over the random choice of priorities) depth is O(log n).

Treaps support search for a key, and insertion and deletion of a key. They are implemented as follows:

Search q: Just as in a binary search tree: Starting with the root, compare the search key q to the key in the current node, if smaller/larger then continue with the left/right child of that node. A successful search ends with the node storing q; an unsuccessful one ends with a null pointer from the predecessor q−^ or the successor q+^ of q among the keys stored in the treap.

Insert q: Search for q; if unsuccessful it must end in a null pointer, then insert q in a new node there and generate a priority for it. Through rotations, move q up until the heap property is restored (note that rotations preserve the key ordering and eventually will fix the heap order).

Delete q: This is the inverse procedure of insert. Rotate q down until it becomes a leaf and then remove.
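The three operations can be sketched as follows. This is a minimal illustration, assuming priorities drawn with `random.random()`, the convention that the largest priority sits at the root, and distinct keys; the class and function names, as well as the helpers `inorder` and `heap_ordered`, are hypothetical and exist only to make the invariants checkable.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()   # assumption: larger priority = closer to the root
        self.left = None
        self.right = None

def rotate_right(y):
    """Lift y's left child above y; the in-order key sequence is unchanged."""
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    """Lift x's right child above x; the in-order key sequence is unchanged."""
    y = x.right
    x.right, y.left = y.left, x
    return y

def search(root, q):
    """Return the node storing q, or None if the search ends in a null pointer."""
    while root is not None and root.key != q:
        root = root.left if q < root.key else root.right
    return root

def insert(root, q):
    """Insert q as a new leaf, then rotate it up while it violates the heap order."""
    if root is None:
        return Node(q)
    if q < root.key:
        root.left = insert(root.left, q)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    elif q > root.key:
        root.right = insert(root.right, q)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def delete(root, q):
    """Rotate the node storing q down until it is a leaf, then remove it."""
    if root is None:
        return None
    if q < root.key:
        root.left = delete(root.left, q)
    elif q > root.key:
        root.right = delete(root.right, q)
    elif root.left is None and root.right is None:
        return None                               # q is a leaf: drop it
    elif root.right is None or (root.left is not None
                                and root.left.priority > root.right.priority):
        root = rotate_right(root)                 # promote the left child
        root.right = delete(root.right, q)
    else:
        root = rotate_left(root)                  # promote the right child
        root.left = delete(root.left, q)
    return root

def inorder(root):
    """Keys in symmetric order; sorted whenever the key ordering is intact."""
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)

def heap_ordered(root):
    """True if no child has a larger priority than its parent."""
    if root is None:
        return True
    for child in (root.left, root.right):
        if child is not None and child.priority > root.priority:
            return False
    return heap_ordered(root.left) and heap_ordered(root.right)
```

A treap is built by repeated calls such as `root = insert(root, q)` starting from `root = None`; after any sequence of insertions and deletions, `inorder(root)` should be sorted and `heap_ordered(root)` should hold.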

Figure I.6: Rotations preserve the key order and fix the heap order (a rotation between nodes α and β with subtrees T_1, T_2, T_3)

Expected Depth of a Node. Though this analysis is essentially a repeat of the previous one for QuickSort, we are going to compute the expected depth of a node u in the treap. Let u contain the k-th smallest key x_k. For any i ≠ k, let A_{i,k} be a 0/1 random variable that is 1 iff x_i is an ancestor of x_k in the treap. We have

\[ E[\mathrm{depth}(x_k)] = E\Big[ \sum_{i \ne k} A_{i,k} \Big] = \sum_{i \ne k} \mathrm{Prob}\{A_{i,k} = 1\}. \]

Key x_i is an ancestor of x_k iff the priority of x_i is higher than the priority of any other key in {x_i, x_{i+1}, . . . , x_{k−1}, x_k} if i < k, or of any other key in {x_k, x_{k+1}, . . . , x_{i−1}, x_i} if k < i.

Since the priorities are chosen uniformly at random (or alternatively, they correspond to a random permutation), this probability is

\[ \mathrm{Prob}\{A_{i,k} = 1\} = \frac{1}{|k - i| + 1}. \]

So,

\[ E[\mathrm{depth}(x_k)] = \sum_{i \ne k} \frac{1}{|k - i| + 1} = \sum_{j=2}^{k} \frac{1}{j} + \sum_{j=2}^{n-k+1} \frac{1}{j} = H_k + H_{n-k+1} - 2 = O(\log n). \]
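For small n, the formula H_k + H_{n−k+1} − 2 can be confirmed by brute force: enumerate all n! relative orders of the priorities, build the treap implicitly, and average the depth of x_k. The sketch below assumes keys 1, . . . , n, distinct priorities, and the largest priority at the root; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def depth_of(k, prio, lo, hi):
    """Number of proper ancestors of key k in the treap on keys lo..hi (largest priority at the root)."""
    root = max(range(lo, hi + 1), key=lambda i: prio[i])
    if root == k:
        return 0
    if k < root:
        return 1 + depth_of(k, prio, lo, root - 1)
    return 1 + depth_of(k, prio, root + 1, hi)

def expected_depth(n, k):
    """Exact average depth of key k over all n! equally likely priority orders."""
    total, count = 0, 0
    for perm in permutations(range(n)):
        prio = dict(zip(range(1, n + 1), perm))   # key i gets priority perm[i - 1]
        total += depth_of(k, prio, 1, n)
        count += 1
    return Fraction(total, count)

def harmonic(m):
    """H_m as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, m + 1))
```

For n = 2 and k = 1 the average depth is 1/2, which equals H_1 + H_2 − 2.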

Weighted Treaps. We have items with associated (integer) weights that indicate how frequently they are accessed. Let w_i indicate the weight of x_i. The priority r_i of x_i is set to the minimum of w_i independent random numbers in [0, 1]. We will show that the expected time to access x_i is O(log(W/w_i)), where W = Σ_{i=1}^{n} w_i (the heap order keeps the key with smallest priority at the root). We assume x_1 < x_2 < · · · < x_n. The time to access x_i is proportional to its depth in the tree, and this is proportional to the number of ancestors in the tree. Let A_{i,j} be the indicator random variable corresponding to the event “x_j is an ancestor of x_i”. Then

\[ \mathrm{depth}(x_i) = \sum_{j \ne i} A_{i,j} \quad \text{and so} \quad E[\mathrm{depth}(x_i)] = \sum_{j \ne i} \mathrm{Prob}\{A_{i,j} = 1\}. \]

Let w_{i,j} = Σ_{k=i}^{j} w_k. For j > i, Prob{A_{i,j} = 1} is the probability that x_j is chosen as pivot before any of the keys x_i, x_{i+1}, . . . , x_{j−2}, x_{j−1}. This happens if one of the w_j random numbers of x_j is the smallest among the w_{i,j} random numbers of x_i, . . . , x_j. Since any of the w_{i,j} numbers is equally likely to be smallest,

\[ \mathrm{Prob}\{A_{i,j} = 1\} = \frac{w_j}{w_{i,j}}. \]

Using

\[ \frac{w_j}{w_{i,j}} \le \frac{1}{w_{i,j}} + \frac{1}{w_{i,j} - 1} + \frac{1}{w_{i,j} - 2} + \cdots + \frac{1}{w_{i,j} - w_j + 1} \]

we get

\[ \sum_{j > i} \mathrm{Prob}\{A_{i,j} = 1\} = \sum_{j > i} \frac{w_j}{w_{i,j}} \le \sum_{k = w_i + 1}^{w_{i,n}} \frac{1}{k}. \]

With a similar analysis for j < i, we get

\[ \sum_{j \ne i} \mathrm{Prob}\{A_{i,j} = 1\} \le \sum_{k = w_i + 1}^{w_{i,n}} \frac{1}{k} + \sum_{k = w_i + 1}^{w_{1,i}} \frac{1}{k}. \]

Using H_n − H_m = O(log(n/m)), we obtain

E[depth(x_i)] = O(log(w_{i,n}/w_i) + log(w_{1,i}/w_i)) = O(log(W/w_i)).

Analysis

The number of levels in a skip list is O(log n) with probability at least 1 − 1/n^c: the probability that a given key has level larger than k is equal to 1/2^k, and so the probability that any key has level larger than k is at most n/2^k; with k = (c + 1) log n this probability is at most

\[ \frac{n}{2^{(c+1) \log n}} = \frac{1}{n^c}. \]

This calculation also shows that the expected number of levels is O(log n):

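\[
\begin{aligned}
E[\text{\# levels}] &= \sum_{i \ge 0} i \cdot \mathrm{Prob}\{\text{\# levels} = i\} \\
 &= \sum_{i < (c+1) \log n} i \cdot \mathrm{Prob}\{\text{\# levels} = i\} + \sum_{i \ge (c+1) \log n} i \cdot \mathrm{Prob}\{\text{\# levels} = i\} \\
 &\le 2(c + 1) \log n + \sum_{i \ge (c+1) \log n} \mathrm{Prob}\{\text{\# levels} \ge i\} \\
 &\le 2(c + 1) \log n + n \cdot \sum_{i \ge (c+1) \log n} \frac{1}{2^i} \\
 &= O(\log n).
\end{aligned}
\]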
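Both claims can be checked numerically under the model used above, in which the levels of the n keys are independent and a key has level larger than k with probability 2^{-k}. The sketch below computes the exact probability that some key exceeds level k and the expected number of levels via E[X] = Σ_{k≥0} Prob{X > k}; the function names and the truncation after 200 terms are choices made here for illustration.

```python
import math

def prob_some_level_exceeds(n, k):
    """Exact Prob{# levels > k} for n independent keys, each with Prob{level > k} = 2^-k."""
    return 1.0 - (1.0 - 2.0 ** -k) ** n

def expected_levels(n, terms=200):
    """E[# levels] = sum over k >= 0 of Prob{# levels > k}, truncated after `terms` terms."""
    return sum(prob_some_level_exceeds(n, k) for k in range(terms))
```

With n = 1024 and c = 2, the choice k = (c + 1) log n = 30 gives a probability below n/2^k = 1/n^2, and the expected number of levels comes out a little above log n = 10.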

So, what is the search cost? The search starts at the top level and descends until the key (or its predecessor or successor) is found. Intuitively, the time spent in following pointers in each level is O(1). We will just offer a simplified (but not completely accurate) analysis: for this purpose it is convenient to analyze the search process starting from the last key and following the link pointers in the reverse direction (so we have backward and upward pointers); each key “tosses a coin” to decide if it remains in the next level, so at every step this reversed search follows a backward pointer or an upward pointer with equal probability 1/2; the expected number of backward pointers followed in each level is then 2; therefore the total expected search length is twice the number of levels in the skip list, which is O(log n).

Why is this not accurate? In the last lines, we are basically applying linearity of expectation to the sum of search lengths in the levels, but the number of levels itself is a random variable, and it is not completely clear that this goes through.

I.5 Selection

The following randomized algorithm selects the k-th smallest element in an unsorted array A[1 . . . n], 1 ≤ k ≤ n. Let us assume for simplicity of analysis that all the elements are different so that the k-th smallest is well-defined: the element with exactly k − 1 elements smaller than it (it does not really make much of a difference if there are duplicates though). Recall that Partition splits the array into two parts corresponding to the elements smaller and larger than the pivot, placed to the left and right of the pivot, whose position is returned by the function. Partition runs in linear time. Random(1, n) returns an integer in {1, 2, . . . , n} uniformly at random (each integer equally likely). The correctness should be clear.

RandomSelect (A[1 . . . n], k):
    p ← Random(1, n)
    r ← Partition(A[1 . . . n], p)
    if k < r then
        return RandomSelect(A[1 . . . r − 1], k)
    else if k > r then
        return RandomSelect(A[r + 1 . . . n], k − r)
    else
        return A[r]
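A direct Python transcription of RandomSelect might look as follows. It is a sketch under a few assumptions: 0-based indexing with inclusive bounds, the in-place Partition from Section I.1, and `random.randint` in the role of Random(1, n); the function names are hypothetical.

```python
import random

def partition(a, lo, hi, p):
    """Partition a[lo..hi] (inclusive) around the pivot a[p]; return the pivot's final index."""
    a[hi], a[p] = a[p], a[hi]
    i, j = lo, hi
    while i < j:
        while i < j and a[i] <= a[hi]:
            i += 1
        while i < j and a[j] >= a[hi]:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]
    return i

def random_select(a, k, lo=0, hi=None):
    """Return the k-th smallest element of a[lo..hi] (k = 1 is the minimum); rearranges a."""
    if hi is None:
        hi = len(a) - 1
    p = random.randint(lo, hi)            # uniformly random pivot index
    r = partition(a, lo, hi, p)
    rank = r - lo + 1                     # rank of the pivot within a[lo..hi]
    if k < rank:
        return random_select(a, k, lo, r - 1)
    if k > rank:
        return random_select(a, k - rank, r + 1, hi)
    return a[r]
```

For example, `random_select([7, 2, 9, 4], 2)` returns 4. Since the list is rearranged in place, pass a copy if the original order matters.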

If the pivot is selected arbitrarily, not uniformly at random, the worst case running time would be Θ(n^2 ): say k ≥ n/2 and the pivot is always the first element, then the size is reduced only by one in the recursive call, at least n/2 times.

I.5.1 Analysis

Analysis by Recurrence Relation

Let T (n) denote the expected running time. Since each A[r] has probability 1/n of being chosen as pivot, then

\[ T(n) = \frac{1}{n} \sum_{r=1}^{k-1} T(n - r) + \frac{1}{n} \sum_{r=k+1}^{n} T(r - 1) + C \cdot n \]

since if A[r] is the pivot with r < k then we recurse with A[r + 1],... , A[n], and if A[r] is the pivot with r > k then we recurse with A[1],... , A[r − 1]. The term Cn accounts for the time of Partition. Using the recurrence, we verify that T (n) ≤ D · n by induction:

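\[
\begin{aligned}
T(n) &\le \frac{1}{n} \sum_{r=1}^{k-1} D \cdot (n - r) + \frac{1}{n} \sum_{r=k+1}^{n} D \cdot (r - 1) + C \cdot n \\
 &= \frac{D}{n} \left( (k - 1) n - \frac{k(k - 1)}{2} + \frac{n(n - 1)}{2} - \frac{k(k - 1)}{2} \right) + C \cdot n \\
 &= \frac{D}{n} \left( (k - 1) n + \frac{n(n - 1)}{2} - k(k - 1) \right) + C \cdot n \\
 &\le \frac{D}{n} \left( (kn - k^2) + \frac{n^2}{2} \right) + C \cdot n \\
 &\le \frac{D}{n} \left( \frac{n^2}{4} + \frac{n^2}{2} \right) + C \cdot n \\
 &= \frac{3D}{4} \cdot n + C \cdot n \le D \cdot n
\end{aligned}
\]

with D so that 3D/4 + C ≤ D, that is D ≥ 4C.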
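The bound T(n) ≤ 4Cn can be checked numerically for small n. The sketch below tracks the dependence on k explicitly, counts only comparisons, and assumes that Partition performs exactly n − 1 of them (so C = 1 suffices); the function name and the table layout are hypothetical.

```python
def expected_select_comparisons(n_max):
    """E[n][k]: expected comparisons of RandomSelect for the k-th smallest of n distinct keys."""
    E = [[0.0] * (n + 1) for n in range(n_max + 1)]   # E[n][0] is unused padding
    for n in range(2, n_max + 1):
        for k in range(1, n + 1):
            total = 0.0
            for r in range(1, k):             # pivot rank r < k: recurse on the n - r larger keys
                total += E[n - r][k - r]
            for r in range(k + 1, n + 1):     # pivot rank r > k: recurse on the r - 1 smaller keys
                total += E[r - 1][k]
            E[n][k] = (n - 1) + total / n     # partition cost plus average recursive cost
    return E
```

In this model `E[n][k]` stays below 4n for every k, in line with D ≥ 4C.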

Analysis by Indicator Variables

We count the expected number of comparisons performed by the algorithm. For this, we calculate the probability that during the complete execution of RandomSelect a

precisely, for simplicity suppose k = ⌊n/2⌋ and let us suppose that n_{i+1} > (1 − β)n_i occurs, where β is a (non-constant) fraction to be determined. This happens if the pivot is selected from the first or last βn_i elements in the i-th subarray. So

\[ \mathrm{Prob}\{n_{i+1} > (1 - \beta) n_i\} = 2\beta. \]

Suppose that this holds for ℓ consecutive steps; then in the i-th step, i ≤ ℓ, n_i ≥ (1 − β)^i n and so the running time until the ℓ-th step is

\[ T(n) \ge \sum_{i=0}^{\ell - 1} (1 - \beta)^i n = \frac{1 - (1 - \beta)^{\ell}}{\beta}\, n. \]

Choosing 1/β ≈ 2ℓ, we get T(n) = Ω(ℓn), since (1 − 1/(2ℓ))^ℓ = Θ(1). The probability of this happening is P(n) ≥ (2β)^ℓ = 1/ℓ^ℓ.

For example, let us choose ℓ = log n/ log log n. Then T(n) = Ω(ℓn) and

\[ P(n) \ge \frac{1}{(\log n / \log\log n)^{\log n / \log\log n}}, \]

which is Θ(1/n^c) for some c. So, it may be possible to prove (though we haven’t quite done it) that the running time is O(n log n/ log log n) with high probability (1 − 1/n^c). On the other hand, if ℓ = o(log n/ log log n), then the running time is Ω(ℓn) with a probability that is ω(1/n^c) (recall that f(n) = ω(g(n)) if f(n)/g(n) → ∞ as n → ∞), for any constant c. For example take ℓ = √(log n). Then T(n) = Ω(n √(log n)) with probability at least

\[ \left( \frac{1}{\sqrt{\log n}} \right)^{\sqrt{\log n}} = \frac{1}{n^{\log\log n / (2 \sqrt{\log n})}} = \omega\!\left( \frac{1}{n^c} \right) \]

for any constant c > 0.

“Fixing” Randomized Selection

Can we modify the randomized algorithm for selection so that linear running time with high probability is achieved? The problem appears to be the simple pivot choice; if we could achieve size reduction by a constant factor with high probability, then we would be able to achieve running time O(n) with high probability. A possible approach is to

(i) select a “large sample” –say of size C log n (C log n times choose one element uniformly at random)– then

(ii) find the median element in the sample by selecting recursively or simply by sorting the sample, and

(iii) use that median element as pivot to split and recurse.
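Steps (i) to (iii) can be sketched as follows. This is an illustration under several assumptions: the sample is drawn with replacement using `random.choice`, the sample median is found by sorting, the split is done with list comprehensions rather than the in-place Partition, a loop replaces the recursion, and inputs of at most 32 elements are simply sorted. The function name, the default C = 8 and the cutoff 32 are hypothetical choices.

```python
import math
import random

def sample_median_select(a, k, C=8):
    """Return the k-th smallest element of a (k = 1 is the minimum) using a sampled pivot."""
    a = list(a)                                   # work on a copy
    while True:
        n = len(a)
        if n <= 32:                               # small input: sorting is simplest (arbitrary cutoff)
            return sorted(a)[k - 1]
        s = int(C * math.log2(n))                 # (i) sample size C log n
        sample = sorted(random.choice(a) for _ in range(s))
        pivot = sample[len(sample) // 2]          # (ii) median of the sample, found by sorting
        smaller = [x for x in a if x < pivot]     # (iii) split around the pivot
        larger = [x for x in a if x > pivot]
        if k <= len(smaller):
            a = smaller
        elif k > n - len(larger):
            k -= n - len(larger)                  # skip the smaller keys and the pivot itself
            a = larger
        else:
            return pivot
```

For example, `sample_median_select(range(1000, 0, -1), 10)` returns 10.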

Let’s compute the probability that in the i-th recursion step, with the splitter so obtained, n_{i+1} > (1 − β)n_i holds. This would hold if among the first βn_i elements more than C log n_i/2 are taken into the sample; this happens with probability at most

\[
\begin{aligned}
\sum_{j = C \log n_i / 2 + 1}^{C \log n_i} \binom{C \log n_i}{j} \cdot \beta^{j}
 &\le (C \log n_i / 2) \cdot \binom{C \log n_i}{C \log n_i / 2} \cdot \beta^{C \log n_i / 2} \\
 &\le (C \log n_i / 2) \cdot (2e)^{C \log n_i / 2} \cdot \beta^{C \log n_i / 2} \\
 &\le (C \log n_i / 2) \cdot (2e\beta)^{C \log n_i / 2} \\
 &\le (C \log n_i / 2) \cdot n_i^{C \log(2e\beta) / 2}
\end{aligned}
\]

(we have used \binom{n}{m} ≤ (en/m)^m). There is a similar bound if among the last βn_i elements more than C log n_i/2 are taken into the sample. If we set β = 1/(4e) and C = 8, then we get probability at most 8 log n_i · n_i^{−4} ≤ n_i^{−3} for n_i larger than some constant N_0. Now, if we add up this probability of failure over all i,

\[ \sum_{i} \frac{1}{n_i^3}, \]

we don’t quite get something O(1/n^c) because the n_i are decreasing. However, we can consider the n_i’s only until a value n_k ≈ √n, because after that the worst case running time of the algorithm is O(n_k^2) (if the splitter is the largest or smallest). But then

\[ \sum_{i=0}^{k} \frac{1}{n_i^3} = O\!\left( \frac{1}{n^{3/2}} \right). \]

The exponent here could be made larger by choosing C larger. Note that sorting the small sample does not really affect the linear time of the overall algorithm (expected and with high probability).