Average case Analysis of Quicksort - Design and Analysis - Study Notes, Study notes of Digital Systems Design

Average-case analysis of quicksort, assumptions about the distribution of inputs, random choices of pivots, the QuickSort procedure, the recurrence, and the induction hypothesis are the key points in these study notes.

Typology: Study notes

2011/2012

Uploaded on 11/03/2012

ankitay
Lecture No. 14
4.3.5 Average-case Analysis of Quicksort
We will now show that in the average case, quicksort runs in Θ(n log n) time. Recall that when we talked about the average case at the beginning of the semester, we said that it depends on some assumption about the distribution of inputs. However, in the case of quicksort, the analysis does not depend on the distribution of the input at all. It depends only on the random choices of pivots that the algorithm makes. This is good, because it means that the analysis of the algorithm's performance is the same for all inputs. In this case the average is computed over all possible random choices that the algorithm might make for the pivot index in the second step of the QuickSort procedure above.
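As a rough illustration of this point, the Python sketch below runs a randomized quicksort many times on a fixed input and averages the comparison count. The function names, the list-based partitioning, and the cost model (n - 1 comparisons per partitioning step) are assumptions made for the example, not the QuickSort procedure from the lecture.

```python
import random

def quicksort(items, rng, counter):
    """Return a sorted copy of items (assumed distinct).

    counter[0] accumulates the comparison count under the usual cost
    model: one comparison of the pivot with every other element.
    """
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]  # uniformly random pivot
    counter[0] += len(items) - 1
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller, rng, counter) + [pivot] + quicksort(larger, rng, counter)

def average_comparisons(items, trials=200, seed=0):
    """Average the comparison count over many independent runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counter = [0]
        quicksort(items, rng, counter)
        total += counter[0]
    return total / trials

if __name__ == "__main__":
    increasing = list(range(200))
    decreasing = increasing[::-1]
    print(average_comparisons(increasing), average_comparisons(decreasing))
```

Because the pivot is chosen at random, the two averages should come out close to each other even though one input is already sorted and the other is reversed.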

To analyze the average running time, we let T(n) denote the average running time of QuickSort on a list of size n. It will simplify the analysis to assume that all of the elements are distinct. The algorithm has n possible choices for the pivot element, and each choice occurs with equal probability 1/n. So we can modify the above recurrence to compute an average rather than a max. The time T(n) is the weighted sum of the times taken for the various choices of the pivot rank q, i.e.,

T(n) = (1/n) ∑_{q=1}^{n} ( T(q - 1) + T(n - q) + n )

     = (1/n) [ ( T(0) + T(n - 1) + n ) + ( T(1) + T(n - 2) + n ) + ( T(2) + T(n - 3) + n ) + · · · + ( T(n - 1) + T(0) + n ) ]
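The recurrence can also be evaluated numerically, which gives concrete values to compare against the bound derived next. The sketch below assumes base values T(0) = T(1) = 1 and uses the fact that each T(k) appears twice in the sum, so T(n) = (2/n)(T(0) + ... + T(n - 1)) + n. The function name `average_time` is made up for this example.

```python
def average_time(n_max):
    """Return [T(0), ..., T(n_max)] for the averaged recurrence.

    Assumes T(0) = T(1) = 1. Each T(k) with k < n appears twice in the
    sum, so T(n) = (2 / n) * (T(0) + ... + T(n - 1)) + n.
    """
    T = [1.0, 1.0]
    prefix = 2.0  # running value of T(0) + ... + T(n - 1)
    for n in range(2, n_max + 1):
        T.append(2.0 * prefix / n + n)
        prefix += T[n]
    return T[: n_max + 1]

if __name__ == "__main__":
    for n, value in enumerate(average_time(8)):
        print(n, value)
```

Keeping a running prefix sum makes the whole table cost O(n) arithmetic operations instead of O(n^2).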

We have not seen such a recurrence before. To solve it, expansion is possible but it is rather tricky. We will attempt a constructive induction to solve it. We know that we want a Θ(n log n) bound. Let us assume that T(n) ≤ c n ln n for n ≥ 2, where c is a constant and ln is the natural logarithm (the base of the logarithm affects only the constant).

For the base case n = 2, with base values T(0) = T(1) = 1, we have

T(2) = (1/2) [ ( T(0) + T(1) + 2 ) + ( T(1) + T(0) + 2 ) ] = 8/2 = 4.

We want this to be at most c · 2 ln 2, i.e.,

T(2) = 4 ≤ 2c ln 2, and therefore c ≥ 4/(2 ln 2) ≈ 2.88.

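For the induction step, we assume that n ≥ 3. The induction hypothesis is that for any n′ with 2 ≤ n′ < n, we have T(n′) ≤ c n′ ln n′. We want to prove that the same bound holds for T(n). By expanding T(n) and moving the additive term n outside the sum we have

T(n) = (1/n) ∑_{q=1}^{n} ( T(q - 1) + T(n - q) + n ) = (1/n) ∑_{q=1}^{n} ( T(q - 1) + T(n - q) ) + n.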

Observe that the two sums add up the same values. One counts up and the other counts down. Thus we can replace them with 2 ∑_{q=0}^{n-1} T(q). We will extract T(0) and T(1) and treat them specially. These two do not follow the formula.

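Applying the induction hypothesis to every T(q) with 2 ≤ q < n, and using T(0) = T(1) = 1, we have

T(n) = (2/n) ( T(0) + T(1) + ∑_{q=2}^{n-1} T(q) ) + n ≤ (2c/n) ∑_{q=2}^{n-1} q ln q + n + 4/n.

Since x ln x is increasing, the remaining sum is bounded by an integral: ∑_{q=2}^{n-1} q ln q ≤ ∫_2^n x ln x dx ≤ (n^2 / 2) ln n - n^2 / 4.

Plug this back into the expression for T(n) to get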

T(n) ≤ (2c/n) ( (n^2 / 2) ln n - n^2 / 4 ) + n + 4/n

     = c n ln n - c n / 2 + n + 4/n

     = c n ln n + n (1 - c/2) + 4/n
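The expression in parentheses comes from bounding the sum of q ln q for q = 2, ..., n - 1 by (n^2 / 2) ln n - n^2 / 4. A quick numerical check of that inequality can be sketched as follows. The helper names are hypothetical, and the check covers only the tested range of n; it is not a proof.

```python
import math

def sum_q_ln_q(n):
    """Sum of q * ln(q) for q = 2, ..., n - 1."""
    return sum(q * math.log(q) for q in range(2, n))

def integral_bound(n):
    """The closed-form upper bound (n^2 / 2) * ln(n) - n^2 / 4."""
    return (n * n / 2.0) * math.log(n) - n * n / 4.0

if __name__ == "__main__":
    for n in (3, 10, 100, 1000):
        print(n, sum_q_ln_q(n), integral_bound(n))
```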

To finish the proof, we want all of this to be at most cn ln n. For this to happen, we will need to select c such that

n(1 – c/2) + 4/n ≤ 0

If we select c = 3 and use the fact that n ≥ 3, we get

n(1 - c/2) + 4/n = 4/n - n/2 ≤ 4/3 - 3/2 = -1/6 ≤ 0.

From the base case we had c ≥ 2.88. Choosing c = 3 satisfies all the constraints. Thus T(n) ≤ 3n ln n for all n ≥ 2, which is the O(n log n) upper bound in the claimed Θ(n log n) average-case running time.
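The choice of constant can be sanity-checked by evaluating the recurrence directly and comparing it with c n ln n. The sketch below again assumes T(0) = T(1) = 1. The function name `first_violation` is invented for this example, and a finite scan of this kind only supports the induction argument; it does not replace it.

```python
import math

def first_violation(c, n_max):
    """Return the smallest n in [2, n_max] with T(n) > c * n * ln(n), or None.

    T follows the averaged recurrence with T(0) = T(1) = 1 assumed,
    evaluated as T(n) = (2 / n) * (T(0) + ... + T(n - 1)) + n.
    """
    prefix = 2.0  # T(0) + T(1)
    for n in range(2, n_max + 1):
        t_n = 2.0 * prefix / n + n
        if t_n > c * n * math.log(n):
            return n
        prefix += t_n
    return None

if __name__ == "__main__":
    print(first_violation(3.0, 2000))  # no violation expected for c = 3
    print(first_violation(2.0, 2000))  # c = 2 already fails the base case n = 2
```

With c = 2 the base case fails because T(2) = 4 exceeds 2 · 2 ln 2 ≈ 2.77, which is why the base case forces a constant of roughly 2.88 or more.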