


Key points in these study notes: average-case analysis of quicksort, the assumption about the distribution of inputs, random choices of pivots, the QuickSort procedure, the recurrence, and the induction hypothesis.
Typology: Study notes



We will now show that in the average case, quicksort runs in Θ(n log n) time. Recall that when we discussed average-case analysis at the beginning of the semester, we said that it depends on some assumption about the distribution of inputs. In the case of quicksort, however, the analysis does not depend on the input distribution at all; it depends only on the random choices of pivots that the algorithm makes. This is good, because it means that the analysis of the algorithm’s performance is the same for all inputs. Here the average is computed over all possible random choices that the algorithm might make for the pivot index in the second step of the QuickSort procedure above.
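The claim that the average depends only on the pivot choices, not on the input order, can be checked empirically. The following is a minimal sketch, not the QuickSort procedure from these notes: the function names, the list-based partitioning, and the choice to count one comparison per non-pivot element are all assumptions made for illustration.

```python
import random


def quicksort_comparisons(items, rng):
    """Sort a copy of items using uniformly random pivots.

    Returns (sorted_list, number_of_comparisons_against_pivots).
    Hypothetical helper for illustration only.
    """
    if len(items) <= 1:
        return list(items), 0
    pivot_index = rng.randrange(len(items))  # each index has probability 1/n
    pivot = items[pivot_index]
    rest = items[:pivot_index] + items[pivot_index + 1:]
    smaller, larger = [], []
    comparisons = 0
    for x in rest:
        comparisons += 1  # one comparison of x against the pivot
        if x < pivot:
            smaller.append(x)
        else:
            larger.append(x)
    left, c_left = quicksort_comparisons(smaller, rng)
    right, c_right = quicksort_comparisons(larger, rng)
    return left + [pivot] + right, comparisons + c_left + c_right


def average_comparisons(items, trials, seed):
    """Average the comparison count over many independent runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, count = quicksort_comparisons(items, rng)
        total += count
    return total / trials


if __name__ == "__main__":
    data = list(range(200))
    print("sorted input:  ", average_comparisons(data, 200, seed=7))
    print("reversed input:", average_comparisons(data[::-1], 200, seed=11))
```

Running it on an already sorted list and on a reversed list should give averages that are close to each other, since the only randomness is in the pivot choices.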
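To analyze the average running time, we let T(n) denote the average running time of QuickSort on a list of size n. It will simplify the analysis to assume that all of the elements are distinct. The algorithm has n random choices for the pivot element, and each choice occurs with equal probability 1/n. So we can modify the above recurrence to compute an average rather than a max, giving:
$$T(n) = \begin{cases} 1 & \text{if } n \le 1, \\ \dfrac{1}{n} \displaystyle\sum_{q=1}^{n} \bigl( T(q-1) + T(n-q) \bigr) + n & \text{otherwise.} \end{cases}$$
The time T(n) is the weighted sum of the times taken for the various choices of q, i.e.,
$$T(n) = \left[ \frac{1}{n}\bigl(T(0) + T(n-1)\bigr) + \frac{1}{n}\bigl(T(1) + T(n-2)\bigr) + \cdots + \frac{1}{n}\bigl(T(n-1) + T(0)\bigr) \right] + n.$$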
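To get a feel for the averaged recurrence, it can be evaluated exactly for small n. The sketch below assumes the form T(0) = T(1) = 1 and T(n) = (1/n) Σ_{q=1..n} (T(q−1) + T(n−q)) + n, which is consistent with the base-case value T(2) = 4 used later in these notes; the function name `average_times` is hypothetical.

```python
from fractions import Fraction


def average_times(n_max):
    """Return [T(0), ..., T(n_max)] as exact fractions.

    Assumed recurrence (see the lead-in):
        T(0) = T(1) = 1
        T(n) = (1/n) * sum_{q=1..n} (T(q-1) + T(n-q)) + n   for n >= 2
    """
    T = [Fraction(1), Fraction(1)]
    for n in range(2, n_max + 1):
        # Each pivot rank q is equally likely, so every term has weight 1/n.
        total = sum(T[q - 1] + T[n - q] for q in range(1, n + 1))
        T.append(total / n + n)
    return T[: n_max + 1]


if __name__ == "__main__":
    for n, value in enumerate(average_times(8)):
        print(n, value)
```

Using `Fraction` keeps the values exact, which makes it easy to compare them against hand calculations for n = 2, 3, 4.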
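We have not seen such a recurrence before. Solving it by expansion is possible but rather tricky, so we will attempt a constructive induction instead. We know that we want a Θ(n log n) bound, so let us assume that T(n) ≤ c n ln n for all n ≥ 2, where c is a constant to be determined. (We use the natural logarithm here; a different base changes only the constant.)
For the base case n = 2, using T(0) = T(1) = 1, we have
$$T(2) = \frac{1}{2} \sum_{q=1}^{2} \bigl( T(q-1) + T(2-q) \bigr) + 2 = \frac{1}{2} \Bigl[ \bigl(T(0) + T(1)\bigr) + \bigl(T(1) + T(0)\bigr) \Bigr] + 2 = \frac{4}{2} + 2 = 4.$$
We want this to be at most c · 2 ln 2, i.e.,
$$4 \le 2c \ln 2, \quad \text{therefore} \quad c \ge \frac{4}{2 \ln 2} \approx 2.88.$$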
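For the induction step, we assume that n ≥ 3. The induction hypothesis is that for any n' with 2 ≤ n' < n, we have T(n') ≤ c n' ln n'. We want to prove that the bound also holds for T(n). Expanding T(n), keeping the factor of 1/n outside the sum, and re-indexing, we have
$$T(n) = \frac{1}{n} \sum_{q=1}^{n} \bigl( T(q-1) + T(n-q) \bigr) + n = \frac{1}{n} \sum_{q=0}^{n-1} \bigl( T(q) + T(n-q-1) \bigr) + n.$$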
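Observe that the two sums add up the same values: one counts up and the other counts down. Thus we can replace them with $2 \sum_{q=0}^{n-1} T(q)$. We will extract T(0) and T(1) and treat them specially, since these two do not follow the formula:
$$T(n) = \frac{2}{n} \sum_{q=0}^{n-1} T(q) + n = \frac{2}{n} \Bigl( T(0) + T(1) + \sum_{q=2}^{n-1} T(q) \Bigr) + n.$$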
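Applying the induction hypothesis to each T(q) with 2 ≤ q < n, and using T(0) = T(1) = 1, we have
$$T(n) \le \frac{2}{n} \Bigl( 1 + 1 + \sum_{q=2}^{n-1} c\, q \ln q \Bigr) + n = \frac{2c}{n} \sum_{q=2}^{n-1} q \ln q + n + \frac{4}{n}.$$
The remaining sum can be bounded by an integral, because x ln x is increasing:
$$\sum_{q=2}^{n-1} q \ln q \le \int_{2}^{n} x \ln x \, dx \le \frac{n^2}{2} \ln n - \frac{n^2}{4}.$$
Plug this back into the expression for T(n) to get
$$T(n) \le \frac{2c}{n} \Bigl( \frac{n^2}{2} \ln n - \frac{n^2}{4} \Bigr) + n + \frac{4}{n} = cn \ln n - \frac{cn}{2} + n + \frac{4}{n} = cn \ln n + n \Bigl( 1 - \frac{c}{2} \Bigr) + \frac{4}{n}.$$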
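The substitution above relies on the bound Σ_{q=2..n−1} q ln q ≤ (n²/2) ln n − n²/4. As a sanity check (not a proof), the following sketch compares both sides numerically for a range of n; the function names are hypothetical and floating-point arithmetic is assumed to be accurate enough for this range.

```python
import math


def q_ln_q_sum(n):
    """Left-hand side: sum of q * ln(q) for q = 2, ..., n - 1."""
    return sum(q * math.log(q) for q in range(2, n))


def sum_bound(n):
    """Right-hand side: (n^2 / 2) * ln(n) - n^2 / 4."""
    return (n * n / 2) * math.log(n) - (n * n) / 4


if __name__ == "__main__":
    for n in (3, 10, 100, 1000):
        print(n, q_ln_q_sum(n), sum_bound(n))
```

For every n tried, the sum should stay below the bound, in line with the integral argument.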
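To finish the proof, we want all of this to be at most cn ln n. For this to happen, we need to select c such that
$$n \Bigl( 1 - \frac{c}{2} \Bigr) + \frac{4}{n} \le 0.$$
If we select c = 3 and use the fact that n ≥ 3, we get
$$n \Bigl( 1 - \frac{c}{2} \Bigr) + \frac{4}{n} = \frac{4}{n} - \frac{n}{2} \le \frac{4}{3} - \frac{3}{2} = -\frac{1}{6} \le 0.$$
From the base case we had c ≥ 2.88. Choosing c = 3 satisfies all the constraints. Thus T(n) ≤ 3n ln n for all n ≥ 2, which gives the O(n log n) upper bound in the claim T(n) ∈ Θ(n log n).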
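The final bound can also be checked numerically. This sketch assumes the recurrence in its simplified form T(0) = T(1) = 1, T(n) = (2/n) Σ_{q=0..n−1} T(q) + n, and tests whether T(n) ≤ c n ln n over a finite range; the names `average_time_table` and `bound_holds` are hypothetical, and a finite check illustrates the induction argument rather than replacing it.

```python
import math


def average_time_table(n_max):
    """Return [T(0), ..., T(max(n_max, 1))] as floats for the assumed recurrence."""
    T = [1.0, 1.0]
    prefix = 2.0  # running value of T(0) + ... + T(n - 1)
    for n in range(2, n_max + 1):
        t_n = 2.0 * prefix / n + n
        T.append(t_n)
        prefix += t_n
    return T


def bound_holds(n_max, c=3.0):
    """Check T(n) <= c * n * ln(n) for every n in 2..n_max."""
    T = average_time_table(n_max)
    return all(T[n] <= c * n * math.log(n) for n in range(2, n_max + 1))


if __name__ == "__main__":
    print("smallest c allowed by the base case:", 4 / (2 * math.log(2)))
    print("c = 3 works up to n = 2000:", bound_holds(2000))
    print("c = 2.5 works up to n = 10:", bound_holds(10, c=2.5))
```

A constant below the base-case threshold, such as c = 2.5, already fails at n = 2, which is why the base case constrains c.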