Analysis of Algorithms: Finding Maximum and Second-Largest Elements, Lecture notes of Algorithms and Programming

Lecture notes on algorithms, specifically lectures 1-10, given by Avrim Blum and Manuel Blum at Carnegie Mellon University. The notes cover topics such as asymptotic analysis, recurrences, probabilistic analysis, and randomized quicksort. The document also includes an introduction to algorithms, the study of algorithms, and the importance of specifications and guarantees. The notes provide examples of algorithms such as Karatsuba multiplication and Strassen's matrix multiplication algorithm.

Typology: Lecture notes

2020/2021

Uploaded on 05/11/2023

15-451 Algorithms
Lectures 1-10
Author: Avrim Blum
Instructors: Avrim Blum
Manuel Blum
Department of Computer Science
Carnegie Mellon University
August 23, 2011


Contents

  • 1 Introduction to Algorithms
    • 1.1 Overview
    • 1.2 Introduction
    • 1.3 On guarantees and specifications
    • 1.4 An example: Karatsuba Multiplication
    • 1.5 Matrix multiplication
  • 2 Asymptotic Analysis and Recurrences
    • 2.1 Overview
    • 2.2 Asymptotic analysis
    • 2.3 Recurrences
      • 2.3.1 Solving by unrolling
      • 2.3.2 Solving by guess and inductive proof
      • 2.3.3 Recursion trees, stacking bricks, and a Master Formula
  • 3 Probabilistic Analysis and Randomized Quicksort
    • 3.1 Overview
    • 3.2 The notion of randomized algorithms
    • 3.3 The Basics of Probabilistic Analysis
      • 3.3.1 Linearity of Expectation
      • 3.3.2 Example 1: Card shuffling
      • 3.3.3 Example 2: Inversions in a random permutation
    • 3.4 Analysis of Randomized Quicksort
      • 3.4.1 Method 1
      • 3.4.2 Method 2
    • 3.5 Further Discussion
      • 3.5.1 More linearity of expectation: a random walk stock market
      • 3.5.2 Yet another way to analyze quicksort: run it backwards
  • 4 Selection (deterministic & randomized): finding the median in linear time
    • 4.1 Overview
    • 4.2 The problem and a randomized solution
    • 4.3 A deterministic linear-time algorithm
  • 5 Comparison-based Lower Bounds for Sorting
    • 5.1 Overview
    • 5.2 Sorting lower bounds
    • 5.3 Average-case lower bounds
    • 5.4 Lower bounds for randomized algorithms
  • 6 Concrete models and tight upper/lower bounds
    • 6.1 Overview
    • 6.2 Terminology and setup
    • 6.3 Sorting in the exchange model
    • 6.4 The comparison model
      • 6.4.1 Almost-tight upper-bounds for comparison-based sorting
      • 6.4.2 Finding the maximum of n elements
      • 6.4.3 Finding the second-largest of n elements
    • 6.5 Query models, and the evasiveness of connectivity
  • 7 Amortized Analysis
    • 7.1 Overview
    • 7.2 Introduction
    • 7.3 Example #1: implementing a stack as an array
    • 7.4 Piggy banks and potential functions
    • 7.5 Example #2: a binary counter
    • 7.6 Example #3: What if it costs us 2^k to flip the kth bit?
    • 7.7 Example #4: A simple amortized dictionary data structure
  • 8 Balanced search trees
    • 8.1 Overview
    • 8.2 Introduction
    • 8.3 Simple binary search trees
    • 8.4 B-trees and 2-3-4 trees
    • 8.5 Treaps
  • 9 Digit-based sorting and data structures
    • 9.1 Overview
    • 9.2 Introduction
    • 9.3 Radix Sort
      • 9.3.1 Most-significant-first (MSF) radix sort
      • 9.3.2 Least-significant-first (LSF) radix sort
    • 9.4 Tries
  • 10 Universal and Perfect Hashing
    • 10.1 Overview
    • 10.2 Introduction
    • 10.3 Hashing basics
    • 10.4 Universal Hashing
      • 10.4.1 Constructing a universal hash family: the matrix method
    • 10.5 Perfect Hashing
      • 10.5.1 Method 1: an O(N^2)-space solution
      • 10.5.2 Method 2: an O(N)-space solution
    • 10.6 Further discussion
      • 10.6.1 Another method for universal hashing
      • 10.6.2 Other uses of hashing

Lecture 1

Introduction to Algorithms

1.1 Overview

The purpose of this lecture is to give a brief overview of the topic of Algorithms and the kind of thinking it involves: why we focus on the subjects that we do, and why we emphasize proving guarantees. We also go through an example of a problem that is easy to relate to (multiplying two numbers) in which the straightforward approach is surprisingly not the fastest one. This example leads naturally into the study of recurrences, which is the topic of the next lecture, and provides a forward pointer to topics such as the FFT later on in the course.

Material in this lecture:

  • Administrivia (see handouts)
  • What is the study of Algorithms all about?
  • Why do we care about specifications and proving guarantees?
  • The Karatsuba multiplication algorithm.
  • Strassen’s matrix multiplication algorithm.

1.2 Introduction

This course is about the design and analysis of algorithms — how to design correct, efficient algorithms, and how to think clearly about analyzing correctness and running time.

What is an algorithm? At its most basic, an algorithm is a method for solving a computational problem. Along with an algorithm comes a specification that says what the algorithm’s guarantees are. For example, we might be able to say that our algorithm indeed correctly solves the problem in question and runs in time at most f (n) on any input of size n. This course is about the whole package: the design of efficient algorithms, and proving that they meet desired specifications. For each of these parts, we will examine important techniques that have been developed, and with practice we will build up our ability to think clearly about the key issues that arise.


It is often helpful when thinking about algorithms to imagine a game where one player is the algorithm designer, trying to come up with a good algorithm for the problem, and its opponent (the “adversary”) is trying to come up with an input that will cause the algorithm to run slowly. An algorithm with good worst-case guarantees is one that performs well no matter what input the adversary chooses. We will return to this view in a more formal way when we discuss randomized algorithms and lower bounds.

1.4 An example: Karatsuba Multiplication

One thing that makes algorithm design “Computer Science” is that solving a problem in the most obvious way from its definitions is often not the best way to get a solution. A simple example of this is multiplication.

Say we want to multiply two n-bit numbers: for example, 41 × 42 (or, in binary, 101001 × 101010). According to the definition of what it means to multiply, what we are looking for is the result of adding 41 to itself 42 times (or vice versa). You could imagine actually computing the answer that way (i.e., performing 41 additions), which would be correct but not particularly efficient. If we used this approach to multiply two n-bit numbers, we would be making Θ(2^n) additions. This is exponential in n even without counting the number of steps needed to perform each addition. And, in general, exponential is bad.^1 A better way to multiply is to do what we learned in grade school:

           101001 = 41
    x      101010 = 42
      -----------
          1010010
        101001
    + 101001
      -----------
      11010111010 = 1722

More formally, we scan the second number right to left, and every time we see a 1, we add a copy of the first number, shifted by the appropriate number of bits, to our total. Each addition takes O(n) time, and we perform at most n additions, which means the total running time here is O(n^2 ). So, this is a simple example where even though the problem is defined “algorithmically”, using the definition is not the best way of solving the problem.
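The right-to-left scan described above can be sketched in a few lines of Python. This is only an illustration: the function name `shift_and_add_multiply` is an assumption, and Python's built-in arbitrary-precision integers hide the O(n) cost of each shifted addition.

```python
def shift_and_add_multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers with the grade-school binary method."""
    total = 0
    shift = 0
    while y > 0:
        if y & 1:                # the current (rightmost) bit of y is 1
            total += x << shift  # add a copy of x shifted into position
        y >>= 1                  # move on to the next bit of y
        shift += 1
    return total
```

For the example in the text, `shift_and_add_multiply(41, 42)` adds the three shifted copies of 41 that correspond to the three 1 bits of 42.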

Is the above method the fastest way to multiply two numbers? It turns out it is not. Here is a faster method called Karatsuba Multiplication, discovered by Anatoli Karatsuba, in Russia, in 1962. In this approach, we take the two numbers X and Y and split them each into their most-significant half and their least-significant half:

$$X = 2^{n/2} A + B, \qquad Y = 2^{n/2} C + D$$

(^1) This is reminiscent of an exponential-time sorting algorithm I once saw in Prolog. The code just contains the definition of what it means to sort the input — namely, to produce a permutation of the input in which all elements are in ascending order. When handed directly to the interpreter, it results in an algorithm that examines all n! permutations of the given input list until it finds one that is in the right order.


We can now write the product of X and Y as

$$XY = 2^{n} AC + 2^{n/2} BC + 2^{n/2} AD + BD. \tag{1.1}$$

This does not yet seem so useful: if we use (1.1) as a recursive multiplication algorithm, we need to perform four n/2-bit multiplications, three shifts, and three O(n)-bit additions. If we use T (n) to denote the running time to multiply two n-bit numbers by this method, this gives us a recurrence of

$$T(n) = 4T(n/2) + cn, \tag{1.2}$$

for some constant c. (The cn term reflects the time to perform the additions and shifts.) This recurrence solves to O(n^2 ), so we do not seem to have made any progress. (In the next lecture we will go into the details of how to solve recurrences like this.)

However, we can take the formula in (1.1) and rewrite it as follows:

$$(2^{n} - 2^{n/2})AC + 2^{n/2}(A + B)(C + D) + (1 - 2^{n/2})BD. \tag{1.3}$$

It is not hard to see — you just need to multiply it out — that the formula in (1.3) is equivalent to the expression in (1.1). The new formula looks more complicated, but, it results in only three multiplications of size n/2, plus a constant number of shifts and additions. So, the resulting recurrence is

$$T(n) = 3T(n/2) + c'n, \tag{1.4}$$

for some constant c′. This recurrence solves to O(n^{log_2 3}) ≈ O(n^{1.585}).
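A minimal Python sketch of the three-multiplication idea follows. It is an illustration under stated assumptions rather than the notes' own code: the name `karatsuba`, the base-case threshold of 16, and the split point `half` are all choices made here. The middle term is computed as (A + B)(C + D) − AC − BD, which is the same regrouping as (1.3) written in a slightly different order.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers using three half-size products."""
    if x < 16 or y < 16:              # assumed base case: small inputs
        return x * y
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    mask = (1 << half) - 1
    a, b = x >> half, x & mask        # x = 2^half * a + b
    c, d = y >> half, y & mask        # y = 2^half * c + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    cross = karatsuba(a + b, c + d)   # (A + B)(C + D)
    # AD + BC = (A + B)(C + D) - AC - BD, so only three recursive products are needed.
    return (ac << (2 * half)) + ((cross - ac - bd) << half) + bd
```

The shifts and additions outside the recursive calls are the linear-time work that the c′n term in (1.4) accounts for.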

Is this method the fastest possible? Again it turns out that one can do better. In fact, Karp discovered a way to use the Fast Fourier Transform to multiply two n-bit numbers in time O(n log^2 n). Schönhage and Strassen in 1971 improved this to O(n log n log log n), which was until very recently the asymptotically fastest algorithm known.^2 We will discuss the FFT later on in this course.

Actually, the kind of analysis we have been doing really is meaningful only for very large numbers. On a computer, if you are multiplying numbers that fit into the word size, you would do this in hardware that has gates working in parallel. So instead of looking at sequential running time, in this case we would want to examine the size and depth of the circuit used, for instance. This points out that, in fact, there are different kinds of specifications that can be important in different settings.

1.5 Matrix multiplication

It turns out the same basic divide-and-conquer approach of Karatsuba’s algorithm can be used to speed up matrix multiplication as well. To be clear, we will now be considering a computational model where individual elements in the matrices are viewed as “small” and can be added or multiplied in constant time. In particular, to multiply two n-by-n matrices in the usual way (we take the

(^2) Fürer in 2007 improved this by replacing the log log n term with 2^{O(log* n)}, where log* n is a very slowly growing function discussed in Lecture 14. It remains unknown whether eliminating it completely and achieving running time O(n log n) is possible.

Lecture 2

Asymptotic Analysis and Recurrences

2.1 Overview

In this lecture we discuss the notion of asymptotic analysis and introduce O, Ω, Θ, and o notation. We then turn to the topic of recurrences, discussing several methods for solving them. Recurrences will come up in many of the algorithms we study, so it is useful to get a good intuition for them right at the start. In particular, we focus on divide-and-conquer style recurrences, which are the most common ones we will see.

Material in this lecture:

  • Asymptotic notation: O, Ω, Θ, and o.
  • Recurrences and how to solve them.
    • Solving by unrolling.
    • Solving with a guess and inductive proof.
    • Solving using a recursion tree.
    • A master formula.

2.2 Asymptotic analysis

When we consider an algorithm for some problem, in addition to knowing that it produces a correct solution, we will be especially interested in analyzing its running time. There are several aspects of running time that one could focus on. Our focus will be primarily on the question: “how does the running time scale with the size of the input?” This is called asymptotic analysis, and the idea is that we will ignore low-order terms and constant factors, focusing instead on the shape of the running time curve. We will typically use n to denote the size of the input, and T (n) to denote the running time of our algorithm on an input of size n.

We begin by presenting some convenient definitions for performing this kind of analysis.

Definition 2.1 T(n) ∈ O(f(n)) if there exist constants c, n_0 > 0 such that T(n) ≤ cf(n) for all n > n_0.


Informally we can view this as “T(n) is proportional to f(n), or better, as n gets large.” For example, 3n^2 + 17 ∈ O(n^2) and 3n^2 + 17 ∈ O(n^3). This notation is especially useful in discussing upper bounds on algorithms: for instance, we saw last time that Karatsuba multiplication took time O(n^{log_2 3}).

Notice that O(f (n)) is a set of functions. Nonetheless, it is common practice to write T (n) = O(f (n)) to mean that T (n) ∈ O(f (n)): especially in conversation, it is more natural to say “T (n) is O(f (n))” than to say “T (n) is in O(f (n))”. We will typically use this common practice, reverting to the correct set notation when this practice would cause confusion.

Definition 2.2 T(n) ∈ Ω(f(n)) if there exist constants c, n_0 > 0 such that T(n) ≥ cf(n) for all n > n_0.

Informally we can view this as “T(n) is proportional to f(n), or worse, as n gets large.” For example, 3n^2 − 2n ∈ Ω(n^2). This notation is especially useful for lower bounds. In Chapter 5, for instance, we will prove that any comparison-based sorting algorithm must take time Ω(n log n) in the worst case (or even on average).

Definition 2.3 T (n) ∈ Θ(f (n)) if T (n) ∈ O(f (n)) and T (n) ∈ Ω(f (n)).

Informally we can view this as “T (n) is proportional to f (n) as n gets large.”

Definition 2.4 T(n) ∈ o(f(n)) if for all constants c > 0, there exists n_0 > 0 such that T(n) < cf(n) for all n > n_0.

For example, last time we saw that we could indeed multiply two n-bit numbers in time o(n^2 ) by the Karatsuba algorithm. Very informally, O is like ≤, Ω is like ≥, Θ is like =, and o is like <. There is also a similar notation ω that corresponds to >.

In terms of computing whether or not T (n) belongs to one of these sets with respect to f (n), a convenient way is to compute the limit:

$$\lim_{n \to \infty} \frac{T(n)}{f(n)}$$

If the limit exists, then we can make the following statements:

  • If the limit is 0, then T (n) = o(f (n)) and T (n) = O(f (n)).
  • If the limit is a number greater than 0 (e.g., 17) then T (n) = Θ(f (n)) (and T (n) = O(f (n)) and T (n) = Ω(f (n)))
  • If the limit is infinity, then T (n) = ω(f (n)) and T (n) = Ω(f (n)).

For example, suppose T(n) = 2n^3 + 100n^2 log_2 n + 17 and f(n) = n^3. The ratio of these is 2 + (100 log_2 n)/n + 17/n^3. In the limit, this goes to 2. Therefore, T(n) = Θ(f(n)). Of course, it is possible that the limit doesn’t exist: for instance, if T(n) = n(2 + sin n) and f(n) = n then the ratio oscillates between 1 and 3. In this case we would go back to the definitions to say that T(n) = Θ(n).
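To see the two behaviours numerically, here is a small Python sketch that evaluates the ratio T(n)/f(n) for both examples above. The helper names (`ratio`, `T_poly`, `f_cubic`, `T_osc`, `f_linear`) are made up for this illustration, and evaluating the ratio at a few values of n is only a sanity check, not a proof about the limit.

```python
import math

def ratio(T, f, n):
    """Return T(n) / f(n) for a single value of n."""
    return T(n) / f(n)

def T_poly(n):
    # T(n) = 2n^3 + 100 n^2 log_2 n + 17
    return 2 * n**3 + 100 * n**2 * math.log2(n) + 17

def f_cubic(n):
    return n**3

def T_osc(n):
    # T(n) = n (2 + sin n): the ratio to n never settles to a limit
    return n * (2 + math.sin(n))

def f_linear(n):
    return n
```

Calling `ratio(T_poly, f_cubic, n)` for growing n gives values approaching 2, while `ratio(T_osc, f_linear, n)` stays between 1 and 3 without converging, which is why the definitions are needed in the second case.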


(n/2)(cn/2) = cn^2 /4. So, it is Θ(n^2 ). Similarly, a recurrence T (n) = n^5 + T (n − 1) unrolls to:

T (n) = n^5 + (n − 1)^5 + (n − 2)^5 +... + 1^5 , (2.3)

which solves to Θ(n^6) using the same style of reasoning as before. In particular, there are n terms each of which is at most n^5 so the sum is at most n^6, and the top n/2 terms are each at least (n/2)^5 so the sum is at least (n/2)^6. Another convenient way to look at many summations of this form is to see them as approximations to an integral. E.g., in this last case, the sum is at least the integral of f(x) = x^5 evaluated from 0 to n, and at most the integral of f(x) = x^5 evaluated from 1 to n + 1. So, the sum lies in the range [n^6/6, (n + 1)^6/6].
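The integral bounds on the sum in (2.3) are easy to check for concrete values of n. The sketch below uses exact integer arithmetic; the function names are assumptions made for this example.

```python
def sum_fifth_powers(n: int) -> int:
    """Return 1^5 + 2^5 + ... + n^5, the unrolled form of T(n) = n^5 + T(n - 1)."""
    return sum(i**5 for i in range(1, n + 1))

def within_integral_bounds(n: int) -> bool:
    """Check n^6/6 <= sum <= (n+1)^6/6, multiplied through by 6 to avoid fractions."""
    s = sum_fifth_powers(n)
    return n**6 <= 6 * s <= (n + 1)**6
```

Both bounds are within a constant factor of n^6, which matches the Θ(n^6) solution.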

2.3.2 Solving by guess and inductive proof

Another good way to solve recurrences is to make a guess and then prove the guess correct inductively. Or if we get into trouble proving our guess correct (e.g., because it was wrong), often this will give us clues as to a better guess. For example, say we have the recurrence

$$T(n) = 7T(n/7) + n, \tag{2.4}$$
$$T(1) = 0. \tag{2.5}$$

We might first try a solution of T(n) ≤ cn for some c > 0. We would then assume it holds true inductively for n′ < n (the base case is obviously true) and plug in to our recurrence (using n′ = n/7) to get:

T (n) ≤ 7(cn/7) + n = cn + n = (c + 1)n.

Unfortunately, this isn’t what we wanted: our multiplier “c” went up by 1 when n went up by a factor of 7. In other words, our multiplier is acting like log_7(n). So, let’s make a new guess using a multiplier of this form:

$$T(n) \le n \log_7(n). \tag{2.6}$$

If we assume this holds true inductively for n′ < n, then we get:

$$T(n) \le 7\left[(n/7)\log_7(n/7)\right] + n = n\log_7(n/7) + n = n\log_7(n) - n + n = n\log_7(n). \tag{2.7}$$

So, we have verified our guess.

It is important in this type of proof to be careful. For instance, one could be lulled into thinking that our initial guess of cn was correct by reasoning “we assumed T (n/7) was Θ(n/7) and got T (n) = Θ(n)”. The problem is that the constants changed (c turned into c + 1) so they really weren’t constant after all!
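As a numeric sanity check on the guess (2.6), the recurrence (2.4)-(2.5) can be evaluated directly for powers of 7. The sketch below is an illustration only; restricting n to powers of 7 is an assumption made here so that n/7 is always an integer.

```python
def T(n: int) -> int:
    """Evaluate T(n) = 7 T(n/7) + n with T(1) = 0, for n a power of 7."""
    if n == 1:
        return 0
    return 7 * T(n // 7) + n

def guess(k: int) -> int:
    """n log_7(n) for n = 7^k, which is simply k * 7^k."""
    return k * 7**k
```

For n = 7^k the recurrence meets the guess with equality, T(7^k) = k · 7^k, so the multiplier really does grow like log_7(n) rather than staying constant.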


2.3.3 Recursion trees, stacking bricks, and a Master Formula

The final method we examine, which is especially good for divide-and-conquer style recurrences, is the use of a recursion tree. We will use this method to produce a simple “master formula” that can be applied to many recurrences of this form. Consider the following type of recurrence:

$$T(n) = aT(n/b) + cn^k, \qquad T(1) = c, \tag{2.8}$$

for positive constants a, b, c, and k. This recurrence corresponds to the time spent by an algorithm that does cn^k work up front, and then divides the problem into a pieces of size n/b, solving each one recursively. For instance, mergesort, Karatsuba multiplication, and Strassen’s algorithm all fit this mold. A recursion tree is just a tree that represents this process, where each node contains inside it the work done up front and then has one child for each recursive call. The leaves of the tree are the base cases of the recursion. A tree for the recurrence (2.8) is given below.^1

[Figure: recursion tree for recurrence (2.8). The root is labeled cn^k and has a children, each labeled c(n/b)^k; each of those has a children labeled c(n/b^2)^k, and so on. The depth of the tree is log_b(n).]

To compute the result of the recurrence, we simply need to add up all the values in the tree. We can do this by adding them up level by level. The top level has value cn^k, the next level sums to ca(n/b)^k, the next level sums to ca^2(n/b^2)^k, and so on. The depth of the tree (the number of levels not including the root) is log_b(n). Therefore, we get a summation of:

$$cn^k\left[1 + a/b^k + (a/b^k)^2 + (a/b^k)^3 + \cdots + (a/b^k)^{\log_b n}\right] \tag{2.9}$$

To help us understand this, let’s define r = a/b^k. Notice that r is a constant, since a, b, and k are constants. For instance, for Strassen’s algorithm r = 7/2^2, and for mergesort r = 2/2 = 1. Using our definition of r, our summation simplifies to:

$$cn^k\left[1 + r + r^2 + r^3 + \cdots + r^{\log_b n}\right] \tag{2.10}$$

We can now evaluate three cases:

Case 1: r < 1. In this case, the sum is a convergent series. Even if we imagine the series going to infinity, we still get that the sum 1 + r + r^2 + ... = 1/(1 − r). So, we can upper-bound formula (2.9) by cn^k/(1 − r), and lower-bound it by just the first term cn^k. Since r and c are constants, this solves to Θ(n^k).
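The Case 1 bounds can be checked by evaluating recurrence (2.8) directly. The following Python sketch is an illustration under assumptions: the function names are invented here, n is restricted to powers of b, and a, b, c, and k are taken to be positive integers so that the arithmetic stays exact.

```python
def recurrence_value(n: int, a: int, b: int, k: int, c: int = 1) -> int:
    """Evaluate T(n) = a T(n/b) + c n^k with T(1) = c, for n a power of b."""
    if n == 1:
        return c
    return a * recurrence_value(n // b, a, b, k, c) + c * n**k

def within_case1_bounds(n: int, a: int, b: int, k: int, c: int = 1) -> bool:
    """Check c n^k <= T(n) <= c n^k / (1 - r), where r = a / b^k < 1."""
    t = recurrence_value(n, a, b, k, c)
    # Multiply the upper bound through by (b^k - a) > 0 to avoid fractions:
    # T(n) <= c n^k / (1 - a/b^k)  is the same as  T(n) (b^k - a) <= c n^k b^k.
    return c * n**k <= t and t * (b**k - a) <= c * n**k * b**k
```

For example, with a = 2, b = 2, k = 2 (so r = 1/2), T(n) always lies between n^2 and 2n^2.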

(^1) This tree has branching factor a.

Lecture 3

Probabilistic Analysis and Randomized Quicksort

3.1 Overview

In this lecture we begin by introducing randomized (probabilistic) algorithms and the notion of worst-case expected time bounds. We make this concrete with a discussion of a randomized version of the Quicksort sorting algorithm, which we prove has worst-case expected running time O(n log n). In the process, we discuss basic probabilistic concepts such as events, random variables, and linearity of expectation.

3.2 The notion of randomized algorithms

As we have discussed previously, we are interested in how the running time of an algorithm scales with the size of the input. In addition, we will usually be interested in worst-case running time, meaning the worst-case over all inputs of a given size. That is, if I is some input and T(I) is the running time of our algorithm on input I, then T(n) = max{T(I) : I is an input of size n}. One can also look at notions of average-case running time, where we are concerned with our performance on “typical” inputs I. However, one difficulty with average-case bounds is that it is often unclear in advance what typical inputs for some problem will really look like, and furthermore this gets more difficult if our algorithm is being used as a subroutine inside some larger computation. In particular, if we have a bound on the worst-case running time of an algorithm for some problem A, it means that we can now consider solving other problems B by somehow converting instances of B to instances of problem A. We will see many examples of this later when we talk about network flow and linear programming as well as in our discussions of NP-completeness.

On the other hand, there are algorithms that have a large gap between their performance “on average” and their performance in the worst case. Sometimes, in this case we can improve the worst-case performance by actually adding randomization into the algorithm itself. One classic example of this is the Quicksort sorting algorithm.

Quicksort: Given array of some length n,

  1. Pick an element p of the array as the pivot (or halt if the array has size 0 or 1).
  2. Split the array into sub-arrays LESS, EQUAL, and GREATER by comparing each element to the pivot. (LESS has all elements less than p, EQUAL has all elements equal to p, and GREATER has all elements greater than p.)
  3. Recursively sort LESS and GREATER.

The Quicksort algorithm given above is not yet fully specified because we have not stated how we will pick the pivot element p. For the first version of the algorithm, let’s always choose the leftmost element.

Basic-Quicksort: Run the Quicksort algorithm as given above, always choosing the leftmost element in the array as the pivot.

What is the worst-case running time of Basic-Quicksort? We can see that if the array is already sorted, then in Step 2, all the elements (except p) will go into the GREATER bucket. Furthermore, since the GREATER array is in sorted order,^1 this process will continue recursively, resulting in time Ω(n^2). We can also see that the running time is O(n^2) on any array of n elements because Step 1 can be executed at most n times, and Step 2 takes at most n steps to perform. Thus, the worst-case running time is Θ(n^2).

On the other hand, it turns out (and we will prove) that the average-case running time for Basic-Quicksort (averaging over all different initial orderings of the n elements in the array) is O(n log n). This fact may be small consolation if the inputs we are faced with are the bad ones (e.g., if our lists are nearly sorted already). One way we can try to get around this problem is to add randomization into the algorithm itself:

Randomized-Quicksort: Run the Quicksort algorithm as given above, each time picking a random element in the array as the pivot.

We will prove that for any given input array I of n elements, the expected time of this algorithm E[T(I)] is O(n log n). This is called a Worst-case Expected-Time bound. Notice that this is better than an average-case bound because we are no longer assuming any special properties of the input. E.g., it could be that in our desired application, the input arrays tend to be mostly sorted or in some special order, and this does not affect our bound because it is a worst-case bound with respect to the input. It is a little peculiar: making the algorithm probabilistic gives us more control over the running time.
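Both variants can be sketched with one Python function that takes the pivot rule as a parameter. This is a minimal illustration, not the notes' code: the names `quicksort`, `leftmost`, and `uniform_random` are assumptions, the function returns a new list instead of sorting in place, and the counter records n − 1 comparisons per partitioning step (one per non-pivot element), which is how the analysis in these notes counts them, even though the three list comprehensions below scan the array more than once.

```python
import random

def quicksort(arr, choose_pivot, counter):
    """Return a sorted copy of arr; counter[0] accumulates pivot comparisons."""
    if len(arr) <= 1:
        return list(arr)
    p = arr[choose_pivot(arr)]
    counter[0] += len(arr) - 1                # each other element is compared to p
    less = [x for x in arr if x < p]
    equal = [x for x in arr if x == p]
    greater = [x for x in arr if x > p]
    return (quicksort(less, choose_pivot, counter)
            + equal
            + quicksort(greater, choose_pivot, counter))

def leftmost(arr):
    """Pivot rule for Basic-Quicksort: always index 0."""
    return 0

def uniform_random(arr):
    """Pivot rule for Randomized-Quicksort: a uniformly random index."""
    return random.randrange(len(arr))
```

On an already-sorted array of n distinct elements, the `leftmost` rule performs (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2 comparisons, the quadratic worst case. With `uniform_random` the count varies from run to run, and it is its expectation that the upcoming analysis bounds.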

To prove these bounds, we first detour into the basics of probabilistic analysis.

3.3 The Basics of Probabilistic Analysis

Consider rolling two dice and observing the results. We call this an experiment, and it has 36 possible outcomes: it could be that the first die comes up 1 and the second comes up 2, or that the first comes up 2 and the second comes up 1, and so on. Each of these outcomes has probability 1/36 (assuming these are fair dice). Suppose we care about some quantity such as “what is the

(^1) Technically, this depends on how the partitioning step is implemented, but will be the case for any reasonable implementation.


random variables and then separately analyzing these simple RVs. Let’s first prove this fact and then see how it can be used.

Theorem 3.1 (Linearity of Expectation) For any two random variables X and Y, E[X + Y] = E[X] + E[Y].

Proof (for discrete RVs): This follows directly from the definition as given in (3.1).

$$E[X + Y] = \sum_{e \in S} \Pr(e)\,(X(e) + Y(e)) = \sum_{e \in S} \Pr(e)X(e) + \sum_{e \in S} \Pr(e)Y(e) = E[X] + E[Y].$$
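Linearity of expectation holds even when X and Y are dependent, which is easy to confirm by brute force on the two-dice experiment. The Python sketch below is an illustration only: the choice of X (the first die) and Y (the larger of the two dice) is an assumption made for this example, and the expectation is computed as the sum over outcomes e of Pr(e) times the value of the random variable at e.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
pr = Fraction(1, len(outcomes))

def expectation(rv):
    """E[rv] = sum over outcomes e of Pr(e) * rv(e)."""
    return sum(pr * rv(e) for e in outcomes)

def X(e):
    return e[0]          # value of the first die

def Y(e):
    return max(e)        # larger of the two dice (depends on X)

def X_plus_Y(e):
    return X(e) + Y(e)
```

Here `expectation(X_plus_Y)` equals `expectation(X) + expectation(Y)` exactly, even though X and Y are clearly not independent.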

3.3.2 Example 1: Card shuffling

Suppose we unwrap a fresh deck of cards and shuffle it until the cards are completely random. How many cards do we expect to be in the same position as they were at the start? To solve this, let’s think formally about what we are asking. We are looking for the expected value of a random variable X denoting the number of cards that end in the same position as they started. We can write X as a sum of random variables X_i, one for each card, where X_i = 1 if the ith card ends in position i and X_i = 0 otherwise. These X_i are easy to analyze: Pr(X_i = 1) = 1/n where n is the number of cards. Pr(X_i = 1) is also E[X_i]. Now we use linearity of expectation:

$$E[X] = E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] = 1.$$

So, this is interesting: no matter how large a deck we are considering, the expected number of cards that end in the same position as they started is 1.
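For small decks this can be confirmed exactly by enumerating every ordering. The sketch below is an illustration; the function name `expected_fixed_points` is an assumption, and the enumeration is only practical for small n because there are n! orderings.

```python
from fractions import Fraction
from itertools import permutations

def expected_fixed_points(n: int) -> Fraction:
    """Average, over all n! orderings, of the number of cards left in place."""
    perms = list(permutations(range(n)))
    total = sum(
        sum(1 for position, card in enumerate(p) if position == card)
        for p in perms
    )
    return Fraction(total, len(perms))
```

For every n from 1 up to the sizes that are feasible to enumerate, the result is exactly 1, matching the calculation above.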

3.3.3 Example 2: Inversions in a random permutation

[hmm, let’s leave this for homework]

3.4 Analysis of Randomized Quicksort

We now give two methods for analyzing randomized quicksort. The first is more intuitive but the details are messier. The second is a neat tricky way using the power of linearity of expectation: this will be a bit less intuitive but the details come out nicer.

3.4.1 Method 1

For simplicity, let us assume no two elements in the array are equal — when we are done with the analysis, it will be easy to look back and see that allowing equal keys could only improve performance. We now prove the following theorem.

Theorem 3.2 The expected number of comparisons made by randomized quicksort on an array of size n is at most 2n ln n.


Proof: First of all, when we pick the pivot, we perform n − 1 comparisons (comparing all other elements to it) in order to split the array. Now, depending on the pivot, we might split the array into a LESS of size 0 and a GREATER of size n − 1, or into a LESS of size 1 and a GREATER of size n − 2, and so on, up to a LESS of size n − 1 and a GREATER of size 0. All of these are equally likely with probability 1/n each. Therefore, we can write a recurrence for the expected number of comparisons T (n) as follows:

$$T(n) = (n - 1) + \frac{1}{n}\sum_{i=0}^{n-1}\left(T(i) + T(n - i - 1)\right). \tag{3.4}$$

Formally, we are using the expression for Expectation given in (3.3), where the n different possible splits are the events Ai.^3 We can rewrite equation (3.4) by regrouping and getting rid of T (0):

$$T(n) = (n - 1) + \frac{2}{n}\sum_{i=1}^{n-1} T(i). \tag{3.5}$$

Now, we can solve this by the “guess and prove inductively” method. In order to do this, we first need a good guess. Intuitively, most pivots should split their array “roughly” in the middle, which suggests a guess of the form cn ln n for some constant c. Once we’ve made our guess, we will need to evaluate the resulting summation. One of the easiest ways of doing this is to upper-bound the sum by an integral. In particular if f (x) is an increasing function, then

$$\sum_{i=1}^{n-1} f(i) \le \int_{1}^{n} f(x)\,dx,$$

which we can see by drawing a graph of f and recalling that an integral represents the “area under the curve”. In our case, we will be using the fact that

∫ (cx ln x)dx = (c/2)x^2 ln x − cx^2 /4.

So, let’s now do the analysis. We are guessing that T (i) ≤ ci ln i for i ≤ n − 1. This guess works for the base case T (1) = 0 (if there is only one element, then there are no comparisons). Arguing by induction we have:

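$$\begin{aligned}
T(n) &\le (n - 1) + \frac{2}{n}\sum_{i=1}^{n-1}(ci \ln i) \\
&\le (n - 1) + \frac{2}{n}\int_{1}^{n}(cx \ln x)\,dx \\
&\le (n - 1) + \frac{2}{n}\left((c/2)n^2 \ln n - cn^2/4 + c/4\right) \\
&\le cn \ln n, \quad \text{for } c = 2.
\end{aligned}$$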

In terms of the number of comparisons it makes, Randomized Quicksort is equivalent to randomly shuffling the input and then handing it off to Basic Quicksort. So, we have also proven that Basic Quicksort has O(n log n) average-case running time.
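The bound in Theorem 3.2 can also be checked numerically by evaluating the recurrence T(n) = (n − 1) + (2/n)(T(1) + ... + T(n − 1)), with T(0) = T(1) = 0, in exact rational arithmetic. The Python sketch below is only a sanity check for small n, not a substitute for the inductive proof; the function name `expected_comparisons` is an assumption.

```python
import math
from fractions import Fraction

def expected_comparisons(n_max: int) -> list:
    """Return [T(0), T(1), ..., T(n_max)] for the quicksort recurrence."""
    T = [Fraction(0), Fraction(0)]      # T(0) = T(1) = 0
    prefix = Fraction(0)                # running value of T(1) + ... + T(n - 1)
    for n in range(2, n_max + 1):
        prefix += T[n - 1]
        T.append((n - 1) + Fraction(2, n) * prefix)
    return T

def bound_holds(n_max: int) -> bool:
    """Check T(n) <= 2 n ln n for every 2 <= n <= n_max."""
    T = expected_comparisons(n_max)
    return all(float(T[n]) <= 2 * n * math.log(n) for n in range(2, n_max + 1))
```

For instance, T(2) = 1 (two elements always need one comparison) and T(3) = 8/3, both comfortably below 2n ln n.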

(^3) In addition, we are using Linearity of Expectation to say that the expected time given one of these events can be written as the sum of two expectations.