Selection Sort, Lecture Notes - Computer Science | Study notes Data Structures and Algorithms

11/17/10

20:54:47 1

CS61B: Lecture 34

Monday, November 15, 2010

Today’s reading: Goodrich & Tamassia, Sections 11.3.1 & 11.5.

SELECTION

=========

Suppose that we want to find the kth smallest key in a list. In other words,

we want to know which item has index j if the list is sorted (where j = k - 1).

We could simply sort the list, then look up the item at index j. But if we

don’t actually need to sort the list, is there a faster way? This problem is

called _selection_.

One example is finding the median of a set of keys. If n keys are numbered

from 0 to n - 1 (where n is odd), we are looking for the item whose index is

j = (n - 1) / 2 in the sorted list.
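
As a baseline for comparison, here is a minimal sketch of the "sort, then look up index j" approach described above. The class and method names (SortSelect, select, median) are hypothetical, chosen only for illustration, and the keys are assumed to be ints.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the lecture code.
class SortSelect {
    /** Returns the key at index j of the sorted order, i.e. the (j + 1)th smallest key. */
    static int select(int[] keys, int j) {
        int[] copy = Arrays.copyOf(keys, keys.length); // leave the caller's array untouched
        Arrays.sort(copy);                             // sorting dominates the running time
        return copy[j];
    }

    /** Median of an odd number of keys: index j = (n - 1) / 2 in the sorted order. */
    static int median(int[] keys) {
        return select(keys, (keys.length - 1) / 2);
    }
}
```

This does more work than the problem requires, because it fully orders keys that we never look at.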

Quickselect

-----------

We can modify quicksort to perform selection for us. Observe that when we

choose a pivot v and use it to partition the list into three lists I1, Iv, and

I2, we know which of the three lists contains index j, because we know the

lengths of I1 and I2. Therefore, we only need to search one of the three

lists.

Here’s the quickselect algorithm for finding the item at index j, that is,

the item with the (j + 1)th smallest key.

Start with an unsorted list I of n input items.

Choose a pivot item v from I.

Partition I into three unsorted lists I1, Iv, and I2.

- I1 contains all items whose keys are smaller than v’s key.

- I2 contains all items whose keys are larger than v’s.

- Iv contains the pivot v.

- Items with the same key as v can go into any of the three lists.

(In list-based quickselect, they go into Iv; in array-based quickselect,

they go into I1 and I2, just like in array-based quicksort.)

  if (j < |I1|) {

    Recursively find the item with index j in I1; return it.

  } else if (j < |I1| + |Iv|) {

    Return the pivot v.

  } else {                                         // j >= |I1| + |Iv|.

    Recursively find the item with index j - |I1| - |Iv| in I2; return it.

  }
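
The steps above can be sketched in Java as follows. This is one possible list-based version, not the course's official code: the class name ListQuickselect is hypothetical, the keys are assumed to be Integers, the pivot is chosen at random, and items equal to the pivot go into Iv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical class name; a sketch of list-based quickselect on Integer keys.
class ListQuickselect {
    private static final Random RNG = new Random();

    /** Returns the item that would be at index j if items were sorted.
     *  Assumes 0 <= j < items.size(). */
    static int select(List<Integer> items, int j) {
        int v = items.get(RNG.nextInt(items.size()));   // choose a random pivot
        List<Integer> i1 = new ArrayList<>();           // keys smaller than v
        List<Integer> iv = new ArrayList<>();           // keys equal to v (includes the pivot)
        List<Integer> i2 = new ArrayList<>();           // keys larger than v
        for (int x : items) {
            if (x < v) {
                i1.add(x);
            } else if (x > v) {
                i2.add(x);
            } else {
                iv.add(x);
            }
        }
        if (j < i1.size()) {
            return select(i1, j);                       // index j lies in I1
        } else if (j < i1.size() + iv.size()) {
            return v;                                   // index j lies in Iv
        } else {
            return select(i2, j - i1.size() - iv.size()); // index j lies in I2
        }
    }
}
```

Because Iv always contains at least the pivot, I1 and I2 are both strictly shorter than I, so the recursion terminates.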

The advantage of quickselect over quicksort is that we only have to make one

recursive call, instead of two. Since we make at most _one_ recursive call at

_every_ level of the recursion tree, quickselect is much faster than quicksort.

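I won’t analyze quickselect here, but it runs in Theta(n) average time if we

select pivots randomly. (Like quicksort, it can still take Theta(n^2) time in

the worst case, if the pivots are consistently bad.)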

We can easily modify the code for quicksort on arrays, presented in Lecture 32,

to do selection. The partitioning step is done exactly according to the

Lecture 32 pseudocode for array quicksort. Recall that when the partition

stage finishes, the pivot is stored at index "i" (see the variable "i" in the

array quicksort pseudocode). In the quickselect pseudocode above, just replace

|I1| with i and |Iv| with 1.
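
Here is a hedged sketch of what an array-based quickselect might look like. The Lecture 32 pseudocode is not reproduced in these notes, so the partition loop below is an assumption modeled on a common two-pointer scheme that leaves the pivot at index i. The class name ArrayQuickselect is hypothetical, the keys are assumed to be ints, and a loop replaces the single recursive call.

```java
import java.util.Random;

// Hypothetical class name; the partition scheme is an assumption, not a copy of Lecture 32.
class ArrayQuickselect {
    private static final Random RNG = new Random();

    /** Rearranges a and returns the key at index j of the sorted order.
     *  Assumes 0 <= j < a.length. */
    static int select(int[] a, int j) {
        int low = 0;
        int high = a.length - 1;
        while (low < high) {                      // invariant: low <= j <= high
            int p = low + RNG.nextInt(high - low + 1);
            int pivot = a[p];
            a[p] = a[high];                       // park the pivot in the last slot
            a[high] = pivot;

            int i = low - 1;
            int k = high;
            do {
                do { i++; } while (a[i] < pivot);             // stops at the pivot at the latest
                do { k--; } while (k > low && a[k] > pivot);
                if (i < k) {
                    int tmp = a[i];
                    a[i] = a[k];
                    a[k] = tmp;
                }
            } while (i < k);
            a[high] = a[i];
            a[i] = pivot;                         // the pivot now sits at its sorted index i

            if (j < i) {
                high = i - 1;                     // keep searching the left part only
            } else if (j > i) {
                low = i + 1;                      // keep searching the right part only
            } else {
                return a[i];                      // the pivot is the answer
            }
        }
        return a[low];                            // one item left, and it is at index j
    }
}
```

In this sketch j is always an index into the whole array, so it is compared against i directly and never needs to be reduced when the search moves to the right part.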

A LOWER BOUND ON COMPARISON-BASED SORTING

=========================================

Suppose we have a scrambled array of n numbers, with each number from 1...n

occurring once. How many possible orders can the numbers be in?

The answer is n!, where n! = 1 * 2 * 3 * ... * (n-2) * (n-1) * n. Here’s why:

the first number in the array can be anything from 1...n, yielding n

possibilities. Once the first number is chosen, the second number can be any

one of the remaining n-1 numbers, so there are n * (n-1) possible choices of

the first two numbers. The third number can be any one of the remaining n-2

numbers, yielding n * (n-1) * (n-2) possibilities for the first three numbers.

Continue this reasoning to its logical conclusion.

Each different order is called a _permutation_ of the numbers, and there are n!

possible permutations. (For Homework 9, you are asked to create a random

permutation of maze walls.)
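
The counting argument can be checked by brute force for small n. The sketch below fills the array one position at a time, exactly as in the reasoning above, and counts how many complete orders result. The class and method names are hypothetical.

```java
// Hypothetical class name; counts the orders of 1...n by direct enumeration.
class PermutationCount {
    /** Counts the orders of 1...n by choosing the first number, then the second, and so on. */
    static long countOrders(int n) {
        return extend(new boolean[n + 1], 0, n);
    }

    private static long extend(boolean[] used, int chosen, int n) {
        if (chosen == n) {
            return 1;                           // every position is filled: one complete order
        }
        long total = 0;
        for (int v = 1; v <= n; v++) {
            if (!used[v]) {                     // v is one of the n - chosen remaining numbers
                used[v] = true;
                total += extend(used, chosen + 1, n);
                used[v] = false;
            }
        }
        return total;
    }

    /** n! = 1 * 2 * ... * n, computed directly (fits in a long for n <= 20). */
    static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) {
            f *= i;
        }
        return f;
    }
}
```

For example, countOrders(4) and factorial(4) both return 24.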

Observe that if n > 0,

n! = 1 * 2 * ... * (n-1) * n <= n * n * n * ... * n * n * n = n^n

and (supposing n is even)

n! = 1 * 2 * ... * (n-1) * n >= (n/2) * (n/2 + 1) * ... * (n-1) * n >= (n/2)^(n/2)

so n! is between (n/2)^(n/2) and n^n. Let’s look at the logarithms of both

these numbers: log((n/2)^(n/2)) = (n/2) log (n/2), which is in Theta(n log n),

and log(n^n) = n log n. Hence, log(n!) is also in Theta(n log n).
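
To see the two bounds numerically, the following sketch computes log(n!) as a sum of logarithms and compares it with the logarithms of (n/2)^(n/2) and n^n. Base-2 logarithms are an arbitrary choice here (the Theta bound does not depend on the base), and the class and method names are hypothetical.

```java
// Hypothetical class name; compares log_2(n!) with the logs of the two bounds.
class FactorialBounds {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** log_2(n!) as a sum of logarithms, which avoids overflowing n! itself. */
    static double log2Factorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += log2(i);
        }
        return sum;
    }

    /** log_2 of the lower bound (n/2)^(n/2). */
    static double log2Lower(int n) {
        return (n / 2.0) * log2(n / 2.0);
    }

    /** log_2 of the upper bound n^n. */
    static double log2Upper(int n) {
        return n * log2(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 16, 256, 4096}) {
            System.out.printf("n=%d  lower=%.1f  log2(n!)=%.1f  upper=%.1f%n",
                    n, log2Lower(n), log2Factorial(n), log2Upper(n));
        }
    }
}
```

For every even n printed, the middle value should fall between the other two.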

A _comparison-based_sort_ is one in which all decisions are based on comparing

keys (generally using "if" statements). All actions taken by the sorting

algorithm are based on the results of a sequence of true/false questions. All

of the sorting algorithms we have studied are comparison-based.

Suppose that two computers run the _same_ sorting algorithm at the same time on

two _different_ inputs. Suppose that every time one computer executes an "if"

statement and finds it true, the other computer executes the same "if"

statement and also finds it true; likewise, when one computer executes an "if"

and finds it false, so does the other. Then both runs perform exactly the same

data movements (e.g. swapping the numbers at indices i and j) in exactly the

same order, so they both permute their inputs in _exactly_ the same way.

A correct sorting algorithm must generate a _different_ sequence of true/false

answers for each different permutation of 1...n, because it takes a different

sequence of data movements to sort each permutation. There are n! different

permutations, thus n! different sequences of true/false answers.
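
This claim can be tested on a concrete algorithm. The sketch below uses insertion sort (my choice for illustration; any correct comparison-based sort would do), records the true/false answer of every key comparison, and counts how many distinct answer sequences arise over all permutations of 1...n. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name; records the answers an insertion sort gets on each permutation.
class AnswerSequences {
    /** Insertion-sorts a in place and returns its comparison answers as a string of T/F. */
    static String sortAndRecord(int[] a) {
        StringBuilder answers = new StringBuilder();
        for (int i = 1; i < a.length; i++) {
            int k = i;
            while (k > 0) {
                boolean outOfOrder = a[k - 1] > a[k];   // the only kind of question asked
                answers.append(outOfOrder ? 'T' : 'F');
                if (!outOfOrder) {
                    break;
                }
                swap(a, k - 1, k);
                k--;
            }
        }
        return answers.toString();
    }

    /** Sorts every permutation of 1...n and returns the number of distinct answer sequences. */
    static int distinctSequences(int n) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i + 1;
        }
        Set<String> seen = new HashSet<>();
        permute(perm, 0, seen);
        return seen.size();
    }

    private static void permute(int[] perm, int start, Set<String> seen) {
        if (start == perm.length) {
            seen.add(sortAndRecord(perm.clone()));      // sort a copy, keep perm intact
            return;
        }
        for (int i = start; i < perm.length; i++) {
            swap(perm, start, i);
            permute(perm, start + 1, seen);
            swap(perm, start, i);                       // undo the swap before the next choice
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

For n = 3 the six permutations produce the six sequences FF, FTF, TF, FTT, TTF, and TTT, so distinctSequences(3) returns 6 = 3!.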

If a sorting algorithm asks at most d true/false questions on any input, it

generates <= 2^d different sequences of true/false answers. If it correctly

sorts every permutation of 1...n, then n! <= 2^d, so log_2 (n!) <= d, and d is

in Omega(n log n). In the worst case, the algorithm spends Omega(d) time asking

these d questions.
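
For small n the bound can be worked out exactly: the smallest d with n! <= 2^d is the ceiling of log_2 (n!). The sketch below computes it with BigInteger so that n! cannot overflow. The class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical class name; computes the smallest d with n! <= 2^d.
class ComparisonLowerBound {
    /** Smallest d such that n! <= 2^d, i.e. the ceiling of log_2(n!). */
    static int minQuestions(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            factorial = factorial.multiply(BigInteger.valueOf(i));
        }
        // For any integer f >= 1, ceil(log_2 f) equals the bit length of f - 1.
        return factorial.subtract(BigInteger.ONE).bitLength();
    }
}
```

For example, minQuestions(3) is 3 because 3! = 6 lies between 2^2 and 2^3, and minQuestions(5) is 7 because 5! = 120 lies between 2^6 and 2^7. So no comparison-based algorithm can sort five keys with fewer than 7 comparisons in the worst case.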

Hence,

==============================================================================

EVERY comparison-based sorting algorithm takes Omega(n log n) worst-case time.

==============================================================================

However, there are faster sorting algorithms that can make q-way decisions for

large values of q, instead of true/false (2-way) decisions. Some of these

algorithms run in linear time.

