Comparison of Sorting Algorithms: Insertion, Merge, Heap, Quick, Radix, Bin Sort, Study notes of Data Structures and Algorithms

Virginia Polytechnic Institute and State University (Virginia Tech), Data Structures and Algorithms

An analysis of different sorting algorithms, including insertion sort, merge sort, heap sort, quicksort, radix sort, and bin sort. It covers the average-case and worst/best-case analysis of each algorithm, as well as the lower bound on the cost of sorting, and discusses improving insertion sort and the performance of Shell Sort, merge sort, heap sort, and quicksort.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009


CS 3204 Operating Systems

©William D McQuain, January 2005


Sorting

Data Structures & OO Development II

Sorting Considerations 1

We consider sorting a list of records, either into ascending or descending order, based upon the value of some field of the record we will call the sort key.

The list may be contiguous and randomly accessible (e.g., an array), or it may be dispersed and only sequentially accessible (e.g., a linked list). The same logic applies in both cases, although implementation details will differ.

When analyzing the performance of various sorting algorithms we will generally consider two factors:

- the number of sort key comparisons that are required
- the number of times records in the list must be moved

Both worst-case and average-case performance are significant.


Internal or External? 2

In an internal sort, the list of records is small enough to be maintained entirely in physical memory for the duration of the sort.

In an external sort, the list of records will not fit entirely into physical memory at once. In that case, the records are kept in disk files and only a selection of them are resident in physical memory at any given time.

We will consider only internal sorting at this time.


Insertion Sort 3

[Figure: Insertion Sort. The list is split into a sorted part and an unsorted part; the next element to place (19) is copied out, the sorted tail is shifted right, and the element is placed in the gap.]

Insertion sort closely resembles the insertion function for a sorted list. For a contiguous list, the primary costs are the comparisons to determine which part of the sorted portion must be shifted, and the assignments needed to accomplish that shifting of the sorted tail.


Insertion Sort Average Comparisons 4

Assuming a list of N elements, Insertion Sort requires:

Average case: N^2/4 + Θ(N) comparisons and N^2/4 + Θ(N) assignments

Consider the element which is initially at the Kth position and suppose it winds up at position j, where j can be anything from 1 to K. A final position of j will require K – j + 1 comparisons. Therefore, on average, the number of comparisons to place the Kth element is:

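$$\frac{1}{K}\sum_{j=1}^{K}(K - j + 1) = \frac{K+1}{2}$$

The average total cost for insertion sort on a list of N elements is thus:

$$\sum_{K=2}^{N}\frac{K+1}{2} = \frac{1}{2}\left[\left(\frac{N(N+1)}{2} - 1\right) + (N - 1)\right] = \frac{N^2}{4} + \frac{3N}{4} - 1 = \frac{N^2}{4} + \Theta(N)$$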


Lower Bound on the Cost of Sorting 7

Before considering how to improve on Insertion Sort, consider the question: How fast is it possible to sort?

Now, “fast” here must refer to algorithmic complexity, not time. We will consider the number of comparisons of elements a sorting algorithm must make in order to fully sort a list.

Note that this is an extremely broad issue since we seek an answer of the form: any sorting algorithm, no matter how it works, must, on average, perform at least Θ(f(N)) comparisons when sorting a list of N elements.

Thus, we cannot simply consider any particular sorting algorithm…


Possible Orderings of N Elements 8

A bit of combinatorics (the mathematics of counting)… Given a collection of N distinct objects, the number of different ways to line them up in a row is N!.

Thus, a sorting algorithm must, in the worst case, determine the correct ordering among N! possible results.

If the algorithm compares two elements, the result of the comparison eliminates certain orderings as the final answer, and directs the “search” to the remaining, possible orderings.

We may represent the process of comparison sorting with a binary tree…


Comparison Trees 9

A comparison tree is a binary tree in which each internal node represents the comparison of two particular elements of a set, and the edges represent the two possible outcomes of that comparison.

For example, given the set A = {a, b, c}, a sorting algorithm will begin by comparing two of the elements (it doesn’t matter which two, so we’ll choose arbitrarily):

[Figure: the first comparison, a < b?, splits the six possible orderings. T: a ≤ b ≤ c, a ≤ c ≤ b, c ≤ a ≤ b. F: b ≤ a ≤ c, b ≤ c ≤ a, c ≤ b ≤ a.]


Comparison Trees 10


[Figure: complete comparison tree for A = {a, b, c}; each of the 3! = 6 orderings is a leaf.]
a < b?
  T: c < a?
    T: c < a < b
    F: c < b?
      T: a < c < b
      F: a < b < c
  F: c < b?
    T: c < b < a
    F: c < a?
      T: b < c < a
      F: b < a < c


Simplified Lower Bound 13

Using Stirling’s Formula, and changing bases, we have that:

$$\log(n!) \approx \log\left(\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\right) = n\log n - n\log e + \frac{1}{2}\log(2\pi n)$$

Here log is base 2, so log e ≈ 1.44. So, log(n!) is Θ(n log n).

For most practical purposes, only the first term matters here, so you will often see the assertion that the lower bound for comparisons in sorting is n log n.


Improving Insertion Sort 14

Insertion sort is most efficient when the initial position of each list element is fairly close to its final position (take another look at the analysis). Consider: 10 8 6 20 4 3 22 1 0 15 16

Pick a step size (5 here) and logically break the list into parts.

Sort the elements in each part. Insertion Sort is acceptable since it's efficient on short lists.


Improving Insertion Sort 15

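This gives us: 3 8 1 0 4 10 22 6 20 15 16

Now we pick a smaller increment, 3 here, and repeat the process:

Partition the list: 3 0 22 15 | 8 4 6 16 | 1 10 20

Sort the parts: 0 3 15 22 | 4 6 8 16 | 1 10 20

Giving: 0 4 1 3 6 10 15 8 20 22 16

Finally, repeat the process with step size 1: 0 1 3 4 6 8 10 15 16 20 22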


Why Is This Better? 16

Well…

Until the last pass, the sublists that are being sorted are much shorter than the entire list — sorting two lists of length 1000 is faster than sorting one list of length 2000.
Since the sublists exhibit mixing as we change the step size, the effect of the early passes is to move each element closer to its final position in the fully sorted list.
In the last pass, most of the elements are probably not too far from their final positions, and that's one situation where Insertion Sort is quite efficient.

QTP: Suppose that a sorting algorithm is, on average, Θ(N^2). Using that fact, how would the expected time to sort a list of length 50 compare to the time required to sort a list of length 100?


Divide and Conquer Sorting 19

Shell Sort represents a "divide-and-conquer" approach to the problem. That is, we break a large problem into smaller parts (which are presumably more manageable), handle each part, and then somehow recombine the separate results to achieve a final solution.

In Shell Sort, the recombination is achieved by decreasing the step size to 1, and physically keeping the sublists within the original list structure. Note that Shell Sort is better suited to a contiguous list than a linked list.

We will now consider a somewhat similar algorithm that retains the divide and conquer strategy, but which is better suited to linked lists.


Merge Sort 20

In Merge Sort, we chop the list into two (or more) sublists which are as nearly equal in size as we can achieve, sort each sublist separately, and then carefully merge the resulting sorted sublists to achieve a final sorting of the original list.

[Figure: a linked list (Head) split into sublists for Merge Sort.]

Merging two sorted lists into a single sorted list is relatively trivial:

[Figure: two sorted sublists merged into a single sorted list.]


Merge Sort Implementation 21

The public interface function is just a shell to call a recursive helper function:

    template <typename T> void List<T>::MergeSort() {
        MergeSortHelper(Head);
    }

    template <typename T> void List<T>::MergeSortHelper(Node<T>*& sHead) {
        if ( (sHead != NULL) && (sHead->Next != NULL) ) {
            Node<T>* SecondHalf = DivideFrom(sHead);   // partition list
            MergeSortHelper(sHead);                    // sort each half
            MergeSortHelper(SecondHalf);
            sHead = Merge(sHead, SecondHalf);          // merge the pieces
        }
    }

In turn, the helper function uses two other private helper functions…


Merge Sort Partition 22

The partition divides the list nodes as evenly as possible:

    template <typename T> Node<T>* List<T>::DivideFrom(Node<T>* sHead) {
        Node<T> *Position, *MidPoint = sHead, *SecondHalf;
        if (MidPoint == NULL) return NULL;   // sublist is empty, so quit

        // Walk Position to the end of the list, moving MidPoint at half the
        // speed of Position, so MidPoint winds up at the middle of the list.
        Position = MidPoint->Next;
        while (Position != NULL) {
            Position = Position->Next;
            if (Position != NULL) {
                MidPoint = MidPoint->Next;
                Position = Position->Next;
            }
        }
        SecondHalf = MidPoint->Next;   // get head of second half of list
        MidPoint->Next = NULL;         // break sublist at its middle
        return SecondHalf;
    }


Merge Sort Performance 25

All the element comparisons take place during the merge phase. Logically, we may consider this as if the algorithm re-merged each level before proceeding to the next:

4 8 9 10 14 19 23 32 (after the final merge)
8 14 23 32 | 4 9 10 19 (after merging pairs into runs of four)
14 32 | 8 23 | 4 9 | 10 19 (after merging single elements into pairs)
14 32 8 23 4 9 19 10 (original list)

So merging the sublists involves log N passes. On each pass, every comparison moves one element into its merged sublist, so each pass makes at most N element comparisons. Hence, the number of comparisons for Merge Sort is Θ(N log N).


Merge Sort Summary 26

Merge Sort comes very close to the theoretical optimum number of comparisons. A closer analysis shows that for a list of N elements, the average number of element comparisons using Merge Sort is actually:

N log N − 1.1583N + 1

Recall that the theoretical minimum is: N log N − 1.44N + Θ(log N)

For a linked list, Merge Sort is the sorting algorithm of choice, providing nearly optimal comparisons and requiring NO element assignments (although there is a considerable amount of pointer manipulation), and requiring NO significant additional storage.

For a contiguous list, Merge Sort would require either using Θ(N) additional storage for the sublists or using a considerably complex algorithm to achieve the merge with a small amount of additional storage.


Heap Sort 27

A list can be sorted by first building it into a heap, and then iteratively deleting the root node from the heap until the heap is empty. If the deleted roots are stored in reverse order in an array, they will be sorted in ascending order (if a max heap is used).

    template <typename T> void HeapSort(T* List, unsigned int Size) {
        HeapT<T> toSort(List, Size);
        toSort.BuildHeap();

        unsigned int Idx = Size - 1;
        while ( !toSort.isEmpty() ) {
            List[Idx] = toSort.RemoveRoot();   // largest remaining value fills the last open slot
            Idx--;
        }
    }


Cost of Deleting the Roots 28

Recall from the earlier analysis of building a heap that level k of a full and complete binary tree contains 2^k nodes, and that those nodes are k levels below the root level. So, when the root is deleted, the maximum number of levels the swapped node can sift down is the number of the level from which that node was swapped. Thus, in the worst case, for deleting all the roots (taking the heap to have d levels, so N = 2^d − 1, with two comparisons per level of sifting):

$$\text{Comparisons} \le \sum_{k=1}^{d-1} 2k\,2^{k} = 2\left[(d-2)\,2^{d} + 2\right] = 2(N+1)\left[\log(N+1) - 2\right] + 4 \approx 2N\log N - 4N$$

As usual, with Heap Sort, this would entail half as many element swaps.


Importance of the Pivot Value 31

The choice of the pivot value is crucial to the performance of QuickSort. Ideally, the partitioning step produces two equal sublists, as here:

In the worst case, the partitioning step produces one empty sublist, as here:

Theoretically, the ideal pivot value is the median of the values in the sublist; unfortunately, finding the median is too expensive to be practical here.

[Figure: example list 43 71 87 14 53 38 90 51 41.]


Choosing the Pivot Value 32

Commonly used alternatives to finding the median are:

- take the value at some fixed position (first, middle-most, last, etc.)
- take the median value among the first, last and middle-most values
- find three distinct values and take the median of those

The third does not guarantee good performance, but is the best of the listed strategies since it is the only one that guarantees two nonempty sublists. (Of course, if you can’t find three distinct values, this doesn’t work, but in that case the current sublist doesn’t require any fancy sorting — a quick swap will finish it off efficiently.)

Each of the given strategies for finding the pivot is Θ(1) in comparisons.

QTP: under what conditions would choosing the first value produce consistently terrible partitions?


Partitioning a Sublist Efficiently 33

Since each iteration of QuickSort requires partitioning a sublist, this must be done efficiently. Fortunately, there is a simple algorithm for partitioning a sublist of N elements that is Θ(N) in comparisons and assignments:

    #include <utility>   // std::swap

    template <typename T>
    unsigned int Partition(T List[], unsigned int Lo, unsigned int Hi) {
        std::swap(List[Lo], List[(Lo + Hi)/2]);   // take middle-most element as Pivot,
        T Pivot = List[Lo];                       // now at the front of the sublist
        unsigned int LastPreceder = Lo;           // last index holding a value < Pivot
        for (unsigned int Idx = Lo + 1; Idx <= Hi; Idx++) {
            if (List[Idx] < Pivot) {
                LastPreceder++;
                std::swap(List[LastPreceder], List[Idx]);
            }
        }
        std::swap(List[Lo], List[LastPreceder]);  // put Pivot between the two parts
        return LastPreceder;
    }


QuickSort Implementation 34

Assuming the pivot and partition function just described, the main QuickSort function is quite trivial:

    template <typename T>
    void QuickSortHelper(T List[], unsigned int Lo, unsigned int Hi) {
        if (Lo < Hi) {
            unsigned int PivotIndex = Partition(List, Lo, Hi);
            if (PivotIndex > Lo)                            // avoid PivotIndex - 1 wrapping below 0
                QuickSortHelper(List, Lo, PivotIndex - 1);  // recurse on lower part,
            QuickSortHelper(List, PivotIndex + 1, Hi);      // then on higher part
        }
    }

    // Defined after the helper so that the call below finds it.
    template <typename T>
    void QuickSort(T List[], unsigned int Lo, unsigned int Hi) {
        QuickSortHelper(List, Lo, Hi);
    }


QuickSort Performance Worst Case 37

Suppose r = 0; i.e., that QuickSort produces one empty sublist and merely splits off the pivot value from the remaining N – 1 elements. Then we have:

C(1) = 0
C(2) = 1 + C(1) = 1
C(3) = 2 + C(2) = 2 + 1
C(4) = 3 + C(3) = 3 + 2 + 1
...
C(N) = N – 1 + C(N – 1) = (N – 1) + (N – 2) + … + 1 = 0.5N^2 – 0.5N

This is the worst case, and is as bad as the worst case for selection sort. Similar analysis shows that the number of swaps in this case is:

S(N) = 0.5N^2 + 1.5N – 1

which, at three assignments per swap, is about 1.5N^2 element assignments, even worse than insertion sort.


QuickSort Performance Average Case 38

For the average case, we will consider all possible results of the partitioning phase and compute the average of those. We assume that all possible orderings of the list elements are equally likely to be the correct sorted order, and let p denote the pivot value chosen.

Recalling that we assume the key values are 1…N, after partitioning the values 1, …, p – 1 are to the left of p, and the values p + 1, …, N are to the right of p.

Let S(N) be the average number of swaps made by QuickSort on a list of length N, and S(N, p) be the average number of swaps if the value p is chosen as the initial pivot.

The partition implementation given here will perform p – 1 swaps within the loop, and two more outside the loop. Therefore…


QuickSort Performance Average Case 39

$$S(N, p) = p + 1 + S(p - 1) + S(N - p)$$

Now there are N possible choices for p (1…N), and so if we sum this expression from p = 1 to p = N and divide by N we have:

$$S(N) = \frac{N+3}{2} + \frac{2}{N}\left[S(0) + S(1) + \cdots + S(N-1)\right] \qquad \text{(A)}$$

That’s another recurrence relation… we may solve this by playing a clever trick; first note that if QuickSort were applied to a list of N – 1 elements, we’d have:

$$S(N-1) = \frac{N+2}{2} + \frac{2}{N-1}\left[S(0) + S(1) + \cdots + S(N-2)\right] \qquad \text{(B)}$$


QuickSort Performance Average Case 40

If we multiply both sides of equation (A) by N, and multiply both sides of equation (B) by N – 1, and then subtract, we can obtain:

$$N\,S(N) - (N-1)\,S(N-1) = N + 1 + 2S(N-1) \qquad \text{(C)}$$

Rearranging terms, we get:

$$\frac{S(N)}{N+1} = \frac{S(N-1)}{N} + \frac{1}{N} \qquad \text{(D)}$$

A closed-form solution for this recurrence relation can be guessed by writing down the first few terms of the sequence. Doing so, we obtain…
