Introduction to Algorithms and Data Structures: A Comprehensive Guide, Thesis of Project Management

A comprehensive introduction to algorithms and data structures, covering fundamental concepts, analysis techniques, and common algorithms like merge sort, heap sort, and quicksort. It delves into asymptotic notations, recurrence relations, and the master theorem for analyzing algorithm efficiency. The document also explores non-comparison-based sorting algorithms like counting sort, radix sort, and bucket sort, illustrating their principles and applications.

Typology: Thesis

2023/2024

Uploaded on 10/24/2024

shanthi_48


Introduction to Algorithms and Data Structures

Introduction to Algorithms

1.1 Algorithm

An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output. An algorithm is thus a sequence of computational steps that transform the input into the output. For example, given the input sequence {31, 41, 59, 26, 41, 58}, a sorting algorithm returns as output the sequence {26, 31, 41, 41, 58, 59}. Such an input sequence is called an instance of the sorting problem.

Instance: An instance of a problem consists of the input needed to compute a solution to the problem. An algorithm is said to be correct if, for every input instance, it halts with the correct output. There are two aspects of algorithmic performance:

- Time: how long the algorithm takes to run.
- Space: what kind of data structures can be used, and how does the choice of data structure affect the runtime?
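To make "correct output" concrete for the sorting problem, the sketch below checks a single instance: the output must be a permutation of the input arranged in nondecreasing order. The function name `is_correct_sort` is hypothetical, and passing this check on some instances is only evidence; correctness requires the property to hold for every instance.

```python
from collections import Counter

def is_correct_sort(instance, output):
    """Return True if output is a correct answer for this sorting instance."""
    # The output must contain exactly the same values as the input (a permutation) ...
    same_values = Counter(instance) == Counter(output)
    # ... arranged in nondecreasing order.
    nondecreasing = all(output[i] <= output[i + 1] for i in range(len(output) - 1))
    return same_values and nondecreasing

instance = [31, 41, 59, 26, 41, 58]
print(is_correct_sort(instance, [26, 31, 41, 41, 58, 59]))  # True
print(is_correct_sort(instance, [26, 31, 41, 58, 59]))      # False: one 41 is missing
```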

1.1.1 Analysis of Algorithms

Analysis is performed with respect to a computational model:

We will usually use a generic uniprocessor random-access machine (RAM), in which:

- All memory is equally expensive to access.
- There are no concurrent operations.
- All reasonable instructions take unit time, except for function calls.
- The word size is constant, unless we are explicitly manipulating bits.

Input size: Time and space complexity are generally a function of the input size. How we characterize input size depends on the problem:

- Sorting: number of input items
- Multiplication: total number of bits
- Graph algorithms: number of nodes and edges

Running time: The number of primitive steps that are executed. Apart from function calls, most statements require roughly the same amount of time, so counting steps is a reasonable measure of running time.

Analysis:

- Worst case: Provides an upper bound on running time, an absolute guarantee.
- Average case: Provides the expected running time, but treat it with care, because the definition of "average" can be ambiguous.

1.1.2 Analyzing Algorithms

Example: Insertion Sort

InsertionSort(A, n) {
    for i = 2 to n {
        key = A[i]
        j = i - 1
        while (j > 0) and (A[j] > key) {
            A[j+1] = A[j]
            j = j - 1
        }
        A[j+1] = key
    }
}

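Analysis:

- Worst case: when the input is in reverse sorted order, the inner loop body runs for every previous element (i - 1 times for A[i]), so the running time T(n) is a quadratic function of n.
- Best case: when the input is already sorted, the inner loop body never executes, and the running time is a linear function of n.
- Average case: on a random input the inner loop runs for about half of the previous elements, so the running time is still quadratic, roughly as bad as the worst case.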
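The best-case and worst-case behavior can be observed directly by counting how often the inner loop body runs. This is a minimal 0-based Python sketch of the pseudocode above; the `shifts` counter is an addition for illustration only.

```python
def insertion_sort(a):
    """Sort list a in place; return how many times the inner loop body ran."""
    shifts = 0
    for i in range(1, len(a)):          # 0-based version of "for i = 2 to n"
        key = a[i]
        j = i - 1
        # Shift larger elements of the sorted prefix a[0..i-1] one slot right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts

n = 8
print(insertion_sort(list(range(n))))         # best case (sorted): 0
print(insertion_sort(list(range(n, 0, -1))))  # worst case (reversed): n(n-1)/2 = 28
```

The shift count equals the number of inverted pairs in the input, which ranges from 0 (sorted) to n(n-1)/2 (reversed).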

1.2 Merge Sort

The merge sort algorithm closely follows the divide-and-conquer paradigm, which involves three steps at each level of the recursion:

- Divide the problem into a number of subproblems that are smaller instances of the same problem.
- Conquer the subproblems by solving them recursively.
- Combine the solutions to the subproblems into the solution for the original problem.

The key operation of the merge sort algorithm is the merging of two sorted sequences in the "combine" step. The MERGE procedure takes time Θ(n), where n is the total number of elements being merged.
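A minimal Python sketch of the three steps, with a linear-time merge for the combine step. Unlike an in-place array version, this one returns a new list and uses slicing for the divide step; those are simplifications chosen here, not part of the original description.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list in Theta(len(left) + len(right)) time."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from left on ties keeps the sort stable.
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; append the rest of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(a):
    """Return a new sorted list (divide, conquer, combine)."""
    if len(a) <= 1:                 # a list of 0 or 1 elements is already sorted
        return list(a)
    mid = len(a) // 2               # divide
    left = merge_sort(a[:mid])      # conquer
    right = merge_sort(a[mid:])
    return merge(left, right)       # combine

print(merge_sort([31, 41, 59, 26, 41, 58]))  # [26, 31, 41, 41, 58, 59]
```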

1.2.1 Analysis of Merge Sort

The recurrence for the worst-case running time T(n) of merge sort is:

$$
T(n) = \begin{cases} \Theta(1) & \text{if } n = 1, \\ 2T(n/2) + \Theta(n) & \text{if } n > 1. \end{cases}
$$

The solution for the above recurrence is Θ(n log n): the recursion has log_2 n + 1 levels, and the merging work on each level totals Θ(n).

1.3 Growth of Functions

The notations we use to describe the asymptotic running time of an algorithm are defined in terms of functions whose domains are the set of natural numbers N = {0, 1, 2, ...}.

1.3.1 Asymptotic Notations

1.3.1.1 Upper Bound Notation or O-notation

We say that a function f(n) is O(g(n)) if there exist positive constants c and n0 such that f(n) ≤ c × g(n) for all n ≥ n0.

Formally: O(g(n)) = {f(n): ∃ positive constants c and n0 such that f(n) ≤ c × g(n) for all n ≥ n0}
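As a worked instance of the definition, take f(n) = 3n^2 + 10n and g(n) = n^2 (a function chosen here for illustration, not from the original text). The constants c = 4 and n0 = 10 serve as witnesses:

```latex
% Claim: f(n) = 3n^2 + 10n is O(n^2), witnessed by c = 4 and n_0 = 10.
\begin{align*}
n \ge 10 &\implies 10n \le n \cdot n = n^2 \\
         &\implies 3n^2 + 10n \le 3n^2 + n^2 = 4n^2 = c \cdot g(n).
\end{align*}
```

Other witnesses work too (for example, c = 13 and n0 = 1); the definition only asks that some pair exists.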

As an example, let's determine an upper bound on the recurrence T(n) = 2T(⌈n/2⌉) + n using the substitution method. We guess that the solution is T(n) = O(n log n).
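Before proving a guess by substitution, it can help to test it numerically. The sketch below evaluates the recurrence with the additive n term, T(n) = 2T(⌈n/2⌉) + n, which is the form the O(n log n) guess implies. It assumes a base case T(1) = 1 and checks that T(n) ≤ c·n·log_2 n with c = 3 for n up to 10,000. The constant c = 3 is an assumption chosen for illustration; a finite check is evidence for the guess, not a proof.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(1) = 1 (assumed base case); T(n) = 2T(ceil(n/2)) + n for n > 1."""
    if n == 1:
        return 1
    return 2 * T((n + 1) // 2) + n   # (n + 1) // 2 == ceil(n / 2) for integers

# Candidate constant for the guess T(n) <= c * n * lg n (n >= 2 avoids lg 1 = 0).
c = 3
worst = max(T(n) / (n * math.log2(n)) for n in range(2, 10_001))
print(worst <= c)
```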

1.4.1.2 Changing Variables

Sometimes, a little algebraic manipulation can make an unknown recurrence similar to one you have seen before. As an example, consider the recurrence T(n) = 2T(√n) + Θ(log n). Renaming m = log n yields T(2^m) = 2T(2^(m/2)) + Θ(m). Setting S(m) = T(2^m) then produces the new recurrence S(m) = 2S(m/2) + Θ(m), which has the same form as the merge sort recurrence. Its solution is S(m) = Θ(m log m), which translates back to T(n) = T(2^m) = S(m) = Θ(log n log log n).
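The change of variables can be checked exactly on inputs where √n stays an integer power of 2. The sketch assumes the starting recurrence T(n) = 2T(√n) + log_2 n (the form that the renaming m = log n and the stated answer point to), a base case T(2) = 1, and inputs n = 2^(2^k). Under those assumptions S(m) = 2S(m/2) + m with S(1) = 1, whose solution is m log_2 m + m.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Assumed form: T(2) = 1, T(n) = 2T(sqrt(n)) + lg n, for n = 2**(2**k)."""
    if n == 2:
        return 1
    lg_n = n.bit_length() - 1           # exact lg n, since n is a power of 2
    return 2 * T(math.isqrt(n)) + lg_n  # isqrt is exact because n is a perfect square

for k in range(6):
    n = 2 ** (2 ** k)
    m = 2 ** k                          # m = lg n
    # S(m) = 2S(m/2) + m with S(1) = 1 solves to m lg m + m, i.e. m*k + m.
    print(m, T(n), m * k + m)
```

The last two columns agree, and m log m = log n · log log n, matching the Θ(log n log log n) bound.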

The Recursion-Tree Method

The recursion-tree method provides a way to devise a good guess for the solution to a recurrence when it is hard to come up with one for the substitution method. The key idea is to draw out a recursion tree, where each node represents the cost of a single subproblem, and then sum the costs within each level of the tree to obtain the total cost.
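The per-level sums can be computed mechanically. This sketch assumes a recurrence of the form T(n) = aT(n/b) + f(n), with n an exact power of b and each leaf costing `leaf_cost`; the function name `level_costs` is hypothetical. For the merge sort recurrence (taking Θ(1) as 1 and Θ(n) as n), every level costs n.

```python
def level_costs(n, a, b, f, leaf_cost=1):
    """Per-level costs of the recursion tree for T(n) = a*T(n/b) + f(n).

    Assumes n is an exact power of the integer b and T(1) = leaf_cost.
    """
    costs = []
    size, nodes = n, 1
    while size > 1:
        costs.append(nodes * f(size))  # every node on this level costs f(size)
        nodes *= a                     # each node has a children ...
        size //= b                     # ... on subproblems of size size/b
    costs.append(nodes * leaf_cost)    # the leaves: a^(log_b n) base cases
    return costs

# Merge sort recurrence T(n) = 2T(n/2) + n: every level costs n.
print(level_costs(8, 2, 2, lambda s: s))   # [8, 8, 8, 8], total 32
# T(n) = 9T(n/3) + n: costs grow geometrically, so the leaves dominate.
print(level_costs(9, 9, 3, lambda s: s))   # [9, 27, 81]
```

Equal level costs suggest a guess of (cost per level) × (number of levels); geometrically growing costs suggest the leaves dominate.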

Divide and Conquer Algorithms

The Master Theorem

The Master Theorem provides a cookbook for determining the running time of divide and conquer algorithms. For an algorithm that divides a problem of size n into a subproblems, each of size n/b, where a ≥ 1 and b > 1 are constants, and where the cost of each stage (i.e., the work to divide the problem and combine solved subproblems) is described by the function f(n), the running time satisfies T(n) = aT(n/b) + f(n), and the Master Theorem gives us the following:

- Case 1: If f(n) = O(n^(log_b a - ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
- Case 2: If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) * log n).
- Case 3: If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a * f(n/b) ≤ c * f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

Intuitively, the larger of the two functions n^(log_b a) and f(n) determines the solution to the recurrence.
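For the common special case where f(n) = Θ(n^k) is a polynomial, the comparison reduces to comparing k with the critical exponent log_b a. The sketch below (function name hypothetical) handles only that special case; driving functions such as n log n fall outside it. For polynomial f with k > log_b a, the regularity condition holds automatically, because a·f(n/b) = (a/b^k)·f(n) and a/b^k < 1.

```python
import math

def master_theorem(a, b, k):
    """Solve T(n) = a*T(n/b) + Theta(n^k) with the Master Theorem.

    Only handles polynomial driving functions f(n) = Theta(n^k); returns
    (case number, description of the Theta bound).
    """
    if a < 1 or b <= 1:
        raise ValueError("need a >= 1 and b > 1")
    crit = math.log(a) / math.log(b)            # the critical exponent log_b a
    if math.isclose(k, crit, abs_tol=1e-9):     # f(n) = Theta(n^(log_b a))
        return 2, f"Theta(n^{crit:g} * log n)"
    if k < crit:                                # f(n) is polynomially smaller
        return 1, f"Theta(n^{crit:g})"
    # k > crit: a*f(n/b) = (a / b**k) * f(n) with a / b**k < 1,
    # so the regularity condition of case 3 holds automatically.
    return 3, f"Theta(n^{k:g})"

print(master_theorem(9, 3, 1))    # case 1, Theta(n^2)
print(master_theorem(1, 1.5, 0))  # case 2, n^0 * log n = log n
print(master_theorem(2, 2, 1))    # case 2, merge sort: n log n
```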

Examples

Recurrence: T(n) = 9T(n/3) + n

a = 9, b = 3, f(n) = n, and n^(log_b a) = n^(log_3 9) = n^2. Since f(n) = O(n^(log_b a - ε)) with ε = 1, case 1 applies.

Therefore, T(n) = Θ(n^(log_b a)) = Θ(n^2).

Recurrence: T(n) = T(2n/3) + 1

a = 1, b = 3/2, f(n) = 1, and n^(log_b a) = n^(log_(3/2) 1) = n^0 = 1. Since f(n) = Θ(n^(log_b a)), case 2 applies.

Therefore, T(n) = Θ(n^(log_b a) * log n) = Θ(log n).

Recurrence: T(n) = 2T(n/2) + n log n

a = 2, b = 2, f(n) = n log n, and n^(log_b a) = n^(log_2 2) = n. Since f(n) is larger than n^(log_b a), you might mistakenly apply case 3. However, f(n) is larger than n^(log_b a) only by a factor of log n, which is asymptotically smaller than n^ε for every constant ε > 0, so f(n) is not polynomially larger; the regularity condition of case 3 also fails. Consequently, this recurrence falls into the gap between case 2 and case 3, and the Master Theorem does not apply. The solution must instead be obtained by other methods, such as the iteration method.

Iteration Method

The iteration method involves expanding the recurrence by using iterative equations, performing algebraic manipulations to express it as a summation, and then evaluating the summation.
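As a worked example, the iteration method can solve the recurrence T(n) = 2T(n/2) + n log n, which falls outside the Master Theorem. The derivation assumes n is a power of 2, T(1) = 1, and log denotes log base 2; these assumptions are made here for illustration.

```latex
% Assumptions: n is a power of 2, T(1) = 1, and \log denotes \log_2.
\begin{align*}
T(n) &= n\log n + 2T(n/2) \\
     &= n\log n + n\log(n/2) + 4T(n/4) \\
     &= \sum_{i=0}^{\log n - 1} 2^i \cdot \frac{n}{2^i}\,\log\frac{n}{2^i} \;+\; n\,T(1) \\
     &= n \sum_{i=0}^{\log n - 1} (\log n - i) \;+\; n \\
     &= n \cdot \frac{\log n\,(\log n + 1)}{2} + n \;=\; \Theta(n \log^2 n).
\end{align*}
```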

Heap Sort

Heap sort is a comparison-based sorting algorithm that uses a binary heap data structure. It consists of two main steps:

- Building a max-heap: The BUILD-MAX-HEAP procedure builds a max-heap from the input array in linear time, O(n).
- Sorting phase: The HEAPSORT procedure repeatedly swaps the maximum element (the root of the max-heap) with the last element of the heap, shrinks the heap by one so that the maximum stays in its final position at the end of the array, and then restores the max-heap property on the remaining elements.

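The time complexity of heap sort is O(n log n): building the heap takes O(n) time, and each of the n - 1 extractions restores the max-heap property in O(log n) time.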
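A minimal in-place Python sketch of the two phases. It uses 0-based indexing, so the children of node i are 2i + 1 and 2i + 2 (a 1-based array would use 2i and 2i + 1), and it writes MAX-HEAPIFY as a loop rather than recursively; both are choices made here, not requirements.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap (0-based)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_max_heap(a):
    """Heapify bottom-up from the last internal node; O(n) total."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    """Sort list a in place in O(n log n) time."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final slot
        max_heapify(a, 0, end)        # restore the heap on a[0..end-1]

data = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapsort(data)
print(data)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```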

Quick Sort

Quicksort is a divide-and-conquer algorithm that works as follows:

- Divide: The array A[p..r] is partitioned into two non-empty subarrays A[p..q] and A[q+1..r] such that every element of A[p..q] is less than or equal to every element of A[q+1..r].
- Conquer: The subarrays A[p..q] and A[q+1..r] are sorted by recursive calls to quicksort.
- Combine: No combining step is needed, because the subarrays are sorted in place; once both are sorted, the whole array A[p..r] is sorted.
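The text does not specify the partition procedure. The sketch below assumes a Hoare-style partition with the first element as pivot, which produces exactly the split described above: an index q with p ≤ q < r such that every element of A[p..q] is at most every element of A[q+1..r]. Indices are 0-based and inclusive.

```python
def hoare_partition(a, p, r):
    """Partition a[p..r] (inclusive) around pivot a[p]; return q with p <= q < r
    such that every element of a[p..q] is <= every element of a[q+1..r]."""
    pivot = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > pivot:      # scan left for an element <= pivot
            j -= 1
        i += 1
        while a[i] < pivot:      # scan right for an element >= pivot
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; no combine step is needed."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = hoare_partition(a, p, r)   # divide
        quicksort(a, p, q)             # conquer the left part
        quicksort(a, q + 1, r)         # conquer the right part

data = [13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21]
quicksort(data)
print(data)
```

With this pivot choice, already-sorted input drives the recursion to depth about n, which is the quadratic worst case of quicksort.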

Counting Sort

Running Time of Counting Sort

The time complexity of Counting Sort can be analyzed as follows:

The for loop of lines 1-2 takes time Θ(k), where the input elements are integers in the range 0 to k. The for loop of lines 3-4 takes time Θ(n), where n is the length of the input array. The for loop of lines 6-7 takes time Θ(k). The for loop of lines 9-11 takes time Θ(n).

Therefore, the overall time complexity of Counting Sort is Θ(k+n). In practice, Counting Sort is usually used when k = O(n), in which case the running time becomes Θ(n).
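A minimal 0-based Python sketch with the same four-loop structure: clear the counts (Θ(k)), count each value (Θ(n)), take prefix sums (Θ(k)), and place elements into the output walking backward (Θ(n)). The backward pass is what makes the sort stable. The function name and signature are hypothetical.

```python
def counting_sort(a, k):
    """Stable counting sort of integers in the range 0..k; returns a new list."""
    c = [0] * (k + 1)                 # Theta(k): clear the counts
    for x in a:                       # Theta(n): c[v] = number of elements equal to v
        c[x] += 1
    for v in range(1, k + 1):         # Theta(k): c[v] = number of elements <= v
        c[v] += c[v - 1]
    b = [0] * len(a)
    for x in reversed(a):             # Theta(n): walk backward so equal keys keep their order
        c[x] -= 1
        b[c[x]] = x
    return b

print(counting_sort([4, 1, 3, 1, 0, 5, 2, 3], k=5))  # [0, 1, 1, 2, 3, 3, 4, 5]
```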

The operation of Counting Sort on an input array A[1,...,8], where each element of A is a nonnegative integer no larger than k = 5, is illustrated in the following figure:

Figure (a) shows the array A and the auxiliary array C after line 4. Figure (b) shows the array C after line 7. Figures (c)-(e) show the output array B and the auxiliary array C after one, two, and three iterations of the loop in lines 9-11, respectively. Only the lightly shaded elements of array B have been filled in. Figure (f) shows the final sorted output array B.

Radix Sort

Radix Sort solves the problem of card sorting by sorting on the least significant digit first. The process continues until the cards have been sorted on all d digits. The algorithm is as follows:

RadixSort(A, d)
    for i = 1 to d
        StableSort(A) on digit i    // digit 1 is the least significant digit

Given n d-digit numbers in which each digit can take on up to k possible values, RADIXSORT correctly sorts these numbers in Θ(d(n + k)) time if the stable sort it uses takes Θ(n + k) time (for example, counting sort).
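A minimal Python sketch of least-significant-digit radix sort, assuming nonnegative integers with at most d digits in the given base. Each pass is a stable counting sort on one digit, so each pass costs Θ(n + k) with k = base, for Θ(d(n + k)) in total. Digits are numbered from 0 here.

```python
def radix_sort(a, d, base=10):
    """LSD radix sort of nonnegative integers with at most d base-`base` digits."""
    for i in range(d):                       # digit 0 is the least significant
        divisor = base ** i
        # Stable counting sort on digit i (k = base possible digit values).
        count = [0] * base
        for x in a:
            count[(x // divisor) % base] += 1
        for v in range(1, base):
            count[v] += count[v - 1]
        out = [0] * len(a)
        for x in reversed(a):                # backward pass preserves stability
            digit = (x // divisor) % base
            count[digit] -= 1
            out[count[digit]] = x
        a = out
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], d=3))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Stability is essential: a later pass on a more significant digit must not undo the order established by earlier passes among numbers that share that digit.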

The operation of Radix Sort on a list of seven 3-digit numbers is illustrated in the following figure. The leftmost column is the input. The remaining columns show the list after successive sorts on increasingly significant digit positions.

Bucket Sort

Bucket Sort is based on the assumption that the input is n reals from the interval [0, 1). The basic idea is to divide the interval [0, 1) into n subintervals of size 1/n, called buckets, each represented by a linked list. Each input element is added to the bucket for its subinterval, the buckets are sorted using Insertion Sort, and the sorted buckets are then concatenated in order.

The algorithm is as follows:

BUCKET-SORT(A)
    n ← length[A]
    for i ← 1 to n
        do insert A[i] into list B[⌊nA[i]⌋]
    for i ← 0 to n - 1
        do sort list B[i] with insertion sort
    concatenate the lists B[0], B[1], ..., B[n - 1] together in order

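If the input is drawn from a uniform distribution, each bucket is expected to hold O(1) elements (more precisely, the expected square of a bucket's size is constant), so the expected time for all the insertion sorts, and hence the expected total time, is O(n).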
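A minimal Python sketch of the procedure above. It uses Python lists as buckets instead of linked lists, and clamps the bucket index with `min(..., n - 1)` as a guard against floating-point rounding; both are choices made here, not part of the original algorithm.

```python
def bucket_sort(a):
    """Sort a list of floats in [0, 1) using n buckets and insertion sort."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        # Bucket i covers [i/n, (i+1)/n); min() guards against float rounding to n.
        buckets[min(int(n * x), n - 1)].append(x)
    for bucket in buckets:
        # Insertion sort each bucket (expected to be small for uniform input).
        for i in range(1, len(bucket)):
            key = bucket[i]
            j = i - 1
            while j >= 0 and bucket[j] > key:
                bucket[j + 1] = bucket[j]
                j -= 1
            bucket[j + 1] = key
    # Concatenate the buckets in order.
    return [x for bucket in buckets for x in bucket]

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```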

The operation of Bucket Sort for n = 10 is illustrated in the following figure:

Figure (a) shows the input array A[1..10]. Figure (b) shows the array B[0..9] of sorted lists (buckets) after each bucket has been sorted. Bucket i holds values in the half-open interval [i/10, (i + 1)/10). The sorted output consists of a concatenation in order of the lists B[0], B[1], ..., B[9].

To analyze the running time, observe that all steps except the insertion sorts of the individual buckets take O(n) time in the worst case.