Data structures and algorithms using c++ programming language, Study notes of Data Structures and Algorithms

This document is from one of my 3rd-year computer science courses.

Typology: Study notes

2020/2021

Uploaded on 04/11/2022

Mila940000

MLITC Dep of CS DS and Algorithm Analysis
Compiled by: Alula T.

Chapter One

1. Introduction

1.1. C++ Review

C++ is an object-oriented programming language that is derived from a language called C. It has many language constructs that are available to the programmer. In this course we will use three of them: variables, pointers and structures.

1.1.1. Variables

Before going to the definition of variables, let us relate them to old mathematical equations. All of us have solved many mathematical equations since childhood. As an example, consider the below equation:

We don’t have to worry about the use of this equation. The important thing to understand is that the equation has names (x and y), which hold values (data). That means the names (x and y) are placeholders for representing data. Similarly, in computer programming we need something for holding data, and variables are the way to do that.

1.1.2. Data Types

In the above-mentioned equation, the variables x and y can take any values such as integral numbers (10, 20), real numbers (0.23, 5.5), or just 0 and 1. To solve the equation, we need to relate them to the kind of values they can take, and data type is the name used in computer science programming for this purpose. A data type in a programming language is a set of data with predefined values. Examples of data types are: integer, floating point, unit number, character, string, etc.

Computer memory is filled with zeros and ones. If we have a problem and want to code it, it is very difficult to express the solution in terms of zeros and ones. To help users, programming languages and compilers provide data types. For example, an integer may take 2 bytes (the actual size depends on the compiler; most current compilers use 4 bytes), a float takes 4 bytes, etc. This means that in memory we are combining 2 bytes (16 bits) and calling them an integer. Similarly, we combine 4 bytes (32 bits) and call them a float. A data type reduces the coding effort. At the top level, there are two types of data types:

• System-defined data types (also called primitive data types)
• User-defined data types

1.1.2.1. System-defined data types (Primitive data types)

Data types that are defined by the system are called primitive data types. The primitive data types provided by many programming languages are: int, float, char, double, bool, etc. The number of bits allocated for each primitive data type depends on the programming language, the compiler and the operating system. For the same primitive data type, different languages may use different sizes. Depending on the size of the data type, the total range of available values (domain) also changes.

For example, “int” may take 2 bytes or 4 bytes. If it takes 2 bytes (16 bits), then the total possible values are minus 32,768 to plus 32,767 (-2^15 to 2^15 -1). If it takes 4 bytes (32 bits), then the possible values are between -2,147,483,648 and +2,147,483,647 (-2^31 to 2^31 -1). The same is the case with other data types.
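The sizes quoted above can be checked on your own compiler. The sketch below is one way to do it, assuming a C++11 or later compiler: it prints sizeof for a few primitive types and uses std::numeric_limits to show the actual range of short and int. The helper predictedSignedMax is a hypothetical name introduced here; it applies the -2^(B-1) to 2^(B-1) - 1 formula so the formula can be compared with what the compiler reports.

```cpp
#include <cassert>
#include <climits>
#include <iostream>
#include <limits>

// Prints the size (in bytes) and, for the integer types, the value range.
// All of these numbers depend on the compiler and platform.
void printPrimitiveSizes() {
    std::cout << "char:   " << sizeof(char) << " byte(s)\n";  // 1 by definition
    std::cout << "short:  " << sizeof(short) << " byte(s), range "
              << std::numeric_limits<short>::min() << " to "
              << std::numeric_limits<short>::max() << '\n';
    std::cout << "int:    " << sizeof(int) << " byte(s), range "
              << std::numeric_limits<int>::min() << " to "
              << std::numeric_limits<int>::max() << '\n';
    std::cout << "float:  " << sizeof(float) << " byte(s)\n";
    std::cout << "double: " << sizeof(double) << " byte(s)\n";
    std::cout << "bool:   " << sizeof(bool) << " byte(s)\n";
}

// For a signed integer type with B bits the range is -2^(B-1) .. 2^(B-1) - 1.
// Returns the upper end of that range for the given bit count.
long long predictedSignedMax(int bits) {
    assert(bits >= 2 && bits <= 63);
    return (1LL << (bits - 1)) - 1;
}
```

On most current 32- and 64-bit compilers sizeof(int) is 4, so the printed int range is -2,147,483,648 to 2,147,483,647; a compiler with a 2-byte int would print -32,768 to 32,767 instead.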

1.1.2.2. User-defined data types

If the system-defined data types are not enough, most programming languages allow users to define their own data types, called user-defined data types. Good examples of user-defined data types are structures in C/C++ and classes in Java. For example, in the snippet below, we are combining many system-defined data types and calling the user-defined data type by the name “newType”. This gives more flexibility and comfort in dealing with computer memory.
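A minimal sketch of a user-defined type named newType, as described above. The member names and types (id, price, grade, balance) and the helper printNewType are hypothetical, chosen only to show several system-defined types grouped under one name.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical user-defined type: several primitive types grouped under one name.
struct newType {
    int id;          // an integer field
    float price;     // a single-precision floating-point field
    char grade;      // a single character
    double balance;  // a double-precision floating-point field
};

// Prints every member of a newType value on one line.
void printNewType(const newType& value) {
    std::cout << value.id << ' ' << value.price << ' '
              << value.grade << ' ' << value.balance << '\n';
}
```

A value can then be created like any built-in variable, e.g. newType t{7, 2.5f, 'A', 100.0};, and its members are reached with the dot operator (t.id, t.grade).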

1.1.3. Pointer

A pointer is a variable that holds the address of another variable. Syntax to declare a pointer: type *pointer_name; To get the address of a variable we use the & (ampersand) operator, e.g. int a = 20; int *p1 = &a; One use of pointers is dynamic memory allocation, which allows us to allocate memory at run time. Allocating memory dynamically can use memory more efficiently, because the program requests only the amount it actually needs at run time instead of reserving a fixed, worst-case amount in advance. To allocate memory dynamically, we use the new operator; memory obtained with new is released with delete (delete[] for arrays).
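The pointer syntax and the new operator described above can be combined in one small sketch. It assumes the array length n is known only at run time; makeSquares and pointerDemo are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Allocates an array of n ints at run time and fills it with 0, 1, 4, 9, ...
// The caller owns the memory and must release it with delete[].
int* makeSquares(std::size_t n) {
    int* arr = new int[n];
    for (std::size_t i = 0; i < n; ++i)
        arr[i] = static_cast<int>(i * i);
    return arr;
}

void pointerDemo() {
    int a = 20;
    int* p1 = &a;         // p1 holds the address of a
    *p1 = 30;             // writing through the pointer changes a
    assert(a == 30);

    std::size_t n = 5;    // in a real program n might be read with cin
    int* squares = makeSquares(n);
    for (std::size_t i = 0; i < n; ++i)
        std::cout << squares[i] << ' ';
    std::cout << '\n';
    delete[] squares;     // every new[] needs a matching delete[]
}
```

In modern C++, std::vector<int> manages this kind of memory automatically; the raw new[]/delete[] form is shown because it is what this section describes.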

Exercise

  1. Write a C++ program that accepts the ages of students and displays them in sorted order. Use a dynamic array. Your program should not allow negative values.
  2. Write a C++ program that accepts the scores of n students and calculates the minimum, maximum and average score.
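One possible sketch for the computational core of Exercise 2, assuming the scores have already been read into a dynamically allocated array of double. The ScoreStats struct and the computeStats function are hypothetical names; reading the input and printing the results are left to the reader.

```cpp
#include <cassert>
#include <cstddef>

// Summary of n scores.
struct ScoreStats {
    double minimum;
    double maximum;
    double average;
};

// Computes minimum, maximum and average in a single pass. Requires n >= 1.
ScoreStats computeStats(const double* scores, std::size_t n) {
    assert(scores != nullptr && n >= 1);
    ScoreStats s{scores[0], scores[0], 0.0};
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (scores[i] < s.minimum) s.minimum = scores[i];
        if (scores[i] > s.maximum) s.maximum = scores[i];
        sum += scores[i];
    }
    s.average = sum / static_cast<double>(n);
    return s;
}
```

A caller would typically read n, allocate double* scores = new double[n];, fill it with cin, call computeStats(scores, n), and finally release the array with delete[] scores;.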

1.1.4. Structures

A structure is a means of grouping different variables so that they can be used as a single entity in our program. By using structures, programmers can define their own data types. Syntax to define a structure: struct structure_Name {

1.3. Abstract Data Types (ADTs)

Given a problem, the first step to solve it is obtaining one’s own abstract view, or model, of the problem. This process of modeling is called abstraction.

The model defines an abstract view of the problem. This implies that the model focuses only on problem-related details and that a programmer tries to define the properties of the problem. These properties include:

• The data which are affected, and
• The operations that are involved in the problem.

With abstraction you create a well-defined entity that can be properly handled. These entities define the data structure of the program. An entity with the properties just described is called an abstract data type (ADT). An ADT consists of an abstract data structure and operations. Put in other terms, an ADT is an abstraction of a data structure.

The ADT specifies:

1. What can be stored in the Abstract Data Type?
2. What operations can be done on/by the Abstract Data Type?

For example, if we are going to model employees of an organization:

• This ADT stores employees with their relevant attributes and discards irrelevant attributes.
• This ADT supports hiring, firing and retiring operations.

A data structure is a language construct that the programmer has defined in order to implement an abstract data type. There are many formalized and standard abstract data types, such as stacks, queues, trees, etc.

Do all characteristics need to be modeled? Not at all, because:

• It depends on the scope of the model.
• It depends on the reason for developing the model.

Abstraction is a process of classifying characteristics as relevant and irrelevant for the particular purpose at hand and ignoring the irrelevant ones. Applying abstraction correctly is the essence of successful programming.

How do data structures model the world or some part of the world?

• The value held by a data structure represents some specific characteristic of the world.
• The characteristic being modeled restricts the possible values held by a data structure.
• The characteristic being modeled restricts the possible operations to be performed on the data structure.
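The employee example can be sketched as a C++ class. This is an illustrative sketch only: the class name EmployeeRegistry, the chosen attributes (id, name, status) and the method names hire, fire and retire are assumptions made for the example. The point is that users of the ADT see only the operations, while the std::map that stores the data stays private.

```cpp
#include <cassert>
#include <map>
#include <string>

// Only the attributes judged relevant for this model are kept.
enum class Status { Active, Fired, Retired };

struct Employee {
    int id;
    std::string name;
    Status status;
};

class EmployeeRegistry {
public:
    // Adds a new active employee and returns the id assigned to them.
    int hire(const std::string& name) {
        int id = nextId_++;
        staff_[id] = Employee{id, name, Status::Active};
        return id;
    }
    // Each returns false if there is no active employee with this id.
    bool fire(int id) { return changeStatus(id, Status::Fired); }
    bool retire(int id) { return changeStatus(id, Status::Retired); }

    // Number of employees who are currently active.
    int activeCount() const {
        int count = 0;
        for (const auto& entry : staff_)
            if (entry.second.status == Status::Active) ++count;
        return count;
    }

private:
    bool changeStatus(int id, Status newStatus) {
        auto it = staff_.find(id);
        if (it == staff_.end() || it->second.status != Status::Active) return false;
        it->second.status = newStatus;
        return true;
    }
    std::map<int, Employee> staff_;  // hidden representation
    int nextId_ = 1;
};
```

Replacing the std::map with a vector or a linked list would not change how callers use hire, fire and retire; that separation is the difference between an ADT and the data structure that implements it.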

Exercise

• Arrays and linked lists are basic data structures that are used as building blocks for other, more complex data structures like stacks and queues. List the advantages and disadvantages of these two basic data structures.

1.4. Algorithms

An algorithm is a well-defined computational procedure that takes some value or a set of values as input and produces some value or a set of values as output. Data structures model the static part of the world. They are unchanging while the world is changing. In order to model the dynamic part of the world we need to work with algorithms. Algorithms are the dynamic part of a program’s world model. An algorithm transforms data structures from one state to another in two ways:

• An algorithm may change the value held by a data structure.
• An algorithm may change the data structure itself.

The quality of a data structure is related to its ability to successfully model the characteristics of the world. Similarly, the quality of an algorithm is related to its ability to successfully simulate the changes in the world. However, independent of any particular world model, the quality of data structures and algorithms is determined by their ability to work together well. Generally speaking, correct data structures lead to simple and efficient algorithms, and correct algorithms lead to accurate and efficient data structures.
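The two kinds of transformation listed above can be illustrated on a std::vector<int>. In this sketch (the function names are illustrative), doubleEach changes only the values held, while removeNegatives changes the structure itself by shrinking it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Changes the values held by the data structure; its size stays the same.
void doubleEach(std::vector<int>& values) {
    for (int& v : values) v *= 2;
}

// Changes the data structure itself: negative elements are removed,
// so the vector can become shorter (erase-remove idiom).
void removeNegatives(std::vector<int>& values) {
    values.erase(std::remove_if(values.begin(), values.end(),
                                [](int v) { return v < 0; }),
                 values.end());
}
```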

1.4.1. Properties of an algorithm

  • Finiteness: An algorithm must complete after a finite number of steps.
  • Definiteness: Each step must be clearly defined, having one and only one interpretation. At each point in the computation, one should be able to tell exactly what happens next.
  • Sequence: Each step must have a uniquely defined preceding and succeeding step. The first step (start step) and the last step (halt step) must be clearly noted.
  • Feasibility: It must be possible to perform each instruction.
  • Correctness: It must compute the correct answer for all possible legal inputs.
  • Language Independence: It must not depend on any one programming language.
  • Completeness: It must solve the problem completely.
  • Effectiveness: It must be possible to perform each step exactly and in a finite amount of time.
  • Efficiency: It must solve the problem with the least amount of computational resources, such as time and space.
  • Generality: The algorithm should be valid on all possible inputs.
  • Input/Output: There must be a specified number of input values, and one or more result values.

Usually algorithms are written using pseudo code. In pseudo code we use arithmetic operations, assignment, if, while and other loop statements as in high-level languages, but some operations are also written in natural language. Since it is a combination of natural language and elements of high-level languages, it is a “false” code, hence the name pseudo code.
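As an illustration, here is a small algorithm written first in pseudo code (as comments) and then in C++. Finding the largest element of an array is chosen only as an example, and the function name findMax is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Pseudo code:
//   max <- first element of A
//   for each remaining element x of A:
//       if x > max then max <- x
//   return max
//
// A direct C++ translation. Requires n >= 1.
int findMax(const int A[], std::size_t n) {
    assert(n >= 1);
    int max = A[0];
    for (std::size_t i = 1; i < n; ++i)
        if (A[i] > max) max = A[i];
    return max;
}
```

The pseudo code fixes the steps without committing to any language, which matches the language-independence property listed above; the C++ version adds the details (types, indices) a compiler needs.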

There are two approaches to measure the efficiency of algorithms:

  • Empirical: Programming the competing algorithms and running them on different instances.
  • Theoretical: Determining mathematically the quantity of resources (execution time, memory space, etc.) needed by each algorithm.
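A minimal sketch of the empirical approach, assuming a C++11 compiler with <chrono>: it times one algorithm (summing a vector) with std::chrono::steady_clock. The function name timeSumMicros is illustrative. Running it several times on the same input may give different numbers, which is one reason the theoretical approach is preferred for comparing algorithms.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Sums the vector, stores the sum in resultOut, and returns the elapsed
// wall-clock time in microseconds.
long long timeSumMicros(const std::vector<int>& data, long long& resultOut) {
    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int x : data) sum += x;
    auto stop = std::chrono::steady_clock::now();
    resultOut = sum;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```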

However, it is difficult to use actual clock time as a consistent measure of an algorithm’s efficiency, because clock time can vary based on many things. For example:

• Specific processor speed
• Current processor load
• Specific data for a particular run of the program
  o Input size
  o Input properties
• Operating environment

Accordingly, we can analyze an algorithm according to the number of operations required, rather than according to an absolute amount of time involved. This can show how an algorithm’s efficiency changes according to the size of the input.
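Counting operations instead of measuring time can be sketched by instrumenting an algorithm. Below, selection sort (chosen here only as an example) counts its key comparisons; for an input of size n the count is always n(n-1)/2, whatever the processor speed, load or input values. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a in ascending order and returns the number of key comparisons made.
long long selectionSortCountingComparisons(std::vector<int>& a) {
    long long comparisons = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t minIndex = i;
        for (std::size_t j = i + 1; j < n; ++j) {
            ++comparisons;                     // one comparison a[j] < a[minIndex]
            if (a[j] < a[minIndex]) minIndex = j;
        }
        std::swap(a[i], a[minIndex]);
    }
    return comparisons;                        // (n-1) + (n-2) + ... + 1
}
```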

Why the Analysis of Algorithms? To go from city “A” to city “B”, there can be many ways: by flight, by bus, by train and also by bicycle. Depending on availability and convenience, we choose the one that suits us. Similarly, in computer science, multiple algorithms are available for solving the same problem (for example, a sorting problem has many algorithms, like insertion sort, selection sort, quick sort and many more). Algorithm analysis helps us determine which algorithm is most efficient in terms of time and space consumed.
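The same idea can be shown on a smaller problem than sorting. The two functions below (names are illustrative) both compute 1 + 2 + ... + n: the first performs n additions, while the second uses the closed form n(n+1)/2 and performs a fixed number of operations regardless of n.

```cpp
#include <cassert>

// Algorithm 1: add the numbers one by one (n additions).
long long sumByLoop(long long n) {
    long long sum = 0;
    for (long long i = 1; i <= n; ++i) sum += i;
    return sum;
}

// Algorithm 2: closed form n(n+1)/2 (a constant number of operations).
long long sumByFormula(long long n) {
    return n * (n + 1) / 2;
}
```

Both give the same answer (for example 500500 when n = 1000), but for large n the second does far less work; analysis is what tells us this without running either one.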

Goal of the Analysis of Algorithms: The goal of the analysis of algorithms is to compare algorithms (or solutions) mainly in terms of running time but also in terms of other factors (e.g., memory, developer effort, etc.)

1.4.3. Complexity Analysis

Complexity Analysis is the systematic study of the cost of computation, measured either in time units or in operations performed, or in the amount of storage space required. The goal is to have a meaningful measure that permits comparison of algorithms independent of the operating platform. There are two things to consider:

• Time Complexity: Determine the approximate number of operations required to solve a problem of size n.
• Space Complexity: Determine the approximate memory required to solve a problem of size n.

In practice, time is usually treated as the more important of these two factors. Complexity analysis involves two distinct phases:

• Algorithm Analysis: Analysis of the algorithm or data structure to produce a function T(n) that describes the algorithm in terms of the operations performed, in order to measure the complexity of the algorithm.
• Order of Magnitude Analysis: Analysis of the function T(n) to determine the general complexity category to which it belongs.

There is no generally accepted set of rules for algorithm analysis. However, an exact count of operations is commonly used.

1.4.4. What is Rate of Growth?

The rate at which the running time increases as a function of input is called the rate of growth. Let us assume that you go to a shop to buy a car and a bicycle. If your friend sees you there and asks what you are buying, then in general you say you are buying a car. This is because the cost of the car is high compared to the cost of the bicycle (the cost of the bicycle is small enough to be absorbed into the cost of the car).

For the above-mentioned example, we can represent the cost of the car and the cost of the bicycle in terms of a function, and for a given function ignore the low-order terms that are relatively insignificant (for large values of the input size, n). As an example, in the case below, n^4, 2n^2, 100n and 500 are the individual costs of some function, and the total approximates to n^4 since n^4 has the highest rate of growth:

Total cost = n^4 + 2n^2 + 100n + 500 ≈ n^4
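A quick numeric check of the approximation, assuming the total cost is the sum of the four individual costs listed above, n^4 + 2n^2 + 100n + 500. The function names are illustrative. The ratio of the total to n^4 approaches 1 as n grows, which is why the lower-order terms can be dropped.

```cpp
#include <cassert>

// Total cost n^4 + 2n^2 + 100n + 500.
double totalCost(double n) {
    return n * n * n * n + 2 * n * n + 100 * n + 500;
}

// Ratio of the total cost to its highest-order term n^4.
double ratioToLeadingTerm(double n) {
    return totalCost(n) / (n * n * n * n);
}
```

For n = 10 the ratio is 1.17; for n = 100 it is about 1.0003, so the three lower-order terms together add only a few hundredths of a percent.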

Commonly Used Rates of Growth: The diagram below shows the relationship between different rates of growth, and the list above the diagram gives the growth rates that we will use later in this chapter.

1.4.5. Types of Analysis

To analyze a given algorithm, we need to know with which inputs the algorithm takes less time (performs well) and with which inputs it takes a long time. We have already seen that an algorithm can be represented in the form of an expression. That means we represent the algorithm with multiple expressions: one for the case where it takes less time and another for the case where it takes more time. In general, the first case is called the best case and the second case is called the worst case for the algorithm. To analyze an algorithm we need some kind of syntax, and that forms the basis for asymptotic analysis/notation. There are three types of analysis:

  • Worst case
    o Defines the input for which the algorithm takes the longest time (slowest time to complete).
    o Input is the one for which the algorithm runs the slowest.

  • Best case
    o Defines the input for which the algorithm takes the least time (fastest time to complete).
    o Input is the one for which the algorithm runs the fastest.

  • Average case
    o Provides a prediction about the running time of the algorithm.
    o Run the algorithm many times, using many different inputs that come from some distribution that generates these inputs, compute the total running time (by adding the individual times), and divide by the number of trials.
    o Assumes that the input is random.

Lower Bound <= Average Time <= Upper Bound

For a given algorithm, we can represent the best, worst and average cases in the form of expressions. As an example, let f(n) be the function which represents the given algorithm.

Similarly for the average case. The expression defines the inputs with which the algorithm takes the average running time (or memory).
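Linear search is a convenient example of the three cases. In the sketch below (function names are illustrative), the best case is a key at the first position (1 comparison) and the worst case is a key at the last position or not present at all (n comparisons). The average over all successful searches assumes every position is equally likely to hold the key and that the elements are distinct; under that assumption it comes to (n + 1)/2 comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of key in a, or -1 if absent; counts comparisons made.
long long linearSearch(const std::vector<int>& a, int key, long long& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ++comparisons;
        if (a[i] == key) return static_cast<long long>(i);
    }
    return -1;
}

// Average comparisons over searching for each element of a once
// (uniform distribution over positions, distinct elements assumed).
double averageSuccessfulComparisons(const std::vector<int>& a) {
    assert(!a.empty());
    long long total = 0, c = 0;
    for (int key : a) {
        linearSearch(a, key, c);
        total += c;
    }
    return static_cast<double>(total) / static_cast<double>(a.size());
}
```

For the five elements {10, 20, 30, 40, 50}, the best case costs 1 comparison, the worst case 5, and the average (1 + 2 + 3 + 4 + 5)/5 = 3, which lies between the two bounds as expected.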

1.4.6. Asymptotic Complexity

Asymptotic analysis is concerned with how the running time of an algorithm increases with the size of the input in the limit, as the size of the input increases without bound. Usually a function expressing the relationship between t (running time) and n (data size) involves many terms. Since we are concerned with large values of n, we can ignore terms whose value is insignificant for large values of n. Such a measure of complexity is called asymptotic complexity. For example, if f(n) = n^3 + 1000n, then f(n) ≈ n^3 for large n. There are five notations used to describe asymptotic complexity. These are:

• Big-Oh Notation (O)
• Big-Omega Notation (Ω)
• Theta Notation (Θ)
• Little-o Notation (o)
• Little-Omega Notation (ω)

1.4.6.1. The Big-Oh Notation

Big-Oh notation is a way of comparing algorithms and is used for computing the complexity of algorithms, i.e., the amount of time that it takes for a computer program to run. It is only concerned with what happens for very large values of n. Therefore only the largest term in the expression (function) is needed. For example, if the number of operations in an algorithm is n^2 - n, n is insignificant compared to n^2 for large values of n. Hence the n term is ignored. Of course, for small values of n, it may be important. However, Big-Oh is mainly concerned with large values of n.

Formal Definition: f(n) = O(g(n)) if there exist constants c, k ∈ ℝ⁺ such that for all n ≥ k, f(n) ≤ c·g(n).

The following points are facts that you can use for Big-Oh problems:

• 1 <= n for all n >= 1
• n <= n^2 for all n >= 1
• 2^n <= n! for all n >= 4
• log2 n <= n for all n >= 2
• n <= n log2 n for all n >= 2

Examples:

1. f(n) = 10n + 5 and g(n) = n. Show that f(n) is O(g(n)).
To show that f(n) is O(g(n)) we must find constants c and k such that f(n) <= c·g(n) for all n >= k, or 10n + 5 <= c·n for all n >= k. Try c = 15. Then we need to show that 10n + 5 <= 15n. Solving for n we get 5 <= 5n, or 1 <= n. So f(n) = 10n + 5 <= 15·g(n) for all n >= 1 (c = 15, k = 1).

2. f(n) = 3n^2 + 4n + 1. Show that f(n) = O(n^2).
4n <= 4n^2 for all n >= 1, and 1 <= n^2 for all n >= 1, so
3n^2 + 4n + 1 <= 3n^2 + 4n^2 + n^2 = 8n^2 for all n >= 1.
So we have shown that f(n) <= 8n^2 for all n >= 1. Therefore, f(n) is O(n^2) (c = 8, k = 1).
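The choices of c and k in the two examples can be spot-checked numerically. The helper holdsOnRange below is hypothetical: it tests f(n) <= c·g(n) for every integer n from k up to a chosen limit. Such a check can catch a wrong choice of constants, but it is not a proof, because Big-Oh is a statement about all n >= k; the algebra in the examples is the proof.

```cpp
#include <cassert>
#include <functional>

// Returns true if f(n) <= c * g(n) for every integer n in [k, limit].
bool holdsOnRange(const std::function<double(double)>& f,
                  const std::function<double(double)>& g,
                  double c, long long k, long long limit) {
    for (long long n = k; n <= limit; ++n) {
        double x = static_cast<double>(n);
        if (f(x) > c * g(x)) return false;
    }
    return true;
}
```

With f(n) = 10n + 5 and g(n) = n, the check passes for c = 15, k = 1 and fails for c = 10 (since 10n + 5 > 10n for every n); with f(n) = 3n^2 + 4n + 1 and g(n) = n^2 it passes for c = 8, k = 1.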

Typical Orders: Here is a table of some typical cases. It uses logarithms to base 2, but these are simply proportional to logarithms in any other base.

N    O(1)   O(log n)   O(n)   O(n log n)   O(n^2)   O(n^3)
1    1      1          1      1            1        1
2    1      1          2      2            4        8

1.4.7. Running time calculation examples

Analysis Rules:

  1. We assume an arbitrary time unit.
  2. Execution of one of the following operations takes time 1:
     o Assignment operation
     o Single input/output operation
     o Single Boolean operation
     o Single arithmetic operation
     o Function return
  3. Running time of a selection statement (if, switch) is the time for the condition evaluation + the maximum of the running times for the individual clauses in the selection.
  4. Loops: Running time for a loop is equal to the running time for the statements inside the loop * number of iterations. The total running time of a statement inside a group of nested loops is the running time of the statements multiplied by the product of the sizes of all the loops. For nested loops, analyze inside out.
     o Always assume that the loop executes the maximum number of iterations possible.
  5. Running time of a function call is 1 for setup + the time for any parameter calculations + the time required for the execution of the function body.

Examples:

1. Function count():

    #include <iostream>

    int count() {
        int k = 0;
        int n;
        std::cout << "Enter an integer";
        std::cin >> n;
        for (int i = 0; i < n; i++)
            k = k + 1;
        return k;
    }

Time Units to Compute


• 1 for the assignment statement: int k=0;
• 1 for the output statement.
• 1 for the input statement.
• In the for loop: 1 assignment, n+1 tests, and n increments.
• n loops of 2 units for an assignment and an addition.
• 1 for the return statement.


T (n)= 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)

2. Function total():

    int total(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++)
            sum = sum + 1;
        return sum;
    }

Time Units to Compute

• 1 for the assignment statement: int sum=0;
• In the for loop: 1 assignment, n+1 tests, and n increments.
• n loops of 2 units for an assignment and an addition.
• 1 for the return statement.


T (n)= 1+ (1+n+1+n)+2n+1 = 4n+4 = O(n)
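The count T(n) = 4n + 4 for total() can be cross-checked by instrumenting the function. In this sketch the for loop is rewritten as an equivalent while loop so that every test can be counted; the name totalCountingUnits and the units out-parameter are illustrative, and each operation is charged according to the analysis rules listed earlier.

```cpp
#include <cassert>

// total() with a running tally of time units.
int totalCountingUnits(int n, long long& units) {
    units = 0;
    int sum = 0;              units += 1;  // assignment: sum = 0
    int i = 1;                units += 1;  // loop initialization: i = 1
    while (true) {
        units += 1;                        // loop test: i <= n
        if (!(i <= n)) break;
        sum = sum + 1;        units += 2;  // one addition plus one assignment
        ++i;                  units += 1;  // loop increment
    }
    units += 1;                            // return statement
    return sum;
}
```

For n = 5 the tally is 24, which matches 4·5 + 4.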

3. Function func():

    #include <iostream>

    void func() {
        int x = 0;
        int i = 0;
        int j = 1;
        int n;
        std::cout << "Enter an Integer value";
        std::cin >> n;
        while (i < n) {
            x++;
            i++;
        }
        while (j < n) {
            j++;
        }
    }
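Applying the same rules to func() is a useful exercise. The sketch below tallies the time units, assuming that each x++, i++ and j++ costs 1 unit (as the loop increments do in the earlier examples) and that n >= 1; n is passed as a parameter instead of being read from cin, and the output and input statements are only counted, not performed. Under those assumptions the tally is 5 + (n+1) + 2n + n + (n-1) = 5n + 5, which is O(n). The function name funcUnits is hypothetical.

```cpp
#include <cassert>

// Time units for func() under the counting rules, for a given n >= 1.
long long funcUnits(int n) {
    long long units = 0;
    int x = 0;  units += 1;   // assignment
    int i = 0;  units += 1;   // assignment
    int j = 1;  units += 1;   // assignment
    units += 1;               // output statement (cout)
    units += 1;               // input statement (cin >> n)
    while (true) {
        units += 1;           // test i < n  (n+1 tests in total)
        if (!(i < n)) break;
        x++;  units += 1;
        i++;  units += 1;
    }
    while (true) {
        units += 1;           // test j < n  (n tests when n >= 1)
        if (!(j < n)) break;
        j++;  units += 1;     // runs n-1 times
    }
    (void)x;                  // x only mirrors the original code
    return units;
}
```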