Class Notes for CSCI 104: Data Structures and Object-Oriented Design
David Kempe and the awesome Fall 2013 sherpas
December 9, 2016

Chapter 1

Overview: Why Data Structures and Object-Oriented Thinking

[Note: This chapter covers material of about 0.75 lectures.]

As a result of our introductory CS classes (CSCI 103, or CSCI 101 for earlier generations), most students are probably somewhat proficient in basic programming, including the following features:

Data types: int, String, float/double, struct, arrays, ...

Arithmetic

Loops: for, while, and do-while

Tests: if, else, and switch

Pointers

Functions and some recursion

Other: I/O and other useful functions

OOP: Some students have probably already learned a bit about object-oriented programming, but for now, we will not rely on this.

In principle, these features are enough to solve any programming problem. In fact, it is enough to have nothing except while loops, if tests, integers, and arithmetic. Everything that we learn beyond those basics is “just” there to help us write better, faster, or more easily maintained or understood code. Writing “better” code comes in two flavors:

  1. Learning problem-solving and conceptual ideas that let us solve problems which we didn’t know how to solve (even though we had the tools in principle), or didn’t know how to solve fast enough.
  2. Learning new language features and programming concepts that help us write code that “feels good”, in the sense that it is easy to read, debug, and extend.

The way most students are probably thinking about programming at this point is that there is some input data (maybe a few data items, or an array), and the program executes commands to get an output from this. Thus, the sequence of instructions, the algorithm, is at the center of our thinking. As one becomes a better programmer, it’s often helpful to put the data themselves at the center of one’s thinking, and think about how data get transformed or changed throughout the execution of the program. Of course, in the end, the program will still mostly do the same, but this view often leads to much better (more maintainable, easier to understand, and sometimes also more efficient) code. Along the way, we’ll see how thinking about the organization of our data can be really important for determining how fast our code executes.

friend’s implementation works (fast enough) and meets the specification that you prescribed (say, in terms of which order the parameters are in), you can just use it as is. So the way we would think about a map or dictionary is mostly in terms of the functions it provides: add, remove, and lookup. Of course, it must store data internally in order to be able to provide these functions correctly, but how it does that is secondary; that’s part of the implementation. This combination of data and code, with a well-specified interface of functions to call, is called an abstract data type.

Once we start realizing the important role that data organization (such as using abstract data types) plays in good code design, we may change the way we think about code interacting with data. When starting to program, as we mentioned above, most students think about code that gets passed some data (arrays, structs, etc.) and processes them. Instead, we now think of the data structure itself as having functions that help with processing the data in it. In this way of thinking, an array in which you can only add things at the end is different from one in which you are allowed to overwrite everywhere, even though the actual way of storing data is the same.

This leads us naturally to object-oriented design of programs. An object consists of data and code; it is basically a struct with functions inside in addition to the data fields. This opens up a lot of ideas for developing more legible, maintainable, and “intuitive” code. Once we think about objects, they often become almost like little people in our mind. They serve different functions, and interact with each other in particular ways.

One of the main things that thinking in terms of object-oriented design (and abstract data types) does for us is to help achieve encapsulation: shielding as much of the inside of an object as possible from the outside, so that it can be changed without negatively affecting the code around it. The textbook refers to this as “walls” around data. Encapsulation helps us achieve modularity, i.e., ensuring that different pieces of code can be analyzed and tested in isolation more easily.

To summarize, the learning goals of this class are:

  1. Learn the basic and advanced techniques for actually implementing data structures to provide efficient functionality. Some of these techniques will require strong fundamentals, and their analysis will enter into more mathematical territory, which is why we will be drawing on material you will be learning simultaneously in CSCI 170.
  2. Learn how to think about code more abstractly, separating the “what” from the “how” in your code design and utilizing abstract data types to specify functionality.
  3. Learn about good programming practice with object-oriented design, in particular as it relates to implementing data types.

[Figure 2.1: The layout of memory on the stack for the declaration above (labels shown: n, x[0], x[5], m).]

was. When the function finishes, its memory block is made available for other purposes again. Again, for statically allocated variables, the compiler can take care of all of this.

2.1 Scoping of Variables

While we are talking about local variables and the stack, we should briefly talk about the scope of a variable. Global variables are, of course, accessible to any function or any block of code in your program. Local variables in functions or code blocks only exist while execution is inside that function or code block, and they are typically stored on the stack. As soon as the function or block terminates, the variable is deallocated. When the execution of a function is temporarily suspended, for instance because another function got called inside this one, its variables remain stored on the stack but are not active, so they cannot be accessed. Consider the following example:

    void foo (int x)
    {
        int y;
        // do some stuff
    }

    void bar (int n)
    {
        int m = 0;
        foo (n+m);
        // do more stuff
    }

Here, when bar calls foo, the function foo cannot access the variables n or m. As soon as foo finishes, the variables x and y are deallocated and permanently lost. bar now resumes and has access to n and m, but not to x or y. You will sometimes have multiple variables (in different parts of your code) sharing the same name. For instance, you may have both a global variable n and a local variable called n in some function. Or you could have something like the following:

    void foo (int n)
    {
        int m = 10;
        // do something
        for (int i = 0; i < m; i ++)
        {
            int n = 3, m = 5;
            // do something
            cout << n << m;
        }
    }

Here, the cout << n << m statement would output the innermost versions, so the values 3 and 5. As a general rule, when there are multiple variables with the same name, a use of that name refers to the variable declared in the innermost (smallest) code block that encloses the statement and contains such a declaration.

2.2 Dynamic allocation

Unfortunately, things aren’t quite as easy as statically allocated arrays when we don’t know at compile time how much memory a variable will need. Suppose we want to do something like the following:

    int n;
    cin >> n;
    // create an array a of n integers

Here, at compile time, the compiler does not know how much memory the array will need. It therefore cannot allocate room for the array on the stack^3. Instead, our program needs to explicitly ask the operating system for the right amount of space at run time. This memory is assigned from the heap^4 space. The difference between static and dynamic memory allocation is summarized in the following table. To fully understand how dynamic memory allocation works, we need to spend some time on pointers.

Static allocation                    | Dynamic allocation
-------------------------------------|------------------------------------
Size must be known at compile time   | Size may be unknown at compile time
Performed at compile time            | Performed at run time
Assigned to the stack                | Assigned to the heap
First in, last out                   | No particular order of assignment

Table 2.1: Differences between statically and dynamically allocated memory.

2.3 Pointers

A pointer is an “integer” that points to a location in memory — specifically, it is an address of a byte^5. In C/C++, pointer types are declared by placing a star ‘*’ behind a regular type name. Thus, int *p; char *q; int **b; void *v; all declare pointers. In principle, all these are just addresses of some memory location, and C/C++ does not care what we store there. Declaring them with a type (such as int) is mostly for the programmer’s benefit: it may prevent us from messing up the use of the data stored in the location. It also affects the way some arithmetic on memory locations is done, which we explain below. Two of the ways in which “regular” variables and pointers often interact are the following:

  1. We want to find out where in memory a variable resides, i.e., get the pointer to that variable’s location.
  2. We want to treat the location a pointer points to as a variable, i.e., access the data at that location, by reading it or overwriting it.

The following piece of code illustrates some of these, as well as pitfalls we might run into.

(^3) Technically, this is not quite true. Some modern compilers let you define arrays even of dynamic sizes (variable-length arrays, a C99 feature that some C++ compilers accept as an extension), but we advise against using this functionality, and instead do things as we write in these notes.

(^4) This is called the heap space since it can be selected from any portion of the space that has not been allocated already. While the stack remains nicely organized, memory in the heap tends to be more messy and all over the place. Hence the name.

(^5) This means that, on typical machines, all pointers are of the same size, and that the size of pointer used by a computer places a limit on the size of memory it can address. For example, a computer using typical 32-bit pointers can only use up to 2^32 bytes, or 4 gigabytes, of memory. The modern shift to 64-bit architectures raises this to 2^64 bytes, which will be enough for a while.

2.4.1 C Style

The function void* malloc (size_t size) requests size bytes of memory from the operating system, and returns a pointer to that location as a result. If for some reason the allocation fails (e.g., there was not enough memory available), NULL is returned instead. The function void free (void* pointer) releases the memory located at pointer so that it can be reused. A solution to our earlier problem of a dynamically sized array could look as follows:

    int n;
    int* b;
    cin >> n;
    b = (int*) malloc(n*sizeof(int));
    for (int i=0; i<n; i++)
        cin >> b[i];

In order to request space for n integers, we need to figure out how many bytes that is. That’s why we multiply by sizeof(int). Using sizeof(int) is much better than hard-coding the constant 4, which may not be right on some hardware now or in the future. Because malloc returns a void* (it does not know what we want to use the memory for), and we want to use it as an array of integers, we need to cast it to an int*. For good coding practice, we should probably also check whether b==NULL before dereferencing it, but this example is supposed to remain short.

Another thing to observe here is that we can reference b just like an array, and we write b[i]. The compiler treats this exactly as *(b+i): as you probably remember from the part about pointer arithmetic, b+i points to the i-th entry of the array. In fact, C/C++ defines indexing this way for all arrays; in most expressions, the name of an array is converted to a pointer to its first element. If we wanted to write b[i] in a complicated way by doing all the pointer arithmetic by hand, we could write instead *((int*) ((char*) b + i*sizeof(int))). (We go through char* because a char occupies exactly one byte; standard C/C++ does not allow arithmetic on a void*.) Obviously, this is not what we like to type (or have to understand), but if you understand everything that happens here, you are probably set with your knowledge of pointer arithmetic and casting.

To return the memory to the OS after we’re done using it, we use the function free, as follows:

    free(b);
    b = NULL;

Note that free does nothing to the pointer b itself; it only deallocates the memory that b pointed to, telling the operating system that it is available for reuse. Thus, it is recommended that you immediately set the pointer to NULL so that your code does not attempt to tamper with invalid memory. If you then dereference b somewhere, your program will typically crash right away (e.g., with a segmentation fault), which makes the bug easy to find. If you don’t set b=NULL, the memory may be handed out again for another variable, and you may accidentally read or overwrite that one. That kind of mistake can be much harder to detect, and is easily avoided by setting the pointer to NULL immediately after deallocating it.

2.4.2 C++ Style

C++ provides the new and delete operators, which offer a more convenient syntax than C’s malloc() and free(). Basically, they relieve you from calculating the number of bytes needed and from casting pointers, and they provide a more “array-like” syntax. Our example now looks as follows:

    int n;
    int *b;
    cin >> n;
    b = new int[n];

Notice that there are no parentheses, but instead, we have brackets for the number of items. new figures out by itself how much memory is needed, and returns the correct type of pointer. If we wanted space for just one integer, we could write int *p = new int; While this is not really very useful for a single integer, it will become very central to allocating objects later, where we often allocate one at a time dynamically. To release memory, the equivalent of free is the delete operator, used as follows:

    delete [] b;
    delete p;

The first example deallocates an array, while the second deallocates a single instance of a variable (a single int in our example). This deallocates the memory pointed to by the pointers. As with free, it still leaves the pointers themselves pointing to the same memory location, so it is good style to write b = NULL or p = NULL after the delete commands.

2.5 Memory Leaks

Let us look a little more at the things that can go wrong with dynamic memory allocation.

    double *x;
    ...
    x = (double*) malloc(100*sizeof(double));
    ...
    x = (double*) malloc(200*sizeof(double)); // We need a bigger array now!
    ...
    free(x);

This code will compile just fine, and most likely will not crash, at least not right away. We correctly allocate an array of 100 doubles, use it for some computation, and then allocate an array of 200 doubles when we realize that we need more memory. But notice what happens here. The moment we do the second allocation, x gets overwritten with a pointer to the newly allocated memory block. At that point, we have no more recollection of the pointer to the previous memory block. That means we cannot read it, write it, or free it. When at the end of the code snippet the program calls free(x), it successfully frees up the second allocated block, but we are unable to tell the operating system that we don’t need the first block any more. Thus, those 800 bytes (100 doubles at 8 bytes each on typical machines) will never become available again until our program terminates, and for all we know, it may run for several years as a backend server somewhere. This kind of situation is called a memory leak: available memory is slowly leaking out of the system. If it goes on long enough, our program may run out of memory and crash for that reason. If the crash only happens after a long time of running, it can be quite hard to diagnose why it happened. It is good practice to keep close track of the memory blocks you reserve, and make sure to free (or delete) memory pointed to by a pointer before reassigning it^7. A better version of the code above would be the following:

    double *x;
    ...
    x = (double*) malloc(100*sizeof(double));
    ...
    free(x);
    x = NULL;

(^7) There are tools for checking your code for memory leaks, and we recommend familiarizing yourself with them. The most well-known one, for which the course web page contains some links, is called valgrind.