The Study of Data Structures, Study notes of Data Structures and Algorithms

The importance of studying data structures in computer science. It explains how data structures are essential for managing collections of information and how learning to program with abstractions is a sign of programming maturity. The document also highlights the analytical techniques involved in the analysis of data structures and the problems involved in the management of large applications. The book is divided into three sections, and the third section consists of worksheets that are tied to the material presented in earlier chapters. The author suggests that group work is the best way to use the worksheets.

Typology: Study notes

2021/2022

Uploaded on 05/11/2023

Chapter 1: The Study of Data Structures 1

Chapter 1: The Study of Data Structures

The study of data structures has long been considered the cornerstone and starting point for the systematic examination of computer science as a discipline. There are many reasons for this. There is the practical realization that almost every program of more than trivial complexity will need to manage collections of information, and so will require the use of one or more data structures. Learning the tools and techniques that have proven over a long period of time to be useful for software development will help the student become a more productive programmer. But that is only the first, and most practical, of many reasons.

Data structures are one of the easiest ideas to visualize abstractly, as most data structures can be matched with a metaphor that the reader will already be familiar with from everyday life. A stack, for example, is a collection that is managed in much the same fashion as a stack of dishes; namely, only the topmost item is readily available, and the top must be removed before the item underneath can be accessed. The ability to deal with abstract ideas, and the associated concept of information hiding, are the primary tools that computer scientists (or, for that matter, all scientists) use to manage and manipulate complex systems. Learning to program with abstractions, rather than entirely with concrete representations, is a sign of programming maturity.

The analysis of data structures involves a variety of mathematical and other analytical techniques. These help reinforce the idea that computer science is a science, and that there is much more to the field than simple programming.

It is in the examination of data structures that the student will likely first encounter the problems involved in the management of large applications. Modern software development typically involves teams of programmers, often dozens or more, working on separate components of a larger system. The difficulties in such development are typically not algorithmic in nature, but deal more with the management of information. For example, if the software developed by programmer A must interact with the software developed by programmer B, what is the minimal amount of information that must be communicated between A and B in order to ensure that their components will correctly interact with each other? This is not a problem that the student will likely have seen in their beginning programming courses. Once more, abstraction and information hiding are keys to success. Each of these topics will be explored in more detail in the chapters that follow.

The book is divided into three sections. In the first section you will learn about tools and techniques used in the analysis of data structures. In the second part you will learn about the abstractions that are considered the classic core of the study of data structures. The third section consists of worksheets.

Active Learning

It is in the third section that this book distinguishes itself most clearly from other examinations of the same topic. The third section is a series of worksheets, each of which is tied to the material presented in earlier chapters. Simply stated, this book asks you to do more, and to read (or, if you are in a traditional classroom setting, listen to) less. By becoming a more active participant in the educational process, the knowledge you gain will become more familiar to you, and you will hopefully be more comfortable with your abilities. This approach has been used in a variety of different forms for many years, with great success.

While the worksheets can be used in an individual situation, it is the author’s opinion (supported by experience) that they work best in a group setting. When students work in a group, they can help each other learn the basic material. Additionally, group work helps students develop their communication skills, as well as their programming skills. In the author’s own use, a typical class consists of a short fifteen- to twenty-minute lecture, followed by one or two worksheets completed in class.

At the end of each chapter are study questions that you can use to measure your understanding of the material. These questions are not intended to be difficult, and if you have successfully mastered the topic of the chapter you should be able to immediately write short answers for each of these questions. These study questions are followed by more traditional short answer questions, analysis questions that require more complex understanding, and programming projects. Finally, in recent years there has been an explosion of information available on the web, much of it actually accurate, truthful and useful. Each chapter will end with references to where the student can learn more about the topic at hand.

A Note on Languages

Most textbooks are closely tied to a specific programming language. In recent years it has also become commonplace to emphasize a particular programming paradigm, such as the use of object-oriented programming. The author himself is guilty of writing more than a few books of this sort. However, part of the beauty of the study of data structures is that the knowledge transcends any particular programming language, or indeed most programming paradigms. While the student will of necessity need to write in a particular programming language if they are to produce a working application, the presentation given in this book is purposely designed to emphasize the more abstract details, allowing the student to more easily appreciate the essential features, and not the unimportant aspects imposed by any one language. An appendix at the end of the book is devoted to describing a few language-specific features.

Chapter 2: Algorithms 2

be followed by the reader. Terms that have a related meaning include process, routine, technique, procedure, pattern, and recipe.

It is important that you understand the distinction between an algorithm and a function. A function is a set of instructions written in a particular programming language. A function will often embody an algorithm, but the essence of the algorithm transcends the function, transcends even the programming language. Newton described, for example, the way to find square roots, and the details of the algorithm will remain the same no matter what programming language is used to write the code.
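
To make the distinction concrete, here is a minimal sketch of one possible C function that embodies Newton's square root algorithm. The name newton_sqrt, the starting guess, the tolerance, and the iteration cap are all assumptions chosen for illustration; the algorithm itself (repeatedly averaging the guess with the input divided by the guess) would be the same in any language.

```c
#include <assert.h>

/* Hypothetical name and constants; a sketch, not a library routine.
   Requires x > 0 (the input precondition of the algorithm). */
double newton_sqrt(double x)
{
    assert(x > 0.0);
    double guess = (x > 1.0) ? x : 1.0;      /* start at or above the root */
    for (int i = 0; i < 2000; i++) {         /* the cap guarantees termination */
        double next = 0.5 * (guess + x / guess);
        double diff = (next > guess) ? next - guess : guess - next;
        guess = next;
        if (diff <= 1e-12 * guess)           /* successive guesses agree */
            break;
    }
    return guess;
}
```

Calling newton_sqrt(2.0) should yield a value close to 1.41421. Rewriting the function in another language would change the syntax but not the sequence of steps.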

Properties of Algorithms

Once you have an algorithm, solving a problem is simply a matter of executing each instruction of the algorithm in turn. If this process is to be successful there are several characteristics an algorithm must possess. Among these are:

input preconditions. The most common form of algorithm is a transformation that takes a set of input values and performs some manipulations to yield a set of output values (for example, taking a positive number as input and returning a square root). An algorithm must make clear the number and type of input values, and the essential initial conditions those input values must possess to achieve successful operation. For example, Newton’s method works only if the input number is larger than zero.

precise specification of each instruction. Each step of an algorithm must be well defined. There should be no ambiguity about the actions to be carried out at any point. Algorithms presented in an informal descriptive form are sometimes ill-defined for this reason, due to the ambiguities in English and other natural languages.

correctness. An algorithm is expected to solve a problem. For any putative algorithm, we must demonstrate that, in fact, the algorithm will solve the problem. Often this will take the form of an argument, mathematical or logical in nature, to the effect that if the input conditions are satisfied and the steps of the algorithm executed then the desired outcome will be produced.

termination, time to execute. It must be clear that for any input values the algorithm is guaranteed to terminate after a finite number of steps. We postpone until later a more precise definition of the informal term “steps.” It is usually not necessary to know the exact number of steps an algorithm will require, but it will be convenient to provide an upper bound and argue that the algorithm will always terminate in fewer steps than the upper bound. Usually this upper bound will be given as a function of some values in the input. For example, if the input consists of two integer values n and m, we might be able to say that a particular algorithm will always terminate in fewer than n + m steps.

description of the result or effect. Finally, it must be clear exactly what the algorithm is intended to accomplish. Most often this can be expressed as the production of a result value having certain properties. Less frequently, algorithms are executed for a side effect, such as printing a value on an output device. In either case, the expected outcome must be completely specified.

Human beings are generally much more forgiving than computers about details such as instruction precision, inputs, or results. Consider the request “Go to the store and buy something to make lunch.” How many ways could this statement be interpreted? Is the request to buy ingredients (for example, bread and peanut butter), or to buy some sort of mechanical “lunch making” device (perhaps a personal robot)? Has the input or expected result been clearly identified? Would you be surprised if the person you told this to tried to go to a store and purchase an automated lunch-making robot? When dealing with a computer, every step of the process necessary to achieve a goal must be outlined in detail.

A form of algorithm that most people have seen is a recipe. In worksheet 1 you will examine one such algorithm, and critique it using the categories given above. Creating algorithms that provide the right level of detail, not too abstract, but also not too detailed, can be difficult. In worksheet 2 you are asked to describe an activity that you perform every day, for example getting ready to go to school. You are then asked to exchange your description with another student, who will critique your algorithm using these objectives.

Specification of Input

An algorithm will, in general, produce a result only when it is used in a proper fashion. Programs use a variety of techniques to ensure that the input is acceptable for the algorithm. The simplest of these is the idea of types, and a function type signature. The function min shown below, for instance, returns the smaller of two integer values. The compiler can check that when you call the function the arguments are integers, and that the result is an integer value.

Some requirements cannot be captured by type signatures. A common example is a range restriction. For instance, the square root program we discussed earlier only works for positive numbers. Typically programs will check the range of their input at run-time, and issue an error or exception if the input is not correct.
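
As a rough illustration of a run-time range check, the sketch below computes an integer square root and reports an error when the input is negative. The function name checked_isqrt and its convention of returning 0 on success and -1 on failure are assumptions made for this example, not something defined in the text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: stores the largest r with r * r <= n in *root.
   The type signature can say "n is an int", but only a run-time test
   can reject negative values. Returns 0 on success, -1 on a range error. */
int checked_isqrt(int n, int *root)
{
    assert(root != NULL);            /* a programming error, not an input error */
    if (n < 0) {
        fprintf(stderr, "checked_isqrt: input %d is out of range\n", n);
        return -1;
    }
    int r = 0;
    while ((long long)(r + 1) * (r + 1) <= n)
        r++;
    *root = r;
    return 0;
}
```

A caller would test the return value before using the result. Languages with exceptions would typically raise one at the point where this sketch returns -1.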

Description of Result

Just as the input conditions for computer programs are specified in a number of ways, the expected results of execution are documented in a variety of fashions. The most obvious is the result type, defined as part of a function signature. But this only specifies the type of the result, and not the relationship to the inputs. So equally important is some sort of documentation, frequently written as a comment. In the min function shown below, the comment notes that the result is not only an integer, but that it also represents the smaller of the two input values.

    int min (int a, int b)
    {
       /* return smaller argument */
       if (a < b)
          return a;
       else
          return b;
    }

becomes smaller on each iteration, and so the sum n+m satisfies our three conditions, and hence can be used to demonstrate that the function will halt.
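
The function this passage analyzes is not included in this excerpt. As an assumption, the sketch below uses a subtraction-based greatest common divisor, for which the sum n + m behaves in the way described: it is an integer, it strictly decreases on every iteration, and it never drops below zero. The optional steps parameter is a hypothetical addition used only to count iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example, not necessarily the author's function.
   Requires n > 0 and m > 0. If steps is not NULL it receives the
   number of loop iterations that were executed. */
int gcd(int n, int m, int *steps)
{
    assert(n > 0 && m > 0);
    int count = 0;
    while (n != m) {
        int before = n + m;
        if (n > m)
            n = n - m;
        else
            m = m - n;
        /* n + m is an integer, is smaller than before, and is still positive */
        assert(n + m < before && n > 0 && m > 0);
        count++;
    }
    if (steps != NULL)
        *steps = count;
    return n;
}
```

Because the sum starts at n + m and loses at least one on every pass, the loop can run fewer than n + m times, which is the kind of upper bound discussed earlier in the chapter.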

Zeno’s Paradox

The most nonintuitive of the three properties required for proving termination is the first: that the quantity or property being discussed can be placed into a correspondence with a diminishing list of integers. Why integer? Why not just numeric? Why diminishing? To illustrate the necessity of this requirement, consider the following “proof” that it is possible to share a single candy bar with all your friends, no matter how many friends you may have. Take a candy bar and cut it in half, giving one half to your best friend. Take the remaining half and cut it in half once more, giving a portion to your second best friend. Assuming you have a sufficiently sharp knife, you can continue in this manner, at each step producing ever-smaller pieces, for as long as you like. Thus, we have no guarantee of termination for this process. Certainly the sequence here (1/2, 1/4, 1/8, 1/16 …) consists of all nonnegative terms, and is decreasing. But a correspondence with the integers would need to increase, not decrease; and any diminishing sequence would be numeric, but not integer.

In the 5th century B.C. the Greek philosopher Zeno used similar arguments to show that no matter how fast he runs, the famous hero Achilles cannot overtake a Tortoise, if the Tortoise is given a head start. Suppose, for example, that the Tortoise and Achilles start as below at time A. When, at time B, Achilles reaches the point from which the Tortoise started, the Tortoise will have proceeded some distance ahead of this point. When Achilles reaches this further point at time C, the Tortoise will have proceeded yet further ahead. At each point when Achilles reaches a point from which the Tortoise began, the Tortoise will have proceeded at least some distance further. Thus, it is impossible for Achilles ever to overtake and pass the Tortoise.

Achilles: A -> B -> C -> D
Tortoise: A -> B -> C -> D

The paradox arises due to an infinite sequence of ever-smaller quantities, and the nonintuitive possibility that an infinite sum can nevertheless be bounded by a finite value. By restricting our arguments concerning termination to use only a decreasing set of integer values, we avoid problems such as those typified by Zeno's paradox.

Recursive Algorithms

Recursion is a very common technique used for both algorithms and the functions that result from implementing algorithms. The basic idea of recursion is to “reduce” a complex problem by “simplifying” it in some way, then invoking the algorithm on the simpler problem. This is performed repeatedly until one or more special cases, termed base cases, are encountered. Arguments that do not correspond to base cases are called recursive cases, or sometimes inductive cases.

Recursion is a common theme in art and commercial graphics. A seemingly infinite series of objects can be generated by looking at two opposing mirrors, for example. The image on the box of a popular brand of cocoa shows a woman holding a tray of cocoa, including a box of cocoa, on which is found a picture of a woman holding a tray of cocoa, and so on.

You have undoubtedly seen mathematical definitions that are defined in a recursive fashion. The factorial function, for instance, is traditionally defined as follows:

    N! = N * (N-1)!    for all values N > 0
    0! = 1

Here zero is being used as the base case, and the simplification consists of subtracting one, then invoking the factorial on the smaller number. The definition suggests an obvious algorithm for computing the factorial.

How to compute the factorial of N. If N is less than zero, issue an error. Otherwise, if N is equal to zero, return 1. Otherwise, compute the factorial of (N-1), then multiply this result by N and return the result.

The factorial function shown below expresses this algorithm as a function. To prove termination we would use the same technique as before; that is, identify a value that is integer, decreasing, and non-negative. In this case the value n itself suffices for this argument.

Many algorithms are most easily expressed in a recursive form. The base case need not be a single value; it can sometimes be a condition that the argument must satisfy. A simple example that illustrates this is printing a decimal number, such as 4973. It is relatively easy to handle the single digits, zero to 9, as special cases. Here there are ten base cases. For larger numbers, recursively print the value you get by dividing the number by ten, then print the single digit you get when you take the remainder divided by ten. In our example printing 4973 would recursively call itself to print 497, which would in turn recursively call itself to print 49, which would yet again recursively call itself to print 4. The final call results in one of our base cases, so 4 would be printed. Once this function returns, the value 9 is printed, which is the remainder left after dividing 49 by ten. Once this call returns, the value 7 would be printed, which is the remainder left after dividing 497 by ten. And finally the value 3 would be printed, which is the remainder left after dividing 4973 by ten.

The function resulting from this algorithm, printInteger, is shown below. Here the base cases have been hidden in another function,

    int factorial (int N)
    {
       assert (N >= 0);
       if (N == 0)
          return 1;
       return N * factorial (N - 1);
    }

    void printInteger (int n)
    {
       assert (n >= 0);   /* zero is one of the ten single-digit base cases */
       if (n > 9)
          printInteger (n / 10);
       printDigit (n % 10);
    }

[Figure: three poles labeled A, B, and C]

Notice that in this picture we have moved all but the last disk to pole C. Once we move the largest disk, pole A will be empty, and we can use it as a temporary. This observation hints at how to express the solution to the problem as a recursive algorithm.

How to move N disks from pole A to pole B using pole C as a temporary: If N is 1, move the disk from A to B. Otherwise, move (N - 1) disks from pole A to pole C using pole B as a temporary. Move the largest disk from pole A to pole B. Then move the (N - 1) disks from pole C to pole B, using pole A as a temporary.

We could express this algorithm in code as shown below. Here the arguments are used to represent the size of the stack and the names of the poles, and the instructions for solving the puzzle are printed.

    #include <stdio.h>

    /* move n disks from pole a to pole b, using pole c as a temporary */
    void solveHanoi (int n, char a, char b, char c)
    {
       if (n == 1)
          printf ("move disk from pole %c to pole %c\n", a, b);
       else {
          solveHanoi (n - 1, a, c, b);
          printf ("move disk from pole %c to pole %c\n", a, b);
          solveHanoi (n - 1, c, b, a);
       }
    }

We will see many recursive algorithms as we proceed through the text.

Study Questions

  1. What is abstraction? Explain how a function is a form of abstraction.
  2. What is an algorithm?
  3. What is the relationship between an algorithm and a function?
  4. Give an example input condition that cannot be specified using only a type.
  5. What are two different techniques used to specify the input conditions for an algorithm?
  6. What are some ways used to describe the outcome, or result, of executing an algorithm?
  7. In what way does the precision of instructions needed to convey an algorithm to another human being differ from that needed to convey an algorithm to a computer?
  8. In considering the execution time of algorithms, what are the two general types of questions one can ask?
  9. What are some situations in which termination of an algorithm would not be immediately obvious?
  10. What are the three properties a value must possess in order to be used to prove termination of an algorithm?
  11. What is a recursive algorithm? What are the two sections of a recursive algorithm?
  12. What is the activation record stack? What values are stored on the activation record stack? How does this stack simplify the execution of recursive functions?

Analysis Exercises

  1. Examine a recipe from your favorite cookbook. Evaluate the recipe with respect to each of the characteristics described at the beginning of this chapter.
  2. This chapter described how termination of an algorithm can be demonstrated by finding a value or property that satisfies three specific conditions. Show that all three conditions are necessary. Do this by describing situations that satisfy two of the three, but not the third, and that do not terminate. For example, the text explained how you can share a single candy bar with an infinite number of friends, by repeatedly dividing the candy bar in half. What property does this algorithm violate?
  3. The version of the towers of Hanoi used a stack of size 1 as the base case. Another possibility is to use a stack of size zero. What actions need to be performed to move a stack of size zero from one pole to the next? Rewrite the algorithm to use this formulation.
  4. What integer value suffices to demonstrate that the towers of Hanoi algorithm must eventually terminate for any size tower?

Exercises

  1. Assuming you have the ability to concatenate a character and a string, describe in pseudo-code a recursive algorithm to reverse a string. For example, the input “function” should produce the output “noitcnuf”.
  2. Express the algorithm to compute the value a raised to the nth power as a recursive algorithm. What are your input conditions? What is your base case?
  3. Binomial coefficients (the number of ways that n elements can be selected out of a collection of m values) can be defined recursively by the following formula:

  1. Write a program that takes an integer argument and prints the value spelled out as English words. For example, the input -3472 would produce the output “negative three thousand four hundred seventy-two”. After removing any initial negative signs, this program is most easily handled in a recursive fashion. Numbers greater than millions can be printed by printing the number of millions (via a recursive call), then printing the word “million”, then printing the remainder. Similarly with thousands, hundreds, and most values larger than twenty. Base cases are zero and the numbers between 1 and 20.
  2. Write a program to compute the Fibonacci sequence recursively (see earlier exercise). Include a global variable fibCount that is incremented every time the function calls itself recursively. Using this, determine the number of calls for various values of n, such as n from 1 to 20. Can you determine a relationship between n and the resulting number of calls?
  3. Do the same sort of analysis as described in the preceding question, but this time for the towers of Hanoi.

On the Web

Wikipedia (http://en.wikipedia.org) has a detailed entry for Algorithm that provides a history of the word, and several examples illustrating the way algorithms can be presented. The GCD algorithm is described in an entry “Euclidean Algorithm”. Another interesting entry can be found on “recursive algorithms”. The entry on “Fibonacci Numbers” provides an interesting history of the sequence, as well as many of the mathematical relationships of these numbers. The entry on “Towers of Hanoi” includes an animation that demonstrates the solution for N equal to 4. The more mathematically inclined might want to explore the entries on “Fractals” and “Mathematical Induction”, and the relationship of these to recursion. Newton’s method of computing square roots is described in an entry on “Methods of computing square roots”. Another example of recursive art in commercial advertising is the “Morton Salt girl”, who is spilling a container of salt, on which is displayed a picture of the Morton Salt girl.

Chapter 3: Debugging, Testing and Proofs of Correctness 1

Chapter 3: Debugging, Testing and Proving Correctness

In this chapter we investigate tools that will help you to produce reliable and correct programs. During development of any program you will undoubtedly need to remove errors, and this will involve debugging. Once you believe your program (or portions of it) is correct you will want to increase your confidence in the program by systematic testing. Typically testing will uncover errors, which will lead to further debugging. Finally, the most powerful tool you can use to increase your confidence in a program or function is a proof of correctness. All of these tools are useful, and none should be considered to be a substitute for the others.

Hints on Debugging

There is no question that programming is a difficult task. Few nontrivial programs can be expected to run correctly the first time without error. Fortunately, there are many hints that can be used to help make debugging easier. Here are some of the more useful suggestions:

  • Test small sections of a program in isolation. When you can identify a section of a program that is doing a specific task (this could be a loop or a function), write a small bit of code that tests this functionality. Gain confidence in the small pieces before considering the larger whole.
  • When you see an error produced for a given input, try to find the simplest input that consistently reproduces the same error. Errors that cannot be reproduced are very difficult to eliminate, and simple inputs are much easier to reason about than more complex inputs.
  • Once you have a simple test input that you know is handled incorrectly, play the role of the computer in your mind, and simulate execution of this test input. This will frequently lead you to the location of your logical error.
  • Think about what occurs before the point the error is noticed. An incorrect result is simply the symptom, and you must look earlier to find the cause.
  • Use breakpoints or print statements to view the state of the computation in the middle. Starting with an input that produces the wrong result, try to reason backwards and determine what the values of variables would need to be to produce the output you see. Then check the state using breakpoints or print statements. This can help isolate the portion of the program that contains the error.
  • Don’t assume that just because one input is handled correctly that your program is correct.
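
As one way to act on the first two hints, the sketch below tests a single small function in isolation, starting with the simplest inputs. Both indexOfLargest and testIndexOfLargest are hypothetical names invented for this illustration; they do not appear elsewhere in the text.

```c
#include <assert.h>

/* Hypothetical function under test: index of the largest value in
   data[0 .. n-1]. Requires n > 0. */
int indexOfLargest(const double data[], int n)
{
    assert(n > 0);
    int best = 0;
    for (int i = 1; i < n; i++)
        if (data[i] > data[best])
            best = i;
    return best;
}

/* Small test driver. Returns the number of failed checks, so a result
   of 0 means that every check passed. */
int testIndexOfLargest(void)
{
    const double single[] = { 5.0 };
    const double atFront[] = { 9.0, 2.0, 4.0 };
    const double atBack[] = { 1.0, 2.0, 3.0 };
    const double tied[] = { 7.0, 7.0 };
    int failures = 0;

    if (indexOfLargest(single, 1) != 0) failures++;    /* simplest input */
    if (indexOfLargest(atFront, 3) != 0) failures++;
    if (indexOfLargest(atBack, 3) != 2) failures++;
    if (indexOfLargest(tied, 2) != 0) failures++;      /* first of equal values */
    return failures;
}
```

Several different inputs are checked because, as the last hint warns, one correct result says little about the others.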

Assertions become most useful when they are combined with loops. An assertion inside a loop is often termed an invariant, since it must describe a condition that does not vary during the course of executing the loop. To discover an invariant simply ask yourself why you think a program loop is doing the right thing, and then try to express “right thing” as a statement. For example, the sum function shown below computes the sum of an array of values. It does this by computing a partial sum, a sum up to a given point. So assertion number 3 is the easiest to discover. Once you discover assertion 3, then assertion 2 becomes clear: it is whatever is needed to ensure that assertion 3 will be true after the assignment. Assertion 4 is stating the expected result, while assertion 1 is asserting what is true before the loop begins.

Later in this chapter you will learn how to use invariants and assertions to prove that an algorithm or program is correct.

Notice how assertions require you to understand a high level description of what the algorithm is trying to do, and not simply a low level understanding of what the individual statements are doing. For example, consider the bubble sort algorithm shown below. Bubble sort has two nested loops. The outer loop is the position of the array being filled. The inner loop is “bubbling” the largest element into this position. So once again at the end of the inner loop you want to make an assertion not only about the particular values at index locations j and j+1, but about everything the loop has seen before (namely, the elements with index values less than j). Once you identify this assertion, then the assertion at the beginning of the loop must be whatever is needed to prove the assertion at the end of the loop, and together these must be whatever is necessary to prove the assertion at the end of the outer loop.

In Worksheet 4 you will practice writing invariants and assertions that could be used to prove the correctness of a variety of programs.

double sum (double data[ ], int n) {
    double s = 0.0;
    /* 1. s is the sum of an empty array */
    for (int i = 0; i < n; i++) {
        /* 2. s is the sum of values from 0 to i-1 */
        s = s + data[i];
        /* 3. s is the sum of values from 0 to i */
    }
    /* 4. s is the sum from 0 to n-1 */
    return s;
}

void bubbleSort (double data [ ], int n) {
    for (int i = n-1; i > 0; i--) {
        for (int j = 0; j < i; j++) {
            // data[j] is largest value in 0 .. j
            if (data[j] > data[j+1])
                swap(data, j, j+1);
            // data[j+1] is largest value in 0 .. j+1
        }
        // data[i] is largest value in 0 .. i
    }
    // array is sorted
}
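The comments in the bubble sort listing can also be checked mechanically while debugging. The following is a minimal sketch, not the book's own code: the helpers swap and isLargestInPrefix are hypothetical names introduced here for illustration, and each comment has been turned into an assert call that tests the same condition.

```c
#include <assert.h>

/* Hypothetical helper: exchange the values at positions a and b. */
static void swap(double data[], int a, int b) {
    double tmp = data[a];
    data[a] = data[b];
    data[b] = tmp;
}

/* Hypothetical helper: returns 1 if data[k] is at least as large as
   every value in positions 0 .. k, and 0 otherwise. */
static int isLargestInPrefix(const double data[], int k) {
    for (int m = 0; m < k; m++) {
        if (data[m] > data[k]) return 0;
    }
    return 1;
}

void bubbleSort(double data[], int n) {
    for (int i = n - 1; i > 0; i--) {
        for (int j = 0; j < i; j++) {
            /* data[j] is largest value in 0 .. j */
            assert(isLargestInPrefix(data, j));
            if (data[j] > data[j + 1])
                swap(data, j, j + 1);
            /* data[j+1] is largest value in 0 .. j+1 */
            assert(isLargestInPrefix(data, j + 1));
        }
        /* data[i] is largest value in 0 .. i */
        assert(isLargestInPrefix(data, i));
    }
    /* array is sorted */
}
```

Each check rescans part of the array, so this version is slower than the plain algorithm and is only meant as a debugging aid. If any of the invariants were wrong, the corresponding assert would stop the program at the first point where the claim fails.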

Bubble Sort and Sorting

Bubble sort is the first of many sorting algorithms we will encounter in this book. Bubble sort is examined not because it is useful (it is not), but because it is very simple to analyze and understand. Many other sorting algorithms are far more efficient, so you should never use bubble sort for any real application.

Assertions and the assertion statement

Most programming languages include a statement, often termed the assertion statement, that performs a task similar to, but not exactly the same as, the concept of the assertion described above. The assertions described here are written as comments, and are not executed by the program. They need not be written in an executable form. The assertion statement, on the other hand, takes an expression as its argument, and typically will halt execution and print an error message if the expression is not true. This can be very useful for verifying that input values satisfy whatever conditions are necessary for execution. Our square root example from the previous chapter, for example, could use an assertion statement to check that the input is not negative:

double sqrt (double val) {
    assert (val >= 0); /* halt execution if value is not legal */
    ...
}

Because assertion statements are executed at run time, and halt execution if they are not satisfied, they should be used sparingly, but can be useful during debugging. The Wikipedia entry for “assertions” contains a good discussion of assertions as used for program proofs compared with assertions used for routine error checking.
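In C the assertion statement is the assert macro from the standard header assert.h. The following is a small sketch of a precondition check. The function average is a hypothetical example, not one taken from the text.

```c
#include <assert.h>

/* Hypothetical example function: mean of the first n values of data.
   Precondition: n > 0, since the mean of no values is undefined. */
double average(const double data[], int n) {
    assert(n > 0);   /* halt with a diagnostic if the precondition fails */
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s = s + data[i];
    return s / n;
}
```

When the expression is false, assert writes a diagnostic message to the standard error stream and calls abort. Defining the macro NDEBUG before including assert.h turns every assert into a no-op, which is one common way to keep the checks during debugging and remove them from a released program.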

Introduction to the Binary Search Algorithm

An important algorithm that we will see many times in many different forms is the binary search algorithm. Binary search is similar to the way you guess a number if somebody says “I’m thinking of a value between 0 and 100. Can you find my number?” If you have played this game, you know the optimal strategy is to guess the value in the middle: “Is it larger or smaller than 50?” Suppose the other person answers “smaller”. Then you again divide the range: “Is it larger or smaller than 25?” By repeatedly applying this technique you very quickly find the hidden value. The binary search algorithm works in a similar fashion, but instead of guessing a hidden number it looks for a specific value in an array of sorted numbers. Like the guessing game, it starts in the middle of the array, in one question eliminating one half of the possibilities, then in the next question breaking that subsection in half, and so on.

One version of the binary search algorithm is shown below. Here n represents the number of values in the sorted data array, and the variable test is the value being searched for. You can verify that the algorithm is correct using the following invariant:

Binary Search Invariant: All values with index positions smaller than low are less than test, and all values with index

int binarySearch(double data[ ], int n, double test) {
    /* data is size n sorted array */
    int low = 0;
    int high = n;
    while (low < high) {
        int mid = (low + high) / 2;
        if (data[mid] == test) return 1; /* true */
        if (data[mid] < test)
            low = mid + 1;
        else
            high = mid;
    }
    return 0; /* false */
}
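As with bubble sort, the invariant can be tested at run time while you are convincing yourself that the loop is right. The sketch below is an illustration under stated assumptions, not the book's code. The names checkedBinarySearch and invariantHolds are hypothetical. The condition on low is the one stated in the invariant above. The matching condition on high (every value at index high or beyond is greater than test) is an assumption made here for the sake of the example.

```c
#include <assert.h>

/* Hypothetical checker: returns 1 if the invariant holds for low and high. */
static int invariantHolds(const double data[], int n, double test,
                          int low, int high) {
    for (int k = 0; k < low; k++)
        if (!(data[k] < test)) return 0;  /* stated: values left of low are smaller */
    for (int k = high; k < n; k++)
        if (!(data[k] > test)) return 0;  /* assumed: values from high on are larger */
    return 1;
}

/* Same search as in the text, with the invariant checked on every pass. */
int checkedBinarySearch(const double data[], int n, double test) {
    int low = 0;
    int high = n;
    while (low < high) {
        assert(invariantHolds(data, n, test, low, high));
        int mid = (low + high) / 2;
        if (data[mid] == test) return 1;  /* true */
        if (data[mid] < test)
            low = mid + 1;
        else
            high = mid;
    }
    /* low == high here, so no position can still hold test */
    assert(invariantHolds(data, n, test, low, high));
    return 0;                             /* false */
}
```

The checker scans the whole array on each pass, which throws away the speed that makes binary search attractive, so a version like this belongs only in a test build. Calling it on an array that is not sorted is a quick way to see an invariant fail.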

    printf("test case 5: %g \n", t5);
    return 0;
}

Notice that we expected one of these data sets to halt execution with an assertion error. After verifying that this is correct, you can comment out that particular test case while the others are processed.

In the test harness shown above we simply print the result, and count on the programmer running the test harness to verify the result. Sometimes it is possible to check the result directly. For example, if you were testing a method to compute a square root you could simply multiply the result by itself and verify that it produced the original number (to within a small tolerance, since floating point arithmetic is rarely exact).
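A direct check of this kind might be sketched as follows. Everything here is illustrative: newtonSqrt is a hypothetical stand-in for the function under test, squareRootLooksRight is a hypothetical checker, and the tolerance of 1e-9 is an arbitrary assumption rather than a recommended value.

```c
#include <assert.h>

/* Hypothetical function under test: square root by Newton's method.
   Starting at or above the true root, each step moves the guess
   downward, so the loop stops when a step no longer decreases it. */
double newtonSqrt(double val) {
    assert(val >= 0);
    if (val == 0.0) return 0.0;
    double guess = val > 1.0 ? val : 1.0;
    for (;;) {
        double next = 0.5 * (guess + val / guess);
        if (next >= guess) break;
        guess = next;
    }
    return guess;
}

/* Hypothetical checker: returns 1 if result * result reproduces val
   to within a relative tolerance, and 0 otherwise. */
int squareRootLooksRight(double val, double result) {
    double diff = result * result - val;
    if (diff < 0) diff = -diff;
    double scale = val > 1.0 ? val : 1.0;
    return diff <= 1e-9 * scale;
}
```

A test harness can then call squareRootLooksRight(x, newtonSqrt(x)) for each test input and report only the cases where it returns 0, instead of printing every result for a person to inspect.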

Question: Think about testing a sorting algorithm. Can you write a function that would test the result, rather than simply printing it out for the user to validate?

Once you are convinced that individual functions are working correctly, the next step is to combine calls on functions into more complex programs. Again you should perform testing to increase your confidence in the result. This is termed integration testing. Often you will uncover errors during integration testing. Once you fix these you should go back and re-execute the earlier test harness to ensure that the changes have not inadvertently introduced any new errors. This process is termed regression testing.

Some testing considers only the structure of the input and output values, and ignores the algorithm used to produce the result. This is termed black box testing. Other times you want to consider the structure of the function, for example to ensure that every if statement is exercised both with a value that makes it true and a value that makes it false. This is termed white box testing. Goals for white box testing should include that every statement is executed, and that every condition is evaluated both true and false. Other more complex test conditions can test the boundaries of a computation.
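To make the white box goals concrete, here is a small sketch. The function clamp and its test are hypothetical examples, not taken from the text. The test inputs were chosen by reading the code so that each if statement is evaluated both true and false, and two more inputs probe the boundaries.

```c
#include <assert.h>

/* Hypothetical function under test: force value into the range lo .. hi. */
double clamp(double value, double lo, double hi) {
    if (value < lo) return lo;   /* condition A */
    if (value > hi) return hi;   /* condition B */
    return value;
}

/* White box test: one case for each outcome of each condition. */
void testClamp(void) {
    assert(clamp(-5.0, 0.0, 10.0) == 0.0);   /* A true */
    assert(clamp(15.0, 0.0, 10.0) == 10.0);  /* A false, B true */
    assert(clamp(3.0, 0.0, 10.0) == 3.0);    /* A false, B false */
    assert(clamp(0.0, 0.0, 10.0) == 0.0);    /* boundary: value equals lo */
    assert(clamp(10.0, 0.0, 10.0) == 10.0);  /* boundary: value equals hi */
}
```

A black box test of the same function would be written from its description alone ("force value into the range") without looking at how many if statements the implementation happens to contain.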

Testing alone should never be used to guarantee a program is working correctly. The famous computer scientist Edsger Dijkstra pointed out that testing can show the presence of errors but never their absence. Testing should be used in combination with logical thought, assertions, invariants, and proofs of correctness. All have a part to play in the development of a reliable program.

In Worksheet 5 you will think about test cases for a variety of simple programs.

More on Program Proofs

We noted earlier in this chapter that the most powerful way to gain confidence in the correctness of a function or program is to develop a proof that the function is correct. In this section we will investigate this process in more detail, by examining another classic algorithm, a sorting algorithm named selection sort. Selection sort is easy to describe, which is why we study it. But like bubble sort it is also slow, so is not generally used in practice. In later lessons we will examine faster algorithms.

double storage [ ]; /* size is n */
...
int indexLargest = 0;
for (int i = 1; i <= n-1; i++) {
    if (storage[i] > storage[indexLargest])
        indexLargest = i;
}

How to sort an array using selection sort: Find the index of the largest element in an array. Swap the largest value into the final location in the array. Then do the same with the next largest element. Then the next largest, and so on until you reach the smallest element. At that point the array will be sorted.

To develop this algorithm as executable code the first step is to isolate the smallest portion of the problem description that could be independently programmed, tested, and debugged. In this case you might select that first sentence: “find the index of the largest element in an array”. How do you do that? The best way seems to be a loop.

How do we know this small bit of code is correct? As we discussed in the previous lessons, there are two general techniques that are used, and you should always use both of them. These two techniques are proofs and testing.

A proof of correctness is an informal argument that explains why you believe the code is correct. As you learned earlier, such proofs are built around assertions, which are statements that describe the relationships between variables when the computer reaches a point in execution. Using assertions, you simulate the execution of the algorithm in your mind, and argue both that the assertions are valid, and that they lead to the correct outcome.

In the code fragment above, we know that in the middle of execution the variable i represents some indefinite index position in the array. We have examined all values up to i, and indexLargest holds the index of the largest value in that range. The values beyond index i have not yet been examined, and are therefore unknown. A drawing helps illustrate the relationships. What can you say about the relationship between i, indexLargest, and the data array? Invariants are written as comments, as in the following:

double storage [ ];
...
int position = n - 1;
int indexLargest = 0;
for (int i = 1; i <= position; i++) {
    // inv: indexLargest is the index of the largest element in the range 0 .. (i-1)
    // (see picture)
    if (storage[i] > storage[indexLargest])
        indexLargest = i;
    // inv: indexLargest is the index of the largest element in the range 0 .. i
}
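One way the fragment above might grow into the full algorithm is sketched below. This is an assumption about how the pieces fit together, based on the prose description of selection sort, and not necessarily the code the text goes on to develop. The function names indexOfLargest and selectionSort are hypothetical, and the invariant on the outer loop is likewise an assumption.

```c
#include <assert.h>

/* Hypothetical helper wrapping the fragment from the text: returns the
   index of the largest element in storage[0 .. position]. */
int indexOfLargest(const double storage[], int position) {
    int indexLargest = 0;
    for (int i = 1; i <= position; i++) {
        /* inv: indexLargest is the index of the largest element in 0 .. (i-1) */
        if (storage[i] > storage[indexLargest])
            indexLargest = i;
        /* inv: indexLargest is the index of the largest element in 0 .. i */
    }
    return indexLargest;
}

/* Sketch of selection sort: repeatedly swap the largest remaining
   value into the last unfilled position. */
void selectionSort(double storage[], int n) {
    for (int position = n - 1; position > 0; position--) {
        int indexLargest = indexOfLargest(storage, position);
        double tmp = storage[position];
        storage[position] = storage[indexLargest];
        storage[indexLargest] = tmp;
        /* assumed inv: storage[position .. n-1] is sorted and holds
           the largest values of the array */
    }
}
```

Keeping the search for the largest element in its own function means it can be tested and argued correct on its own before the outer loop is added, which matches the step by step approach described above.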