Data Structures and Algorithm Analysis in C
This book describes data structures, methods of organizing large amounts of data,
and algorithm analysis, the estimation of the running time of algorithms.
because the and (&&) operation is short-circuited.
if (x=y)
context, a short discussion on complexity theory (including NP-completeness and
discussion of NP-completeness in Chapter 9 is far too brief to be used in such a
course. Garey and Johnson's book on NP-completeness can be used to augment this
M.A.W.
Miami, Florida
September 1992
direction. As an example, the puzzle shown in Figure 1.1 contains the words this, two, fat, and that. The word this begins at row 1, column 1, or (1, 1), and extends to (1, 4); two goes from (1, 1) to (3, 1); fat goes from (4, 1) to (2, 3); and that goes from (4, 4) to (1, 1).
For each word in the word list, we check each ordered triple (row, column, orientation) for the presence of the word. This amounts to lots of nested for loops.
Alternatively, for each ordered quadruple (row, column, orientation, number of characters) that doesn't run off an end of the puzzle, we can test whether the word indicated is in the word list. Again, this amounts to lots of nested for loops.
    1 2 3 4
1   t h i s
2   w a t s
3   o a h g
4   f g d t
Figure 1.1 Sample word puzzle
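The ordered-triple approach described above can be sketched in C as follows. This is a minimal illustration, not the book's solution: the grid is hard-coded from Figure 1.1, and the names `find_word`, `matches`, `dr`, and `dc` are hypothetical. Each of the eight orientations is encoded as a (row step, column step) pair, and coordinates are reported 1-based to match the text.

```c
#include <string.h>

#define ROWS 4
#define COLS 4

/* Sample grid copied from Figure 1.1 (hypothetical hard-coded input) */
static const char grid[ROWS][COLS + 1] = {
    "this",
    "wats",
    "oahg",
    "fgdt"
};

/* The eight orientations as (row step, column step) pairs */
static const int dr[8] = { 0, 0, 1, -1, 1, 1, -1, -1 };
static const int dc[8] = { 1, -1, 0, 0, 1, -1, 1, -1 };

/* Returns 1 if word appears starting at (r, c) in orientation d */
static int matches(const char *word, int r, int c, int d)
{
    size_t len = strlen(word);
    for (size_t i = 0; i < len; i++) {
        int rr = r + (int)i * dr[d];
        int cc = c + (int)i * dc[d];
        if (rr < 0 || rr >= ROWS || cc < 0 || cc >= COLS)
            return 0;               /* ran off an edge of the puzzle */
        if (grid[rr][cc] != word[i])
            return 0;               /* character mismatch */
    }
    return 1;
}

/* Checks every ordered triple (row, column, orientation) for word.
   On success stores 1-based start and end coordinates and returns 1. */
int find_word(const char *word, int *r1, int *c1, int *r2, int *c2)
{
    int last = (int)strlen(word) - 1;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            for (int d = 0; d < 8; d++)
                if (matches(word, r, c, d)) {
                    *r1 = r + 1;
                    *c1 = c + 1;
                    *r2 = r + last * dr[d] + 1;
                    *c2 = c + last * dc[d] + 1;
                    return 1;
                }
    return 0;
}
```

Calling `find_word("that", &r1, &c1, &r2, &c2)` should report (4, 4) to (1, 1), agreeing with the text. The three nested loops are exactly the "lots of nested for loops" the text mentions; the cost grows with the number of words times the grid size times the word length.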
1.2.1. Exponents
$$x^a x^b = x^{a+b}$$
$$\frac{x^a}{x^b} = x^{a-b}$$
$$(x^a)^b = x^{ab}$$
$$x^n + x^n = 2x^n \neq x^{2n}$$
$$2^n + 2^n = 2^{n+1}$$
In computer science, all logarithms are to base 2 unless specified otherwise.
DEFINITION: $x^a = b$ if and only if $\log_x b = a$.
Several convenient equalities follow from this definition.
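THEOREM: $\log_a b = \dfrac{\log_c b}{\log_c a}$, for $a, b, c > 0$ with $a \neq 1$ and $c \neq 1$.
PROOF: Let $x = \log_c b$, $y = \log_c a$, and $z = \log_a b$. Then, by the definition of logarithms, $c^x = b$, $c^y = a$, and $a^z = b$. Combining these three equalities yields $(c^y)^z = c^x = b$. Therefore, $x = yz$, which implies $z = x/y$, proving the theorem.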
THEOREM: $\log ab = \log a + \log b$.
PROOF: Let $x = \log a$, $y = \log b$, and $z = \log ab$. Then, assuming the default base of 2, $2^x = a$, $2^y = b$, and $2^z = ab$. Combining the last three equalities yields $2^x 2^y = 2^{x+y} = 2^z = ab$. Therefore, $x + y = z$, which proves the theorem.
Some other useful formulas, which can all be derived in a similar manner, follow.
$$\log(a/b) = \log a - \log b$$
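Because logarithms here are base 2, the identities above can be spot-checked exactly on powers of two using only integer arithmetic. The sketch below is illustrative; `log2_exact` is a hypothetical helper, and restricting it to exact powers of two is our assumption to avoid floating-point rounding.

```c
/* Exact base-2 logarithm of a power of two: the number of times n can be
   halved before reaching 1. Returns -1 if n is not a power of two. */
int log2_exact(unsigned long long n)
{
    int k = 0;
    if (n == 0)
        return -1;
    while (n % 2 == 0) {   /* strip one factor of 2 per iteration */
        n /= 2;
        k++;
    }
    return n == 1 ? k : -1;
}
```

For example, `log2_exact(8 * 32)` equals `log2_exact(8) + log2_exact(32)` (8 = 3 + 5), `log2_exact(64 / 4)` equals `log2_exact(64) - log2_exact(4)` (4 = 6 - 2), and the change-of-base theorem gives $\log_4 64 = \log 64 / \log 4 = 6/2 = 3$.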
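The same subtraction technique computes $S = \sum_{i=1}^{\infty} i/2^i$. Write
$$S = \frac{1}{2} + \frac{2}{2^2} + \frac{3}{2^3} + \frac{4}{2^4} + \cdots$$
and multiply by 2, obtaining
$$2S = 1 + \frac{2}{2} + \frac{3}{2^2} + \frac{4}{2^3} + \cdots$$
Subtracting these two equations yields
$$S = 1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots$$
Thus, $S = 2$.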
Another type of common series in analysis is the arithmetic series. Any such series can be evaluated from the basic formula
$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2} \approx \frac{N^2}{2}$$
For instance, to find the sum $2 + 5 + 8 + \cdots + (3k - 1)$, rewrite it as $3(1 + 2 + 3 + \cdots + k) - (1 + 1 + 1 + \cdots + 1)$, which is clearly $3k(k + 1)/2 - k$. Another way to remember this is to add the first and last terms (total $3k + 1$), the second and next-to-last terms (total $3k + 1$), and so on. Since there are $k/2$ of these pairs, the total sum is $k(3k + 1)/2$, which is the same answer as before. (If $k$ is odd, the unpaired middle term is $(3k + 1)/2$, half a pair, so the formula still holds.)
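A quick way to gain confidence in a closed form like this is to compare it with a direct loop. The sketch below is illustrative only; the function names are hypothetical.

```c
/* Direct evaluation of 2 + 5 + 8 + ... + (3k - 1) */
long arith_sum_loop(long k)
{
    long sum = 0;
    for (long i = 1; i <= k; i++)
        sum += 3 * i - 1;      /* the i-th term */
    return sum;
}

/* Closed form k(3k + 1)/2; k(3k + 1) is always even, so the
   integer division is exact */
long arith_sum_closed(long k)
{
    return k * (3 * k + 1) / 2;
}
```

For $k = 4$, both give $2 + 5 + 8 + 11 = 26 = 4 \cdot 13 / 2$. A finite check is of course not a proof; the pairing argument above is.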
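The next two formulas pop up now and then but are fairly infrequent.
$$\sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6} \approx \frac{N^3}{3}$$
$$\sum_{i=1}^{N} i^k \approx \frac{N^{k+1}}{|k+1|}, \qquad k \neq -1$$
When $k = -1$, the latter formula is not valid. We then need the following formula, which is used far more in computer science than in other mathematical disciplines. The numbers $H_N$ are known as the harmonic numbers, and the sum is known as a harmonic sum. The error in the following approximation tends to $\gamma \approx 0.57721566$, which is known as Euler's constant.
$$H_N = \sum_{i=1}^{N} \frac{1}{i} \approx \log_e N$$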
These two formulas are just general algebraic manipulations.
We say that $a$ is congruent to $b$ modulo $n$, written $a \equiv b \pmod{n}$, if $n$ divides $a - b$. Intuitively, this means that the remainder is the same when either $a$ or $b$ is divided by $n$. Thus, $81 \equiv 61 \equiv 1 \pmod{10}$. As with equality, if $a \equiv b \pmod{n}$, then $a + c \equiv b + c \pmod{n}$ and $a d \equiv b d \pmod{n}$.
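The definition translates directly into C by testing divisibility of $a - b$, which sidesteps a C pitfall: `%` can return a negative remainder for negative operands (for example, `-9 % 10` is `-9`), so comparing `a % n` with `b % n` would wrongly reject $-9 \equiv 1 \pmod{10}$. This is a minimal sketch; `congruent` is a hypothetical name, and it assumes $n > 0$ and that $a - b$ does not overflow.

```c
/* a ≡ b (mod n) iff n divides a - b. A remainder of zero means
   divisibility even when a - b is negative. Assumes n > 0. */
int congruent(long a, long b, long n)
{
    return (a - b) % n == 0;
}
```

With it, `congruent(81, 61, 10)` and `congruent(81 + 7, 61 + 7, 10)` both hold, as do `congruent(81 * 3, 61 * 3, 10)` and `congruent(-9, 1, 10)`.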
There are a lot of theorems that apply to modular arithmetic, some of which require extraordinary proofs in number theory. We will use modular arithmetic sparingly, and the preceding theorems will suffice.
The two most common ways of proving statements in data structure analysis are proof by induction and proof by contradiction (and occasionally a proof by intimidation, by professors only). The best way of proving that a theorem is false is by exhibiting a counterexample.
A proof by induction has two standard parts. The first step is proving a base case, that is, establishing that a theorem is true for some small (usually degenerate) value(s); this step is almost always trivial. Next, an inductive hypothesis is assumed. Generally this means that the theorem is assumed to be true for all cases up to some limit $k$. Using this assumption, the theorem is then shown to be true for the next value, which is typically $k + 1$. This proves the theorem (as long as $k$ is finite).
As an example, we prove that the Fibonacci numbers, $F_0 = 1, F_1 = 1, F_2 = 2, F_3 = 3, F_4 = 5, \ldots, F_i = F_{i-1} + F_{i-2}$, satisfy $F_i < (5/3)^i$ for $i \ge 1$. (Some definitions have $F_0 = 0$, which shifts the series.) To do this, we first verify that the theorem is true for the trivial cases. It is easy to verify that $F_1 = 1 < 5/3$ and $F_2 = 2 < 25/9$; this proves the basis. We assume that the theorem is true for $i = 1, 2, \ldots, k$; this is the inductive hypothesis. To prove the theorem, we need to show that $F_{k+1} < (5/3)^{k+1}$. We have
$$F_{k+1} = F_k + F_{k-1}$$
by the definition, and we can use the inductive hypothesis on the right-hand side, obtaining
$$F_{k+1} < (5/3)^k + (5/3)^{k-1}$$
$$= (3/5)(5/3)^{k+1} + (3/5)^2 (5/3)^{k+1}$$
$$= (3/5 + 9/25)(5/3)^{k+1} = (24/25)(5/3)^{k+1} < (5/3)^{k+1},$$
proving the theorem.
The statement $F_k \le k^2$ is false. The easiest way to prove this is to exhibit a counterexample: $F_{11} = 144 > 11^2 = 121$.
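Both claims about the Fibonacci numbers can be checked numerically for small $i$, using the text's indexing $F_0 = F_1 = 1$. The sketch below avoids floating point by testing the equivalent integer inequality $3^i F_i < 5^i$; the function names are hypothetical, and the check is limited to $i \le 25$ (our assumption) so that the products fit in `unsigned long long`. This only illustrates the theorem; the induction is what proves it.

```c
/* Fibonacci numbers with the text's indexing: F(0) = F(1) = 1 */
unsigned long long fib(int i)
{
    unsigned long long prev = 1, cur = 1;   /* F(0), F(1) */
    for (int j = 2; j <= i; j++) {
        unsigned long long next = prev + cur;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Tests F(i) < (5/3)^i as 3^i * F(i) < 5^i; valid for 1 <= i <= 25 */
int below_five_thirds_power(int i)
{
    unsigned long long p3 = 1, p5 = 1;
    for (int j = 0; j < i; j++) {
        p3 *= 3;
        p5 *= 5;
    }
    return p3 * fib(i) < p5;
}

/* Smallest k >= 1 with F(k) > k^2, i.e., a counterexample to F(k) <= k^2 */
int first_square_counterexample(void)
{
    for (int k = 1; ; k++)
        if (fib(k) > (unsigned long long)k * k)
            return k;
}
```

Here `first_square_counterexample()` returns 11, matching $F_{11} = 144 > 121$.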
Proof by contradiction proceeds by assuming that the theorem is false and showing that this assumption implies that some known property is false, and hence the original assumption was erroneous. A classic example is the proof that there is an infinite number of primes. To prove this, we assume that the theorem is false, so that there is some largest prime $p_k$. Let $p_1, p_2, \ldots, p_k$ be all the primes in order and consider
$$N = p_1 p_2 p_3 \cdots p_k + 1$$
Clearly, $N$ is larger than $p_k$, so by assumption $N$ is not prime. However, none of $p_1, p_2, \ldots, p_k$ divides $N$ exactly, because there will always be a remainder of 1. This is a contradiction, because every integer greater than 1 is either prime or a product of primes. Hence, the original assumption, that $p_k$ is the largest prime, is false, which implies that the theorem is true.
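It is worth seeing the construction on real numbers, because $N$ itself need not be prime; the argument only needs its prime factors to lie outside the list. The sketch below (hypothetical function names, trial division chosen for simplicity) builds $N$ from the first six primes.

```c
/* N = p1 * p2 * ... * pk + 1 for the given list of primes */
unsigned long long euclid_number(const unsigned *primes, int k)
{
    unsigned long long n = 1;
    for (int i = 0; i < k; i++)
        n *= primes[i];
    return n + 1;
}

/* Smallest divisor d > 1 of n by trial division; for n >= 2 it is prime */
unsigned long long smallest_prime_factor(unsigned long long n)
{
    for (unsigned long long d = 2; d * d <= n; d++)
        if (n % d == 0)
            return d;
    return n;   /* no divisor up to sqrt(n): n is prime */
}
```

For the primes 2, 3, 5, 7, 11, 13 this gives $N = 30031$, which leaves remainder 1 on division by each of them. $N$ is not prime ($30031 = 59 \cdot 509$), but its smallest prime factor, 59, is larger than every prime in the list, just as the proof requires.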
int
f( int x )
{
/*1*/   if( x == 0 )
/*2*/       return 0;
        else
/*3*/       return( 2*f( x-1 ) + x*x );
}
Figure 1.2 A recursive function
1.3. A Brief Introduction to Recursion
Most mathematical functions that we are familiar with are described by a simple formula. For
instance, we can convert temperatures from Fahrenheit to Celsius by applying the formula
$$C = 5(F - 32)/9$$
Given this formula, it is trivial to write a C function; with declarations and braces removed, the one-line formula translates to one line of C.
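For instance, a sketch of such a function (the name is our own) is:

```c
/* C = 5(F - 32)/9, written with double constants so the division is
   floating-point rather than integer division, which would truncate */
double fahrenheit_to_celsius(double f)
{
    return 5.0 * (f - 32.0) / 9.0;
}
```

Here `fahrenheit_to_celsius(212.0)` yields 100.0 and `fahrenheit_to_celsius(-40.0)` yields -40.0.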
Mathematical functions are sometimes defined in a less standard form. As an example, we can define a function $f$, valid on nonnegative integers, that satisfies $f(0) = 0$ and $f(x) = 2f(x - 1) + x^2$. From this definition we see that $f(1) = 1$, $f(2) = 6$, $f(3) = 21$, and $f(4) = 58$. A function that is defined in terms of itself is called recursive. C allows functions to be recursive.* It is important to remember that what C provides is merely an attempt to follow the recursive spirit. Not all mathematically recursive functions are efficiently (or correctly) implemented by C's simulation of recursion. The idea is that the recursive function $f$ ought to be expressible in only a few lines, just like a non-recursive function. Figure 1.2 shows the recursive implementation of $f$.
*Using recursion for numerical calculations is usually a bad idea. We have done so to illustrate the basic points.
Lines 1 and 2 handle what is known as the base case, that is, the value for which the function is directly known without resorting to recursion. Just as declaring $f(x) = 2f(x - 1) + x^2$ is meaningless, mathematically, without including the fact that $f(0) = 0$, the recursive C function doesn't make sense without a base case. Line 3 makes the recursive call.
There are several important and possibly confusing points about recursion. A common question is: Isn't this just circular logic? The answer is that although we are defining a function in terms of itself, we are not defining a particular instance of the function in terms of itself. In other words, evaluating $f(5)$ by computing $f(5)$ would be circular. Evaluating $f(5)$ by computing $f(4)$ is not circular, unless, of course, $f(4)$ is evaluated by eventually computing $f(5)$. The two most important issues are probably the how and why questions. In Chapter 3, the how and why issues are formally resolved. We will give an incomplete description here.
It turns out that recursive calls are handled no differently from any others. If $f$ is called with the value of 4, then line 3 requires the computation of $2 * f(3) + 4 * 4$. Thus, a call is made to compute $f(3)$. This requires the computation of $2 * f(2) + 3 * 3$. Therefore, another call is made to compute $f(2)$. This means that $2 * f(1) + 2 * 2$ must be evaluated. To do so, $f(1)$ is computed as $2 * f(0) + 1 * 1$. Now, $f(0)$ must be evaluated. Since this is a base case, we know a priori that $f(0) = 0$. This enables the completion of the calculation for $f(1)$, which is now seen to be 1. The pending calculations for $f(2)$, $f(3)$, and finally $f(4)$ can then be completed in turn, giving 6, 21, and 58.
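The sequence of calls just described can be made visible by instrumenting the function. This is a sketch, not Figure 1.2 itself: the extra `depth` parameter and the name `f_traced` are hypothetical additions used only to indent the trace.

```c
#include <stdio.h>

/* f(0) = 0, f(x) = 2f(x - 1) + x*x, printing each call and its result,
   indented by recursion depth, so the pending calls are visible */
long f_traced(int x, int depth)
{
    long result;
    printf("%*scall f(%d)\n", 2 * depth, "", x);
    if (x == 0)
        result = 0;                                   /* base case */
    else
        result = 2 * f_traced(x - 1, depth + 1) + (long)x * x;
    printf("%*sf(%d) = %ld\n", 2 * depth, "", x, result);
    return result;
}
```

Calling `f_traced(4, 0)` prints the calls f(4), f(3), f(2), f(1), f(0) at increasing indentation, then the results 0, 1, 6, 21, 58 as each pending computation completes, and returns 58.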
These considerations lead to the first two fundamental rules of recursion:
1. Base cases. You must always have some base cases, which can be solved without recursion.
2. Making progress. For the cases that are to be solved recursively, the recursive call must always be to a case that makes progress toward a base case.
Throughout this book, we will use recursion to solve problems. As an example of a nonmathematical use, consider a large dictionary. Words in dictionaries are defined in terms of other words. When we look up a word, we might not always understand the definition, so we might have to look up words in the definition. Likewise, we might not understand some of those, so we might have to continue this search for a while. As the dictionary is finite, eventually either we will come to a point where we understand all of the words in some definition (and thus understand that definition and retrace our path through the other definitions), or we will find that the definitions are circular and we are stuck, or that some word we need to understand a definition is not in the dictionary.
THEOREM: The recursive number-printing algorithm is correct for $n \ge 0$.
PROOF (by induction on the number of digits in $n$): First, if $n$ has one digit, then the program is trivially correct, since it merely makes a call to print_digit. Assume then that print_out works for all numbers of $k$ or fewer digits. A number of $k + 1$ digits is expressed by its first $k$ digits followed by its least significant digit. But the number formed by the first $k$ digits is exactly $\lfloor n/10 \rfloor$, which, by the inductive hypothesis, is correctly printed, and the last digit is $n \bmod 10$, so the program prints out any $(k + 1)$-digit number correctly. Thus, by induction, all numbers are correctly printed.
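void print_digit( unsigned int d );   /* assumed to exist: outputs the single digit d */

void
print_out( unsigned int n ) /* print nonnegative n */
{
    if( n < 10 )
        print_digit( n );
    else
    {
        print_out( n/10 );     /* all digits except the last */
        print_digit( n%10 );   /* the last digit */
    }
}
Figure 1.4 Recursive routine to print an integer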
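The routine in Figure 1.4 relies on a print_digit routine that the text treats as given. To run it, some definition is needed; the one below is a hypothetical stand-in that writes the digit with `putchar` and also records it in a small buffer (`out_buf`, `out_len`, and `reset_output` are our own names) so the printed digits can be inspected.

```c
#include <stdio.h>

/* Hypothetical print_digit: writes the digit to stdout and also records
   it in out_buf so the output can be checked afterwards. */
static char out_buf[32];
static int out_len = 0;

void reset_output(void)
{
    out_len = 0;
    out_buf[0] = '\0';
}

void print_digit(unsigned int d)
{
    putchar('0' + (int)d);
    out_buf[out_len++] = (char)('0' + d);
    out_buf[out_len] = '\0';
}

/* Same logic as Figure 1.4 */
void print_out(unsigned int n)
{
    if (n < 10)
        print_digit(n);
    else {
        print_out(n / 10);    /* first k digits, by recursion */
        print_digit(n % 10);  /* least significant digit */
    }
}
```

After `reset_output(); print_out(76234);` the buffer holds the digits 7, 6, 2, 3, 4 in order, exactly the order in which the induction argument says they are produced.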
This proof probably seems a little strange in that it is virtually identical to the algorithm description. It illustrates that in designing a recursive program, all smaller instances of the same problem (which are on the path to a base case) may be assumed to work correctly. The recursive program needs only to combine solutions to smaller problems, which are "magically" obtained by recursion, into a solution for the current problem. The mathematical justification for this is proof by induction. This gives the third rule of recursion:
3. Design rule. Assume that all the recursive calls work.
This rule is important because it means that when designing recursive programs, you generally don't need to know the details of the bookkeeping arrangements, and you don't have to try to trace through the myriad of recursive calls. Frequently, it is extremely difficult to track down the actual sequence of recursive calls. Of course, in many cases this is an indication of a good use of recursion, since the computer is being allowed to work out the complicated details.
The main problem with recursion is the hidden bookkeeping costs. Although these costs are almost always justifiable, because recursive programs not only simplify the algorithm design but also tend to give cleaner code, recursion should never be used as a substitute for a simple for loop. We'll discuss the overhead involved in recursion in more detail in Section 3.3.
When writing recursive routines, it is crucial to keep in mind the four basic rules of recursion:
1. Base cases. You must always have some base cases, which can be solved without recursion.
2. Making progress. For the cases that are to be solved recursively, the recursive call must always be to a case that makes progress toward a base case.
3. Design rule. Assume that all the recursive calls work.
4. Compound interest rule. Never duplicate work by solving the same instance of a problem in separate recursive calls.
The fourth rule, which will be justified (along with its nickname) in later sections, is the
reason that it is generally a bad idea to use recursion to evaluate simple mathematical functions, such as the Fibonacci numbers. As long as you keep these rules in mind, recursive programming should be straightforward.
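To make the compound interest rule concrete, the sketch below (hypothetical names, Fibonacci indexed with $F_0 = F_1 = 1$ as earlier in the chapter) counts the calls made by a direct recursive translation and compares it with a loop that computes each value once.

```c
static unsigned long calls = 0;   /* number of calls made by fib_naive */

/* Violates rule 4: fib_naive(n - 1) recomputes fib_naive(n - 2) from scratch */
unsigned long fib_naive(int n)
{
    calls++;
    if (n <= 1)
        return 1;                 /* F(0) = F(1) = 1 */
    return fib_naive(n - 1) + fib_naive(n - 2);
}

/* Computes each F(i) once: n - 1 additions */
unsigned long fib_iter(int n)
{
    unsigned long prev = 1, cur = 1;
    for (int i = 2; i <= n; i++) {
        unsigned long next = prev + cur;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

Both return 10946 for $n = 20$, but the recursive version makes $2F_{20} - 1 = 21891$ calls, because its call count satisfies the same recurrence as the Fibonacci numbers plus one per call; the work therefore grows exponentially in $n$.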
Summary
This chapter sets the stage for the rest of the book. The time taken by an algorithm confronted with large amounts of input will be an important criterion for deciding if it is a good algorithm. (Of course, correctness is most important.) Speed is relative. What is fast for one problem on one machine might be slow for another problem or a different machine. We will begin to address these issues in the next chapter and will use the mathematics discussed here to establish a formal model.
Exercises
1.1 Write a program to solve the selection problem. Let k = n/2. Draw a table showing the running
time of your program for various values of n.
1.2 Write a program to solve the word puzzle problem.
1.3 Write a procedure to output an arbitrary real number (which might be negative) using only
print_digit for I/O.
1.4 C allows statements of the form
#include filename
which reads filename and inserts its contents in place of the include statement. Include statements may be nested; in other words, the file filename may itself contain an include statement, but, obviously, a file can't include itself in any chain. Write a program that reads in a file and outputs the file as modified by the include statements.
1.5 Prove the following formulas:
a. $\log x < x$ for all $x > 0$
b. $\log(a^b) = b \log a$
1.6 Evaluate the following sums:
General programming style is discussed in several books. Some of the classics are [5], [7], and [9].
Mass., 1989.
2d ed., Benjamin Cummings Publishing, Menlo Park, Calif., 1988.
New York, 1978.
Englewood Cliffs, N.J., 1988.
Addison-Wesley, Reading, Mass., 1973.
An algorithm is a clearly specified set of simple instructions to be followed to solve a problem.
DEFINITION: $T(n) = O(f(n))$ if there are constants $c$ and $n_0$ such that $T(n) \le c f(n)$ when $n \ge n_0$.
DEFINITION: $T(n) = \Omega(g(n))$ if there are constants $c$ and $n_0$ such that $T(n) \ge c g(n)$ when $n \ge n_0$.
DEFINITION: $T(n) = \Theta(h(n))$ if and only if $T(n) = O(h(n))$ and $T(n) = \Omega(h(n))$.
DEFINITION: $T(n) = o(p(n))$ if $T(n) = O(p(n))$ and $T(n) \neq \Theta(p(n))$.
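As a worked instance of these definitions, take an example function of our own choosing, $T(n) = 3n^2 + 10n$ (not from the text). For $n \ge 10$ we have $10n \le n^2$, so

$$T(n) = 3n^2 + 10n \le 3n^2 + n^2 = 4n^2 \qquad (n \ge 10),$$

which shows $T(n) = O(n^2)$ with $c = 4$ and $n_0 = 10$. Also

$$T(n) = 3n^2 + 10n \ge 3n^2 \qquad (n \ge 1),$$

so $T(n) = \Omega(n^2)$ with $c = 3$ and $n_0 = 1$. Together these give $T(n) = \Theta(n^2)$, and therefore, by the last definition, $T(n)$ is not $o(n^2)$. The constants are not unique; any valid pair $(c, n_0)$ suffices.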