



























































This document introduces the basics of combinatorics and probability theory, which are important in computer science. Combinatorics deals with counting and probability theory deals with measuring the likelihood of events. The chapter covers counting assignments, permutations, ordered selections, combinations, permutations with identical items, and ways of distributing objects among bins. The second half of the chapter covers basic concepts of probability theory, conditional probabilities, independence of events, and probabilistic reasoning. The document also discusses some applications of probability theory to computing.
Typology: Lecture notes




























































In computer science we frequently need to count things and measure the likelihood of events. The science of counting is captured by a branch of mathematics called combinatorics. The concepts that surround attempts to measure the likelihood of events are embodied in a field called probability theory. This chapter introduces the rudiments of these two fields. We shall learn how to answer questions such as how many execution paths are there in a program, or what is the likelihood of occurrence of a given path?
We shall study combinatorics, or “counting,” by presenting a sequence of increasingly more complex situations, each of which is represented by a simple paradigm problem. For each problem, we derive a formula that lets us determine the number of possible outcomes. The problems we study are:
• Counting assignments (Section 4.2). The paradigm problem is how many ways can we paint a row of n houses, each in any of k colors.
• Counting permutations (Section 4.3). The paradigm problem here is to determine the number of different orderings for n distinct items.
• Counting ordered selections (Section 4.4), that is, the number of ways to pick k things out of n and arrange the k things in order. The paradigm problem is counting the number of ways different horses can win, place, and show in a horse race.
• Counting the combinations of m things out of n (Section 4.5), that is, the selection of m from n distinct objects, without regard to the order of the selected objects. The paradigm problem is counting the number of possible poker hands.
• Counting permutations with some identical items (Section 4.6). The paradigm problem is counting the number of anagrams of a word that may have some letters appearing more than once.
• Counting the number of ways objects, some of which may be identical, can be distributed among bins (Section 4.7). The paradigm problem is counting the number of ways of distributing fruits to children.
In the second half of this chapter we discuss probability theory, covering the following topics:
• Basic concepts: probability spaces, experiments, events, probabilities of events.
• Conditional probabilities and independence of events. These concepts help us think about how observation of the outcome of one experiment, e.g., the drawing of a card, influences the probability of future events.
• Probabilistic reasoning and ways that we can estimate probabilities of combinations of events from limited data about the probabilities and conditional probabilities of events.
We also discuss some applications of probability theory to computing, including systems for making likely inferences from data and a class of useful algorithms that work “with high probability” but are not guaranteed to work all the time.
One of the simplest but most important counting problems deals with a list of items, to each of which we must assign one of a fixed set of values. We need to determine how many different assignments of values to items are possible.
houses in a row, and we may paint each in one of three colors: red, green, or blue. Here, the houses are the “items” mentioned above, and the colors are the “values.” Figure 4.1 shows one possible assignment of colors, in which the first house is painted red, the second and fourth blue, and the third green.
Red Blue Green Blue
Fig. 4.1. One assignment of colors to houses.
To answer the question, “How many different assignments are there?” we first need to define what we mean by an “assignment.” In this case, an assignment is a list of four values, in which each value is chosen from one of the three colors red, green, or blue. We shall represent these colors by the letters R, G, and B. Two such lists are different if and only if they differ in at least one position.
[Figure: a tree in which each of the three color choices for the first house (Red, Green, Blue) leads to the assignments for the other three houses.]
Fig. 4.2. The number of ways to paint 4 houses using 3 colors.
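The count behind Fig. 4.2 can be checked by brute force. The following is a minimal sketch, assuming n houses and k colors; the function name count_assignments and the limit of 32 houses are illustrative choices, not part of the text. It steps through every list of n color values the way an odometer steps through numbers, so that two lists are counted separately exactly when they differ in at least one position.

```c
/* Count the ways to paint n houses with k colors by enumerating every
   assignment. An assignment is treated as an n-digit number in base k,
   and we step through all of them like an odometer.
   Illustrative sketch: assumes 0 <= n <= 32 and k >= 1. */
unsigned long count_assignments(int n, int k)
{
    int color[32] = {0};   /* color[i] is the color index of house i */
    unsigned long count = 0;
    int i;

    if (n < 0 || n > 32 || k <= 0)
        return 0;

    for (;;) {
        count++;                       /* one more distinct assignment */
        for (i = 0; i < n; i++) {      /* advance to the next assignment */
            if (++color[i] < k)
                break;                 /* no carry needed */
            color[i] = 0;              /* carry into the next house */
        }
        if (i == n)
            break;                     /* wrapped around: all seen */
    }
    return count;
}
```

For four houses and three colors, count_assignments(4, 3) yields 3 × 3 × 3 × 3 = 81, one factor of 3 for each house.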
In computer systems, we frequently encounter strings of 0’s and 1’s, and these strings often are used as the names of objects. For example, we may purchase a computer with “64 megabytes of main memory.” Each of the bytes has a name, and that name is a sequence of 26 bits, each of which is either a 0 or 1. The string of 0’s and 1’s representing the name is called a bit string.
Why 26 bits for a 64-megabyte memory? The answer lies in an assignment-counting problem. When we count the number of bit strings of length n, we may think of the “items” as the positions of the string, each of which may hold a 0 or a 1.
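The 26-bit figure can be reproduced with a short calculation. The sketch below assumes that a megabyte means 2^20 bytes, so that 64 megabytes is 64 × 2^20 = 2^26 bytes; the function names bit_strings and bits_needed are hypothetical and chosen only for this illustration.

```c
/* Number of bit strings of length n, that is, 2^n.
   Illustrative sketch: valid for 0 <= n <= 63. */
unsigned long long bit_strings(int n)
{
    return 1ULL << n;
}

/* Smallest length n such that the bit strings of length n suffice
   to give each of `count` objects a distinct name. */
int bits_needed(unsigned long long count)
{
    int n = 0;

    while (n < 63 && bit_strings(n) < count)
        n++;
    return n;
}
```

Under the stated assumption, bits_needed(64ULL * 1024 * 1024) is 26, since 25 bits give only 2^25 = 33,554,432 names while 26 bits give 2^26 = 67,108,864.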
4.2.1: In how many ways can we paint
a) Three houses, each in any of four colors
b) Five houses, each in any of five colors
c) Two houses, each in any of ten colors
4.2.2: Suppose a computer password consists of eight to ten letters and/or digits. How many different possible passwords are there? Remember that an upper-case letter is different from a lower-case one.
4.2.3*: Consider the function f in Fig. 4.3. How many different values can f return?
int f(int x)
{
    int n;

    n = 1;
    if (x%2 == 0) n *= 2;
    if (x%3 == 0) n *= 3;
    if (x%5 == 0) n *= 5;
    if (x%7 == 0) n *= 7;
    if (x%11 == 0) n *= 11;
    if (x%13 == 0) n *= 13;
    if (x%17 == 0) n *= 17;
    if (x%19 == 0) n *= 19;
    return n;
}
Fig. 4.3. Function f.
4.2.4: In the game of “Hollywood squares,” X’s and O’s may be placed in any of the nine squares of a tic-tac-toe board (a 3×3 matrix) in any combination (i.e., unlike ordinary tic-tac-toe, it is not necessary that X’s and O’s be placed alternately, so, for example, all the squares could wind up with X’s). Squares may also be blank, i.e., not containing either an X or an O. How many different boards are there?
4.2.5: How many different strings of length n can be formed from the ten digits? A digit may appear any number of times in the string or not at all.
4.2.6: How many different strings of length n can be formed from the 26 lower-case letters? A letter may appear any number of times or not at all.
4.2.7: Convert the following into K’s, M’s, G’s, T’s, or P’s, according to the rules of the box in Section 4.2: (a) 2^13 (b) 2^17 (c) 2^24 (d) 2^38 (e) 2^45 (f) 2^59.
4.2.8*: Convert the following powers of 10 into approximate powers of 2: (a) 10^12 (b) 10^18 (c) 10^99.
In this section we shall address another fundamental counting problem: Given n distinct objects, in how many different ways can we order those objects in a line? Such an ordering is called a permutation of the objects. We shall let Π(n) stand for the number of permutations of n objects.
As one example of where counting permutations is significant in computer science, suppose we are given n objects, $a_1, a_2, \ldots, a_n$, to sort. If we know nothing about the objects, it is possible that any order will be the correct sorted order, and thus the number of possible outcomes of the sort will be equal to Π(n), the number of permutations of n objects. We shall soon see that this observation helps us argue that general-purpose sorting algorithms require time proportional to n log n, and therefore that algorithms like merge sort, which we saw in Section 3.10 takes
sponding to the two ways in which we may order the remaining objects A and C. We thus have orders BAC and BCA. Finally, if we start with C first, we can order the remaining objects A and B in the two possible ways, giving us orders CAB and CBA. These six orders,
ABC, ACB, BAC, BCA, CAB, CBA
are all the possible orders of three elements. That is, Π(3) = 3 × 2 × 1 = 6.
Next, consider how many permutations there are for 4 objects: A, B, C, and D. If we pick A first, we may follow A by the objects B, C, and D in any of their 6 orders. Similarly, if we pick B first, we can order the remaining A, C, and D in any of their 6 ways. The general pattern should now be clear. We can pick any of the four elements first, and for each such selection, we can order the remaining three elements in any of the Π(3) = 6 possible ways. It is important to note that the number of permutations of the three objects does not depend on which three elements they are. We conclude that the number of permutations of 4 objects is 4 times the number of permutations of 3 objects.
More generally, Π(n + 1) = (n + 1)Π(n) for any n ≥ 1 (4.1)
That is, to count the permutations of n + 1 objects we may pick any of the n + 1 objects to be first. We are then left with n remaining objects, and these can be permuted in Π(n) ways, as suggested in Fig. 4.4. For our example where n + 1 = 4, we have Π(4) = 4 × Π(3) = 4 × 6 = 24.
[Figure: each of the n + 1 choices for the first object (Object 1, Object 2, ..., Object n + 1) is followed by the Π(n) orders of the n remaining objects.]
Fig. 4.4. The permutations of n + 1 objects.
Equation (4.1) is the inductive step in the definition of the factorial function introduced in Section 2.5. Thus it should not be a surprise that Π(n) equals n!. We can prove this equivalence by a simple induction.
STATEMENT S(n): Π(n) = n! for all n ≥ 1.
BASIS. For n = 1, S(1) says that there is 1 permutation of 1 object. We observed this simple point in Example 4.2.
INDUCTION. Suppose Π(n) = n!. Then S(n + 1), which we must prove, says that Π(n + 1) = (n + 1)!. We start with Equation (4.1), which says that Π(n + 1) = (n + 1) × Π(n)
By the inductive hypothesis, Π(n) = n!. Thus, Π(n + 1) = (n + 1)n!. Since n! = n × (n − 1) × · · · × 1
it must be that (n + 1) × n! = (n + 1) × n × (n − 1) × · · · × 1. But the latter product is (n + 1)!, which proves S(n + 1).
number of permutations of 4 objects is 4! = 4 × 3 × 2 × 1 = 24, as we saw above. As another example, the number of permutations of 7 objects is 7! = 5040.
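Equation (4.1), together with the basis Π(1) = 1, translates directly into a loop. The following is a small sketch under the assumption that 1 ≤ n ≤ 20, the largest range for which n! fits in a 64-bit unsigned integer; the function name permutations is illustrative.

```c
/* Compute Pi(n), the number of permutations of n distinct objects,
   by applying Pi(i) = i * Pi(i-1) starting from the basis Pi(1) = 1.
   Illustrative sketch: valid for 1 <= n <= 20. */
unsigned long long permutations(int n)
{
    unsigned long long p = 1;   /* Pi(1) = 1 */
    int i;

    for (i = 2; i <= n; i++)
        p *= (unsigned long long)i;   /* Pi(i) = i * Pi(i-1) */
    return p;
}
```

With this sketch, permutations(3) is 6, permutations(4) is 24, and permutations(7) is 5040, in agreement with the values computed in the text.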
One of the interesting uses of the formula for counting permutations is in a proof that sorting algorithms must take at least time proportional to n log n to sort n elements, unless they make use of some special properties of the elements. For example, as we note in the box on special-case sorting algorithms, we can do better than proportional to n log n if we write a sorting algorithm that works only for small integers.
However, if a sorting algorithm works on any kind of data, as long as it can be compared by some “less than” notion, then the only way the algorithm can decide on the proper order is to consider the outcome of a test for whether one of two elements is less than the other. A sorting algorithm is called a general-purpose sorting algorithm if its only operation upon the elements to be sorted is a comparison between two of them to determine their relative order. For instance, selection sort and merge sort of Chapter 2 each make their decisions that way. Even though we wrote them for integer data, we could have written them more generally by replacing comparisons like
    if (A[j] < A[small])
on line (4) of Fig. 2.2 by a test that calls a Boolean-valued function such as
    if (lessThan(A[j], A[small]))
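To make the idea concrete, here is a sketch of a selection sort whose only access to the relative order of elements is through a lessThan function. It is written in the spirit of Fig. 2.2 but is not a copy of that figure; the body of lessThan shown here (integer comparison) is an assumption made so the example is complete, and any other consistent ordering could be substituted.

```c
/* Hypothetical comparison function: nonzero if x precedes y.
   Replacing this body changes the order without touching the sort. */
int lessThan(int x, int y)
{
    return x < y;
}

/* Selection sort that learns about the order of elements only
   through calls to lessThan, as a general-purpose sort must. */
void selectionSort(int A[], int n)
{
    int i, j, small, temp;

    for (i = 0; i < n - 1; i++) {
        small = i;                        /* index of smallest so far */
        for (j = i + 1; j < n; j++)
            if (lessThan(A[j], A[small]))
                small = j;
        temp = A[small];                  /* swap smallest into place */
        A[small] = A[i];
        A[i] = temp;
    }
}
```

On three elements this sketch calls lessThan exactly three times: two calls to find the smallest element and one to order the remaining two.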
After making the second decision, the smallest of the three is moved into the first position of the array, and a third comparison is made to determine which of the remaining elements is the larger. That comparison is the last comparison made by the algorithm when three elements are to be sorted. As we see at the bottom of Fig. 4.5, sometimes that decision is determined. For example, if we have already found a < b and c < a, then c is the smallest and the last comparison of a and b must find a smaller.
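[Figure: a binary decision tree. At the root all six orders abc, acb, bac, bca, cab, cba are possible and the test is a < b?. Its two outcomes leave the orders abc, acb, cab and bac, bca, cba, which are tested by a < c? and b < c? respectively. A third comparison (b < c?, a < b?, or a < c?) leaves at most one order at each leaf.]
Fig. 4.5. Decision tree for selection sorting of 3 elements.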
In this example, all paths involve 3 decisions, and at the end there is at most one consistent order, which is the correct sorted order. The two paths with no consistent order never occur. Equation (4.2) tells us that the number of tests t must be at least $\log_2 3!$, which is $\log_2 6$. Since 6 is between $2^2$ and $2^3$, we know that $\log_2 6$ will be between 2 and 3. Thus, at least some sequences of outcomes in any algorithm that sorts three elements must make 3 tests. Since selection sort makes only 3 tests for 3 elements, it is at least as good as any other sorting algorithm for 3 elements in the worst case. Of course, as the number of elements becomes large, we know that selection sort is not as good as can be done, since it is an $O(n^2)$ sorting algorithm and there are better algorithms such as merge sort.
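The bound used in this example, that t tests can distinguish at most 2^t outcomes and so t must be at least log₂ n!, can be evaluated with integer arithmetic alone. The sketch below assumes 1 ≤ n ≤ 20 so that n! fits in 64 bits; the function name min_comparisons is hypothetical.

```c
/* Smallest t such that 2^t >= n!, that is, the least number of
   two-way tests that can distinguish all n! possible orders.
   Illustrative sketch: valid for 1 <= n <= 20. */
int min_comparisons(int n)
{
    unsigned long long fact = 1, power = 1;
    int i, t = 0;

    for (i = 2; i <= n; i++)
        fact *= (unsigned long long)i;    /* fact = n! */
    while (power < fact) {                /* find first 2^t >= n! */
        power *= 2;
        t++;
    }
    return t;
}
```

For n = 3 the result is 3, matching the discussion above. For n = 4 it is 5, because 4! = 24 lies between 2^4 = 16 and 2^5 = 32.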
We must now estimate how large $\log_2 n!$ is. Since n! is the product of all the integers from 1 to n, it is surely larger than the product of only the $\frac{n}{2} + 1$ integers from n/2 through n. This product is in turn at least as large as n/2 multiplied by itself n/2 times, or $(n/2)^{n/2}$. Thus, $\log_2 n!$ is at least $\log_2\bigl((n/2)^{n/2}\bigr)$. But the latter is $\frac{n}{2}(\log_2 n - \log_2 2)$, which is
$$\frac{n}{2}(\log_2 n - 1)$$
For large n, this formula is approximately $(n \log_2 n)/2$. A more careful analysis will tell us that the factor of 1/2 does not have to be there. That is, $\log_2 n!$ is very close to $n \log_2 n$ rather than to half that expression.
If we restrict the inputs on which a sorting algorithm will work, it can in one step divide the possible orders into more than 2 parts and thus work in less than time proportional to n log n. Here is a simple example that works if the input is n distinct integers, each chosen in the range 0 to 2n − 1.
(1)  for (i = 0; i < 2*n; i++)
(2)      count[i] = 0;
(3)  for (i = 0; i < n; i++)
(4)      count[a[i]]++;
(5)  for (i = 0; i < 2*n; i++)
(6)      if (count[i] > 0)
(7)          printf("%d\n", i);
We assume the input is in an array a of length n. In lines (1) and (2) we initialize an array count of length 2n to 0. Then in lines (3) and (4) we add 1 to the count for x if x is the value of a[i], the ith input element. Finally, in the last three lines we print each of the integers i such that count[i] is positive. Thus we print those elements appearing one or more times in the input and, on the assumption the inputs are distinct, this prints all the input elements, sorted smallest first.
We can analyze the running time of this algorithm easily. Lines (1) and (2) are a loop that iterates 2n times and has a body taking O(1) time. Thus, it takes O(n) time. The same applies to the loop of lines (3) and (4), but it iterates n times rather than 2n times; it too takes O(n) time. Finally, the body of the loop of lines (5) through (7) takes O(1) time and it is iterated 2n times. Thus, all three loops take O(n) time, and the entire sorting algorithm likewise takes O(n) time. Note that if given an input for which the algorithm is not tailored, such as integers in a range larger than 0 through 2n − 1, the program above fails to sort correctly.
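The fragment above can be packaged as a self-contained function. The following sketch makes several choices that are not in the text: the name range_sort, writing the result into an output array instead of printing it, and returning −1 when an input falls outside the range 0 through 2n − 1 (the case in which the original program would fail).

```c
#include <stdlib.h>

/* Sort n distinct integers, each in the range 0 to 2n-1, in O(n) time.
   The sorted values are written to out[], which must have room for n
   integers. Returns the number of values written, or -1 if allocation
   fails or an input lies outside the permitted range. */
int range_sort(const int a[], int n, int out[])
{
    int i, k = 0;
    int *count;

    if (n <= 0)
        return 0;
    count = (int *)calloc((size_t)(2 * n), sizeof *count);  /* all zero */
    if (count == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        if (a[i] < 0 || a[i] >= 2 * n) {   /* input not in 0..2n-1 */
            free(count);
            return -1;
        }
        count[a[i]]++;
    }
    for (i = 0; i < 2 * n; i++)            /* emit in increasing order */
        if (count[i] > 0)
            out[k++] = i;

    free(count);
    return k;
}
```

As in the original, no two elements are ever compared with each other; each element is used directly as an array index, which is why the n log n lower bound for general-purpose sorting does not apply.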
We have shown only that any general-purpose sorting algorithm must have some input for which it makes about n log 2 n comparisons or more. Thus any general-purpose sorting algorithm must take at least time proportional to n log n in the worst case. In fact, it can be shown that the same applies to the “average” input. That is, the average over all inputs of the time taken by a general-purpose sorting algorithm must be at least proportional to n log n. Thus, merge sort is about as good as we can do, since it has this big-oh running time for all inputs.
4.3.1: Suppose we have selected 9 players for a baseball team.
a) How many possible batting orders are there?
b) If the pitcher has to bat last, how many possible batting orders are there?
4.3.2: How many comparisons does the selection sort algorithm of Fig. 2.2 make if there are 4 elements? Is this number the best possible? Show the top 3 levels of the decision tree in the style of Fig. 4.5.
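[Figure: a tree of ordered selections. The first choice is any of the items 1 through 10; the second choice is any item but the first (for example, all but 2); the third choice is any item but the first two (for example, all but 2, 3).]
Fig. 4.6. Ordered selection of three things out of 10.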
$$\frac{n!}{(n-m)!} = \frac{n(n-1)\cdots(n-m+1)(n-m)(n-m-1)\cdots(1)}{(n-m)(n-m-1)\cdots(1)}$$
The denominator is the product of the integers from 1 to n − m. The numerator is the product of the integers from 1 to n. Since the last n − m factors in the numerator and denominator above are the same, (n − m)(n − m − 1) · · · (1), they cancel and the result is that
$$\frac{n!}{(n-m)!} = n(n-1)\cdots(n-m+1)$$
This formula is the same as that in (4.3), which shows that Π(n, m) = n!/(n − m)!.
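Because the trailing factors cancel, Π(n, m) can be computed as a product of only m factors, without ever forming n!. The sketch below illustrates this; the function name ordered_selections is an assumption of this example, and the result is exact only while it fits in a 64-bit unsigned integer.

```c
/* Compute Pi(n, m) = n!/(n-m)! as the product of the m factors
   n, n-1, ..., n-m+1. Returns 0 for invalid arguments.
   Illustrative sketch: the caller must ensure the result fits
   in an unsigned long long. */
unsigned long long ordered_selections(int n, int m)
{
    unsigned long long p = 1;
    int i;

    if (m < 0 || m > n)
        return 0;
    for (i = 0; i < m; i++)
        p *= (unsigned long long)(n - i);   /* factors n down to n-m+1 */
    return p;
}
```

For the horse race, ordered_selections(10, 3) is 10 × 9 × 8 = 720.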
The problem considered in Example 4.5 differs only slightly from the assignment problem considered in Section 4.2. In terms of houses and colors, we could almost see the selection of the first three finishing horses as an assignment of one of ten horses (the “colors”) to each of three finishing positions (the “houses”). The only difference is that, while we are free to paint several houses the same color, it makes no sense to say that one horse finished both first and third, for example. Thus, while the number of ways to color three houses in any of ten colors is 10^3 or 10 × 10 × 10, the number of ways to select the first three finishers out of 10 is 10 × 9 × 8.
We sometimes refer to the kind of selection we did in Section 4.2 as selection with replacement. That is, when we select a color, say red, for a house, we “replace” red into the pool of possible colors. We are then free to select red again for one or more additional houses. On the other hand, the sort of selection we discussed in Example 4.5 is called selection without replacement. Here, if the horse Sea Biscuit is selected to be the winner, then Sea Biscuit is not replaced in the pool of horses that can place or show. Similarly, if Secretariat is selected for second place, he is not eligible to be the third-place horse also.
The factors from 1 through 7 appear in both numerator and denominator and thus cancel. The result is the product of the integers from 8 through 10, or 10 × 9 × 8, as we saw in Example 4.5.
4.4.1: How many ways are there to form a sequence of m letters out of the 26 letters, if no letter is allowed to appear more than once, for (a) m = 3 (b) m = 5.
4.4.2: In a class of 200 students, we wish to elect a President, Vice President, Secretary, and Treasurer. In how many ways can these four officers be selected?
4.4.3: Compute the following quotients of factorials: (a) 100!/97! (b) 200!/195!.
4.4.4: The game of Mastermind requires players to select a “code” consisting of a sequence of four pegs, each of which may be of any of six colors: red, green, blue, yellow, white, and black.
a) How many different codes are there?
b) How many different codes are there that have two or more pegs of the same color? Hint: This quantity is the difference between the answer to (a) and another easily computed quantity.
c) How many codes are there that have no red peg?
d) How many codes are there that have no red peg but have at least two pegs of the same color?
to count only the sets of three horses that may be the three top finishers, we must divide Π(10, 3) by 6. Thus, there are 720/6 = 120 different sets of three horses out of 10.
is dealt five cards from a 52-card deck. We do not care in what order the five cards are dealt, just what five cards we have. To count the number of sets of five cards we may be dealt, we could start by calculating Π(52, 5), which is the number of ordered selections of five objects out of 52. This number is 52!/(52 − 5)!, which is 52!/47!, or 52 × 51 × 50 × 49 × 48 = 311,875,200. However, just as the three fastest horses in Example 4.7 appear in 3! = 6 different orders, any set of five cards can appear in Π(5) = 5! = 120 different orders. Thus, to count the number of poker hands without regard to order of selection, we must take the number of ordered selections and divide by 120. The result is 311,875,200/120 = 2,598,960 different hands.
Let us now generalize Examples 4.7 and 4.8 to get a formula for the number of ways to select m items out of n without regard to order of selection. This function is usually written $\binom{n}{m}$ and spoken “n choose m” or “combinations of m things out of n.” To compute $\binom{n}{m}$, we start with Π(n, m) = n!/(n − m)!, the number of ordered selections of m things out of n. We then group these ordered selections according to the set of m items selected. Since these m items can be ordered in Π(m) = m! different ways, the groups will each have m! members. We must divide the number of ordered selections by m! to get the number of unordered selections. That is,
$$\binom{n}{m} = \frac{\Pi(n, m)}{\Pi(m)} = \frac{n!}{(n-m)! \times m!}$$
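The quotient above can be evaluated without computing any large factorial. The following sketch multiplies and divides one factor at a time, so that after step i the running value is the binomial coefficient of n − m + i and i, which is always an integer. The function name choose_product is hypothetical, and this is one possible arrangement of the arithmetic, not necessarily the one the text has in mind.

```c
/* Compute "n choose m" = n!/((n-m)! * m!) in m steps.
   After step i the running value c equals (n-m+i choose i), so each
   division is exact. Returns 0 for invalid arguments.
   Illustrative sketch: intermediate products must fit in 64 bits. */
unsigned long long choose_product(int n, int m)
{
    unsigned long long c = 1;
    int i;

    if (m < 0 || m > n)
        return 0;
    for (i = 1; i <= m; i++)
        c = c * (unsigned long long)(n - m + i) / (unsigned long long)i;
    return c;
}
```

With this sketch, choose_product(10, 3) is 120 and choose_product(52, 5) is 2,598,960, the counts of horse sets and poker hands found earlier.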
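m = 5. We have $\binom{52}{5}$ = 52!/(47! × 5!). If we cancel the 47! with the last 47 factors of 52! and expand 5!, we can write
$$\binom{52}{5} = \frac{52 \times 51 \times 50 \times 49 \times 48}{5 \times 4 \times 3 \times 2 \times 1}$$
Simplifying, we get
$$\binom{52}{5} = 2{,}598{,}960$$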
If we think recursively about the number of ways to select m items out of n, we can develop a recursive algorithm to compute $\binom{n}{m}$.

BASIS. $\binom{n}{0} = 1$ for any n ≥ 1. That is, there is only one way to pick zero things out of n: pick nothing. Also, $\binom{n}{n} = 1$; that is, the only way to pick n things out of n is to pick them all.

INDUCTION. If 0 < m < n, then $\binom{n}{m} = \binom{n-1}{m} + \binom{n-1}{m-1}$. That is, if we wish to pick m things out of n, we can either
i) Not pick the first element, and then pick m things from the remaining n − 1 elements. The term $\binom{n-1}{m}$ counts this number of possibilities.
or
ii) Pick the first element and then select m − 1 things from among the remaining n − 1 elements. The term $\binom{n-1}{m-1}$ counts these possibilities.
Incidentally, while the idea of the induction should be clear (we proceed from the simplest cases of picking all or none to more complicated cases where we pick some but not all), we have to be careful to state what quantity the induction is “on.” One way to look at this induction is that it is a complete induction on the product of n and the minimum of m and n − m. Then the basis case occurs when this product is 0 and the induction is for larger values of the product. We have to check for the induction that n × min(m, n − m) is always greater than (n − 1) × min(m, n − m − 1) and (n − 1) × min(m − 1, n − m) when 0 < m < n. This check is left as an exercise.
This recursion is often displayed by Pascal’s triangle, illustrated in Fig. 4.8, where the borders are all 1’s (for the basis) and each interior entry is the sum of the two numbers above it to the northeast and northwest (for the induction). Then $\binom{n}{m}$ can be read from the (m + 1)st entry of the (n + 1)st row.
Fig. 4.8. The first rows of Pascal’s triangle.
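A row of Pascal’s triangle can be built from the basis and induction just described. The following sketch, with the hypothetical name pascal_row, keeps a single array and turns each row into the next one in place; it assumes the caller supplies room for n + 1 entries and that the entries fit in an unsigned long.

```c
/* Fill row[0..n] with the (n+1)st row of Pascal's triangle, so that
   row[m] is "n choose m". Each pass of the outer loop turns row i-1
   into row i: the borders are 1 (basis) and each interior entry is
   the sum of the two entries above it (induction). */
void pascal_row(int n, unsigned long row[])
{
    int i, j;

    row[0] = 1;                          /* the single entry of row 1 */
    for (i = 1; i <= n; i++) {
        row[i] = 1;                      /* right border */
        for (j = i - 1; j > 0; j--)      /* right to left, so row[j-1] */
            row[j] += row[j - 1];        /* still holds the old value  */
    }
}
```

Calling pascal_row(4, row) leaves 1, 4, 6, 4, 1 in the array. Updating from right to left matters: going left to right would overwrite an entry of the previous row before it had been used.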
of $\binom{4}{2}$ in the 3rd entry of the 5th row of Fig. 4.8. This entry is 6, and it is easy to check that $\binom{4}{2} = 4!/(2! \times 2!) = 24/(2 \times 2) = 6$.
The two ways we have to compute $\binom{n}{m}$, by formula (4.4) or by the above recursion, each compute the same value, naturally. We can argue so by appeal to physical reasoning. Both methods compute the number of ways to select m items out of n in an unordered fashion, so they must produce the same value. However, we can also prove the equality of the two approaches by an induction on n. We leave this proof as an exercise.
It may not be obvious why the quotients of many factors in Equations (4.4), (4.5), or (4.6) must always turn out to be an integer. The only simple argument is to appeal to physical reasoning. The formulas all compute the number of ways to choose m things out of n, and this number must be some integer.
It is much harder to argue this fact from properties of integers, without appealing to the physical meaning of the formulas. It can in fact be shown by a careful analysis of the number of factors of each prime in numerator and denominator. As a sample, look at the expression in Example 4.9. There is a 5 in the denominator, and there are 5 factors in the numerator. Since these factors are consecutive, we know one of them must be divisible by 5; it happens to be the middle factor, 50. Thus, the 5 in the denominator surely cancels.
Thus, we need to convert to floating-point numbers before doing any calculation. We leave this modification as an exercise.
Now, let us consider the recursive algorithm to compute $\binom{n}{m}$. We can implement it by the simple recursive function of Fig. 4.10.
/* compute n choose m for 0 <= m <= n */
int choose(int n, int m)
{
(1)     if (m < 0 || m > n) { /* error conditions */
(2)         printf("invalid input\n");
(3)         return 0;
        }
(4)     else if (m == 0 || m == n) /* basis case */
(5)         return 1;
        else /* induction */
(6)         return (choose(n-1, m-1) + choose(n-1, m));
}

Fig. 4.10. Recursive function to compute $\binom{n}{m}$.
The function of Fig. 4.10 is not efficient; it creates an exponential explosion in the number of calls to choose. The reason is that when called with n as its first argument, it usually makes two recursive calls at line (6) with first argument n − 1. Thus, we might expect the number of calls made to double when n increases by 1. Unfortunately, the exact number of recursive calls made is harder to count. The reason is that the basis case on lines (4) and (5) can apply not only when n = 1, but for higher n, provided m has the value 0 or n.
We can prove a simple, but slightly pessimistic upper bound as follows. Let T(n) be the running time of Fig. 4.10 with first argument n. We can prove that T(n) is $O(2^n)$ simply. Let a be the total running time of lines (1) through (5), plus that part of line (6) that is involved in the calls and return, but not the time of the recursive calls themselves. Then we can prove by induction on n:

STATEMENT S(n): If choose is called with first argument n and some second argument m between 0 and n, then the running time T(n) of the call is at most $a(2^n - 1)$.
BASIS. n = 1. Then it must be that either m = 0 or m = 1 = n. Thus, the basis case on lines (4) and (5) applies and we make no recursive calls. The time for lines (1) through (5) is included in a. Since S(1) says that T (1) is at most a(2^1 − 1) = a, we have proved the basis.
INDUCTION. Assume S(n); that is, $T(n) \le a(2^n - 1)$. To prove S(n + 1), suppose we call choose with first argument n + 1. Then Fig. 4.10 takes time a plus the time of the two recursive calls on line (6). By the inductive hypothesis, each call takes at most time $a(2^n - 1)$. Thus, the total time consumed is at most
$$a + 2a(2^n - 1) = a(1 + 2^{n+1} - 2) = a(2^{n+1} - 1)$$
This calculation proves S(n + 1) and proves the induction.
We have thus proved that $T(n) \le a(2^n - 1)$. Dropping the constant factor and the low-order terms, we see that T(n) is $O(2^n)$. Curiously, while in our analyses of Chapter 3 we easily proved a smooth and tight upper bound on running time, the $O(2^n)$ bound on T(n) is smooth but not tight. The proper smooth, tight upper bound is slightly less: $O(2^n/\sqrt{n})$. A proof of this fact is quite difficult, but we leave as an exercise the easier fact that the running time of Fig. 4.10 is proportional to the value it returns: $\binom{n}{m}$. An important observation is that the recursive algorithm of Fig. 4.10 is much less efficient than the linear algorithm of Fig. 4.9. This example is one where recursion hurts considerably.
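The relationship between the running time and the returned value can be observed by counting calls. The sketch below is an instrumented variant of the recursive function, not the function of Fig. 4.10 itself: the names choose_counted, count_calls, and the global counter calls are introduced only for this illustration. For valid arguments, every call that reaches the basis returns 1 and every other call makes exactly two further calls, so the total number of calls is twice the returned value minus 1.

```c
static unsigned long calls;   /* calls made since the last reset */

/* Recursive "n choose m" that also counts how often it is invoked. */
unsigned long choose_counted(int n, int m)
{
    calls++;
    if (m < 0 || m > n)        /* error conditions */
        return 0;
    if (m == 0 || m == n)      /* basis case */
        return 1;
    return choose_counted(n - 1, m - 1) + choose_counted(n - 1, m);
}

/* Number of calls (including the first) needed for one evaluation. */
unsigned long count_calls(int n, int m)
{
    calls = 0;
    choose_counted(n, m);
    return calls;
}
```

For example, choose_counted(4, 2) returns 6 and takes 11 calls, while choose_counted(10, 5) returns 252 and takes 503 calls.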
For a fixed value of n, the function of m that is $\binom{n}{m}$ has a number of interesting properties. For a large value of n, its form is the bell-shaped curve suggested in Fig. 4.11. We immediately notice that this function is symmetric around the midpoint n/2; this is easy to check using formula (4.4), from which
$$\binom{n}{m} = \binom{n}{n-m}$$
The maximum height at the center, that is, $\binom{n}{n/2}$, is approximately $2^n/\sqrt{\pi n/2}$. For example, if n = 10, this formula gives 258.37, while $\binom{10}{5} = 252$.
The “thick part” of the curve extends for approximately $\sqrt{n}$ on either side of the midpoint. For example, if n = 10,000, then for m between 4900 and 5100 the value of $\binom{10{,}000}{m}$ is close to the maximum. For m outside this range, the value of $\binom{10{,}000}{m}$ falls off very rapidly.
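The symmetry around the midpoint can be written out as a short derivation, assuming the quotient-of-factorials form of the formula for combinations of m things out of n. Replacing m by n − m simply exchanges the two factorials in the denominator:

```latex
\binom{n}{n-m}
  = \frac{n!}{\bigl(n-(n-m)\bigr)! \times (n-m)!}
  = \frac{n!}{m! \times (n-m)!}
  = \binom{n}{m}
```

For instance, with n = 10 the values for m = 3 and m = 7 are both 10!/(3! × 7!) = 120.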