Chapter 11 Bernoulli Trials, Exercises of Statistics


Typology: Exercises

2021/2022

Uploaded on 09/12/2022

kitriotak

Chapter 11

Bernoulli Trials

11.1 The Binomial Distribution

In the previous chapter, we learned about i.i.d. trials. Recall that there are three ways we can have i.i.d. trials:

  1. Our units are trials and we have decided to assume that they are i.i.d.
  2. We have a finite population and we will select our sample of its members at random with replacement—the dumb form of random sampling. The result is that we have i.i.d. random variables which means the same thing as having i.i.d. trials.
  3. We have a finite population and we have selected our sample of its members at random without replacement—the smart form of random sampling. If n/N—the ratio of sample size to population size—is 0.05 or smaller, then we get a good approximation if we treat our random variables as i.i.d.

In this chapter, we study a very important special case of i.i.d. trials, called Bernoulli trials. If each trial has exactly two possible outcomes, then we have Bernoulli trials. For convenient reference, I will now explicitly state the assumptions of Bernoulli trials.

Definition 11.1 (The assumptions of Bernoulli trials.) If we have a collection of trials that satisfy the three conditions below, then we say that we have Bernoulli trials.

  1. Each trial results in one of two possible outcomes, denoted success (S) or failure (F).
  2. The probability of a success remains constant from trial-to-trial and is denoted by p. Write q = 1 − p for the constant probability of a failure.
  3. The trials are independent.

We will use the method described on page 168 of Chapter 8 to assign the labels success and failure. When we are involved in mathematical arguments, it will be convenient to represent a success by the number 1 and a failure by the number 0. Finally, we are not interested in either of the trivial cases in which p = 0 or p = 1. Thus, we restrict attention to situations in which 0 < p < 1.

One reason that Bernoulli trials are so important is that if we have Bernoulli trials, we can calculate probabilities of a great many events. Our first tool for calculation is the multiplication rule that we learned in Chapter 10. For example, suppose that we have n = 5 Bernoulli trials with p = 0.70. The probability that the Bernoulli trials yield four successes followed by a failure is:

P(SSSSF) = ppppq = (0.70)^4 (0.30) = 0.0720.

Our next tool is extremely powerful and very useful in science. It is the binomial probability distribution. Suppose that we plan to perform/observe n Bernoulli trials. Let X denote the total number of successes in the n trials. The probability distribution of X is given by the following equation.

\[
P(X = x) = \frac{n!}{x!\,(n - x)!}\, p^{x} q^{n - x}, \quad \text{for } x = 0, 1, \ldots, n. \tag{11.1}
\]

Equation 11.1 is called the binomial probability distribution with parameters n and p; it is denoted by the Bin(n, p) distribution. I will illustrate the use of this equation with a computation of the Bin(5, 0.60) distribution. I replace n with 5, p with 0.60 and q with (1 − p) = 0.40. Below, I evaluate Equation 11.1 six times, for x = 0, 1, ..., 5. You should check a couple of the following computations to make sure you are comfortable using Equation 11.1, but you don’t need to verify all of them.

\[
\begin{aligned}
P(X = 0) &= \frac{5!}{0!\,5!}(0.60)^{0}(0.40)^{5} = 1(1)(0.01024) = 0.01024. \\
P(X = 1) &= \frac{5!}{1!\,4!}(0.60)^{1}(0.40)^{4} = 5(0.60)(0.0256) = 0.07680. \\
P(X = 2) &= \frac{5!}{2!\,3!}(0.60)^{2}(0.40)^{3} = 10(0.36)(0.064) = 0.23040. \\
P(X = 3) &= \frac{5!}{3!\,2!}(0.60)^{3}(0.40)^{2} = 10(0.216)(0.16) = 0.34560. \\
P(X = 4) &= \frac{5!}{4!\,1!}(0.60)^{4}(0.40)^{1} = 5(0.1296)(0.40) = 0.25920. \\
P(X = 5) &= \frac{5!}{5!\,0!}(0.60)^{5}(0.40)^{0} = 1(0.07776)(1) = 0.07776.
\end{aligned}
\]
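If you would rather let a computer do the arithmetic, the six evaluations above can be reproduced in a few lines. The following is only a minimal sketch in Python (my choice of language for illustration; the text itself relies on a website and Minitab), and the function name binomial_pmf is hypothetical.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p), evaluated directly from Equation 11.1."""
    q = 1.0 - p
    # comb(n, x) is the binomial coefficient n! / (x! (n - x)!)
    return comb(n, x) * p**x * q ** (n - x)


# The Bin(5, 0.60) distribution worked out by hand above.
bin_5_060 = [binomial_pmf(x, 5, 0.60) for x in range(6)]
for x, prob in enumerate(bin_5_060):
    print(f"P(X = {x}) = {prob:.5f}")
```

The six printed values should agree with the hand computations, and they should sum to 1.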

Whenever probabilities for a random variable X are given by Equation 11.1 we say that X has a binomial probability (sampling) distribution with parameters n and p and write this as X ∼ Bin(n, p). There are a number of difficulties that arise when one attempts to use the binomial probability distribution. The most obvious is that each trial needs to give a dichotomous response. Sometimes it is obvious that we have a dichotomy: For example, if my trials are shooting free throws or attempting golf putts, then the natural response is that a trial results in a make or miss. Other

Table 11.1: The output, after rounding, from the binomial website for p = 0.75, n = 8 and x = 6.

Binomial Probability:   P(X = 6) = 0.3115
Cumulative Probability: P(X < 6) = 0.3215
Cumulative Probability: P(X ≤ 6) = 0.6329
Cumulative Probability: P(X > 6) = 0.3671
Cumulative Probability: P(X ≥ 6) = 0.6785

I will now show you how to use this website. The website requires you to enter three numbers:

  • Probability of success on a single trial: Enter the value of p; for our current problem, I enter p = 0.75.
  • Number of trials: Enter the value of n; for our current problem, I enter n = 8.
  • Number of successes (x): This is a bit tricky to explain explicitly, but once you see one example, you will understand how to do it. Because my event of interest, (X ≥ 6), involves the number 6, I enter x = 6.

After entering my values for p, n and x, I click on the Calculate box and obtain the output, rounded to four digits, printed in Table 11.1. The answer I want is the fifth entry in the list:

P(X ≥ 6) = 0.6785.

Note that there is a great deal of redundancy in the five answers in this list. Make sure you understand why the following identities are true:

  • The third probability is the sum of the first two.
  • The fifth probability is the sum of the first and the fourth.
  • The sum of the second and the fifth probabilities equals 1.
  • The sum of the third and the fourth probabilities equals 1.

For example, the third probability P(X ≤ 6) can be written as P(X < 6) + P(X = 6). In the listing above, this becomes 0.6329 = 0.3115 + 0.3215, which is correct except for round-off error.

The website is good for computing individual probabilities, but it is tedious to use it to generate an entire binomial distribution. For the latter objective, I use Minitab. In particular, with the help of Minitab, I obtained the Bin(8, 0.75) distribution, displayed in Table 11.2. Literally, the first two columns of this table present the sampling distribution. It’s easy to have the computer create the cumulative sums in the third and fourth columns, so I have included them. From the table we can find the five probabilities given by the website. For example,

P(X > 6) = P(X ≥ 7) = 0.3671, from the fourth column.

Table 11.2: The binomial distribution with n = 8 and p = 0.75.

x      P(X = x)  P(X ≤ x)  P(X ≥ x)
0      0.0000    0.0000    1.0000
1      0.0004    0.0004    1.0000
2      0.0038    0.0042    0.9996
3      0.0231    0.0273    0.9958
4      0.0865    0.1138    0.9727
5      0.2076    0.3215    0.8862
6      0.3115    0.6329    0.6785
7      0.2670    0.8999    0.3671
8      0.1001    1.0000    0.1001
Total  1.0000    -         -
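A table like Table 11.2 does not require Minitab. As a rough sketch (in Python, which is my assumption and not a tool used in the text; binomial_table is a hypothetical helper), the four columns can be generated from Equation 11.1, and the five website answers for x = 6 can then be read off a single row.

```python
from math import comb


def binomial_table(n: int, p: float):
    """Rows (x, P(X = x), P(X <= x), P(X >= x)) for X ~ Bin(n, p)."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    rows = []
    for x in range(n + 1):
        at_most = sum(pmf[: x + 1])   # P(X <= x): running sum from the top
        at_least = sum(pmf[x:])       # P(X >= x): running sum from the bottom
        rows.append((x, pmf[x], at_most, at_least))
    return rows


table_8_075 = binomial_table(8, 0.75)
for x, exact, at_most, at_least in table_8_075:
    print(f"{x}  {exact:.4f}  {at_most:.4f}  {at_least:.4f}")

# The five website outputs for x = 6, read off the same row.
_, p_eq, p_le, p_ge = table_8_075[6]
p_lt = p_le - p_eq   # P(X < 6)
p_gt = p_ge - p_eq   # P(X > 6)
```

Because the strict inequalities are obtained by subtracting P(X = 6), the four identities listed earlier hold by construction here, up to floating-point error.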

11.1.1 Computational Difficulties

By trial-and-error, I discovered that if I go to Minitab and ask for the Bin(n, 0.50) distribution with n ≥ 1023, then I am given the following message:

  • ERROR * Completion of computation impossible.

If, however, p = 0.50 and n ≤ 1022, then Minitab gives an answer. Similarly, Minitab reports its error message for Bin(n, 0.60) if, and only if, n ≥ 1388. The people who wrote Minitab are good programmers. I am not a very good programmer, but I could write a program that would handle at least some of the situations Minitab does not. How can this be? Well, as we will learn later, if n is large enough, then we can use either a Normal curve or the Poisson distribution (see Chapter 13) to obtain good approximations to binomial probabilities. Thus, my inference is that the Minitab programmers were somewhat casual in writing their code because they knew that their users could opt for an approximation. If you read through the exposition on the website

http://stattrek.com/Tables/Binomial.aspx,

near the bottom you will find the following:

When the number of trials is greater than 1,000, the Binomial Calculator uses a Normal distribution to estimate the binomial probabilities.

Very soon I will give you the details of the Normal curve approximation. Here is my advice for this course. You can trust the website’s answers provided n ≤ 1000. Do not use it for n > 1000 until you have read my discussion of the Normal approximation in the next section.
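To make the remark about writing one's own program concrete, here is one possible sketch in Python. It is an illustration under my own assumptions, not a description of what Minitab or the website does: each term of Equation 11.1 is computed on the log scale with the standard library's lgamma, so that the huge factorials and tiny powers never appear as intermediate numbers. The function names are hypothetical.

```python
from math import exp, lgamma, log


def log_binomial_pmf(x: int, n: int, p: float) -> float:
    """log P(X = x) for X ~ Bin(n, p), with 0 < p < 1."""
    # lgamma(k + 1) equals log(k!), so this is the log of n! / (x! (n - x)!)
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return log_coeff + x * log(p) + (n - x) * log(1.0 - p)


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x), summing the terms of Equation 11.1 computed in log space."""
    return sum(exp(log_binomial_pmf(k, n, p)) for k in range(x + 1))


# Values of n beyond the limits at which the text reports Minitab's error.
print(binomial_cdf(1000, 2000, 0.50))   # Bin(2000, 0.50), P(X <= 1000)
print(binomial_cdf(900, 1500, 0.60))    # Bin(1500, 0.60), P(X <= 900)
```

Terms that are too small to represent simply become 0.0 when exponentiated, which does no harm to a sum of probabilities.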

Figure 11.4: The Bin(50, 0.1) Distribution.

11.2 The Normal Curve Approximation to the Binomial

Recall that we learned how to draw a probability histogram on page 143 in Chapter 7. Figures 11.1–11.4 present probability histograms for several binomial probability distributions. Because δ = 1, the area of each rectangle equals its height; thus, the probability of any integer value of x is the height of the rectangle centered at x. As discussed in Chapter 7, a probability histogram allows us to ‘see’ a probability distribution. For example, for the four probability histograms that are presented above, the two with p = 0.50 are symmetric; the one with n = 100 and p = 0.2 is almost symmetric; and the one with n = 50 and p = 0.1 deviates a great deal from symmetry. Indeed, it can be shown that a binomial distribution is symmetric if, and only if, p = 0.50. Moreover, for p ≠ 0.5, if both np and nq are far from 0 then the binomial distribution is almost symmetric. A common guideline for far from 0 is for both to be at least 25. We will return to this topic soon. Below is a list of some other facts about binomial distributions.

  1. The probability histogram for a binomial always has exactly one peak. The peak can be one or two rectangles wide, but never wider.
  2. If np is an integer, then there is a one-rectangle wide peak located above np.
  3. If np is not an integer, then the peak will occur either at the integer immediately below or above np; or, in some cases, at both of these integers.
  4. If you move away from the peak in either direction, the heights of the rectangles become shorter. If the peak occurs at either 0 or n, this fact is true in the one direction away from the peak.
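The facts about the location of the peak are easy to check numerically for any particular n and p. The sketch below (Python, my choice; binomial_peaks is a hypothetical helper) lists the value or values of x with the tallest rectangle. It checks examples only and proves nothing in general.

```python
from math import comb, isclose


def binomial_peaks(n: int, p: float) -> list[int]:
    """Values of x at which the Bin(n, p) probability histogram is tallest."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    tallest = max(pmf)
    # isclose guards against two equal heights differing by round-off
    return [x for x, prob in enumerate(pmf) if isclose(prob, tallest, rel_tol=1e-9)]


print(binomial_peaks(100, 0.50))  # np = 50 is an integer
print(binomial_peaks(50, 0.10))   # np = 5 is an integer
print(binomial_peaks(8, 0.75))    # np = 6 is an integer
print(binomial_peaks(9, 0.50))    # np = 4.5 is not an integer
```

In the last case the peak is two rectangles wide, at the integers immediately below and above np.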

The following result is similar to Results 7.1–7.3 for the sum of ranks test in Chapter 7.

Result 11.1 (Mean and standard deviation of the binomial distribution.) The mean and standard deviation of the Bin(n, p) distribution are:

\[ \mu = np \tag{11.2} \]

\[ \sigma = \sqrt{npq} \tag{11.3} \]

Let’s consider the Bin(100, 0.50) distribution, pictured in Figure 11.1. From the above result, its mean and standard deviation are

\[ \mu = np = 100(0.50) = 50 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{100(0.50)(0.50)} = 5. \]

Suppose now that I want to compute P(X ≥ 55). I have three methods for obtaining this probability:

  1. Because n ≤ 1000, I can use the website

http://stattrek.com/Tables/Binomial.aspx

I go to the site, enter p = 0.50, n = 100 and x = 55; then I click on Compute and obtain the answer: P(X ≥ 55) = 0.1841.

  2. Because n ≤ 1022, I can use Minitab. I did and obtained the answer 0.1841.
  3. I can follow the method of Chapter 7 and obtain a Normal curve approximation. I go to the website:

http://davidmlane.com/hyperstat/z_table.html

I enter Mean = 50 and Sd = 5. Next to the option Above I enter 54.5; remember the continuity correction. The site tells me that the area under the N(50, 5) curve to the right of 54.5 is 0.1841. To the nearest 0.0001, this approximation is exact!
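The comparison between the exact answer and the Normal curve approximation can also be sketched in code. The following Python fragment is an illustration only (the text uses two websites and Minitab); it relies on NormalDist from Python's standard statistics module, and the two function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_upper_tail(x: int, n: int, p: float) -> float:
    """Exact P(X >= x) for X ~ Bin(n, p), from Equation 11.1."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def normal_upper_tail(x: int, n: int, p: float) -> float:
    """Normal curve approximation to P(X >= x), with the continuity correction."""
    mu = n * p                       # Equation 11.2
    sigma = sqrt(n * p * (1 - p))    # Equation 11.3
    # Area to the right of x - 0.5, e.g. 54.5 when x = 55
    return 1.0 - NormalDist(mu, sigma).cdf(x - 0.5)


exact = exact_upper_tail(55, 100, 0.50)
approx = normal_upper_tail(55, 100, 0.50)
print(f"exact = {exact:.4f}, Normal approximation = {approx:.4f}")
```

Changing n and p in the last lines is a quick way to see how the quality of the approximation depends on np and nq.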

Let me do another computational example. Consider the Bin(1200,0.60) distribution. I am interested in P (X ≤ 690). Again, I will try three methods for finding this probability.

  1. Because n ≤ 1387, I can use Minitab to find the exact probability. I did and obtained the answer 0.0414.
  2. I calculate

\[ \mu = np = 1200(0.60) = 720 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{1200(0.60)(0.40)} = 16.971. \]

I go to the website:

1,001 and p = 0.5; the site worked fine. Then I reentered p = 0.001, x = 0 and n = 1,001; this time, the site gave me:

P(X = 0) = 0.025 and P(X < 0) = 0.217.

Because X counts successes, it cannot be negative! Thus, the kindest thing I can say is that for n > 1000 the site’s behavior is erratic; do not use it unless you check that both np and nq equal or exceed 25.

In summary, here are two general guidelines I recommend you use.

  1. For the purpose of computing probabilities: If n ≤ 1000, use the binomial website. If n > 1000, np ≥ 25 and nq ≥ 25: you may use the binomial website or obtain the Normal curve approximation by hand.
  2. For the development of estimation and prediction intervals: Use the Normal approximation to the binomial only if both np and nq are greater than or equal to 25.

Let me make a few comments on these guidelines. First, all statisticians agree that we need to consider the values of np and nq; not all would agree on my magic threshold of 25. Second, if n > 1000 and, say, np < 25 we can use the Poisson distribution to approximate the binomial; this material will be presented in Chapter 13. Third, the second guideline is a bit odd. It implies, for example, that for n = 50 and p = q = 0.50 we may use the Normal curve approximation even though exact probabilities are readily available from the website. As you will learn in Chapter 12, being able to use the Normal curve approximation is very helpful for the development of general formulas.

11.3 Calculating Binomial Probabilities When p is Unknown

I could make this a very short section by simply remarking that if p is unknown then obviously neither the website, Minitab nor I can evaluate Equation 11.1. In addition, if p is unknown then we can calculate neither the mean nor standard deviation of the binomial, both of which are needed for the Normal curve approximation. We do have this section, however, because I want to explore the idea of what it means to know the value of p.

I am typing this in October, 2013. To date in his NBA (National Basketball Association) career, during the regular season Kobe Bryant has made 7,932 free throws out of 9,468 attempts. What can we do with these data in the context of what we have learned in this chapter? I am always interested in computing probabilities; thus, when faced with a new situation I ask myself whether it is reasonable to assume a structure that will allow me to do so. Well, each free throw attempted by Bryant can be viewed as a trial, so I might assume that his 9,468 attempts were observations of i.i.d. trials.

As a side note, let me state that years ago I was active in the Statistics in Sports section of the American Statistical Association. We had many vigorous debates (and many papers have been written) on the issue of whether the assumption of i.i.d. trials is reasonable in sports in general, not just for free throws in basketball. In order to avoid a digression that could consume months, if not years, let’s tentatively assume that we have i.i.d. trials for Bryant shooting free throws.

The next issue is the value of p for Bryant’s trials. Bryant shooting a free throw was not as simplistic as, say, tossing a fair coin or casting a balanced die; nor was it as well-behaved as Mendelian inheritance. In short, there is no reason to believe that we know the value of p for Bryant or, indeed, any basketball player. But what do I mean by know? We know that p is strictly between 0 and 1. As any mathematician will tell you, there are a lot of numbers between 0 and 1! (The technical term is that there are an uncountable infinity of numbers between 0 and 1.) But we need to think more like a scientist and less like a mathematician. In particular, by scientist I mean someone who is, or strives to be, knowledgeable about basketball. A mathematician will (correctly) state that 0.7630948 and 0.7634392 are different numbers, but as a basketball fan, I don’t see any practical difference between p equaling one or the other of these. In either situation, I would round to three digits and say, “The player’s true ability, p, is that in the long-run he/she makes 76.3% of attempted free throws.” Bryant’s data give us

p̂ = 7932/9468 = 0.838;

in words, during his career, to date, Bryant has made 83.8% of his free throws. As might be clear (and if not, we will revisit the topic in the next chapter) we may calculate the nearly certain interval (Formula 4.1 in Chapter 4) for p:

\[ 0.838 \pm 3\sqrt{\frac{0.838(0.162)}{9468}} = 0.838 \pm 0.011 = [0.827, 0.849]. \]
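The arithmetic behind this interval can be checked with a short sketch in Python (my choice of language). One caution: Formula 4.1 is not reproduced in this chapter, so the multiplier of 3 used below is an assumption on my part; it is the value that reproduces the half-width of 0.011 reported above.

```python
from math import sqrt

made, attempts = 7932, 9468          # Bryant's career free throws, from the text
p_hat = round(made / attempts, 3)    # the text rounds to 0.838 before continuing
q_hat = 1 - p_hat

# ASSUMPTION: a multiplier of 3; Formula 4.1 itself is not shown in this chapter.
half_width = 3 * sqrt(p_hat * q_hat / attempts)
lower, upper = p_hat - half_width, p_hat + half_width
print(f"{p_hat:.3f} +/- {half_width:.3f} = [{lower:.3f}, {upper:.3f}]")
```

With more than nine thousand trials the half-width is small, which is why every value in the interval is close to the point estimate.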

Thus, given our assumption of i.i.d. trials, we don’t know Bryant’s p exactly, but every number in its nearly certain interval is quite close to his value of p̂. Thus, if I wanted to compute a probability for Bryant, I would be willing to use p = 0.838. Let’s consider the first n = 50 free throws that Bryant will attempt during the 2013–2014 NBA season. I am interested in the number of these free throws that he will make; call it X. Based on my discussion above I view X as having the Bin(50, 0.838) distribution. For example, I go to the website

http://stattrek.com/Tables/Binomial.aspx,

and enter p = 0.838, n = 50 and x = 38. I click on Calculate and obtain

P(X ≥ 38) = 0.9482.
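The same probability can be computed without the website. Here is a minimal sketch in Python (an assumption of mine, not a tool used in the text; at_least is a hypothetical helper) that evaluates the event for the point estimate and, for comparison, for the lower end of the nearly certain interval.

```python
from math import comb


def at_least(x: int, n: int, p: float) -> float:
    """Exact P(X >= x) for X ~ Bin(n, p), summing Equation 11.1."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


# p = 0.838 is the point estimate; p = 0.827 is the lower end of the interval.
for p in (0.827, 0.838):
    print(f"p = {p}: P(X >= 38) = {at_least(38, 50, p):.4f}")
```

Trying a few values of p shows how sensitive the answer is to the value chosen to stand in for the unknown p.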

Thus, I believe that the probability that Kobe Bryant will make at least 38 of his first 50 free throws this season is just under 95%. (Optional enrichment for basketball fans. It can be shown that the probability of the event I examine above, (X ≥ 38), is an increasing function of p. I found that for p = 0.838, this probability is 0.9482. If I use the lower bound of my nearly certain interval, p = 0.827, the website gives me 0.9201 for the probability of this event. If I use the upper bound of my nearly certain interval,

Table 11.4: The 18 runs for the data in Table 11.3.

Run:      1    2    3    4     5    6      7      8    9
Year(s):  1–2  3–5  6    7–11  12   13–15  16–17  18   19–
Length:   2    3    1    5     1    3      2      1    13
Type:     S    F    S    F     S    F      S      F    S

Run:      10     11   12     13   14     15   16   17     18
Year(s):  32–33  34   35–36  37   38–41  42   43   44–46  47
Length:   2      1    2      1    4      1    1    3      1
Type:     F      S    F      S    F      S    F    S      F

  • The length of the longest run of successes; and
  • The length of the longest run of failures.

In particular:

  • Does it appear that there is a constant probability of success from trial-to-trial?
  • Do the trials appear to be independent?

This topic is frustrating for a number of reasons. In part because it is so frustrating, this topic typically is not presented in an introductory Statistics class. I will, however, discuss these issues in this section briefly for the following reasons.

  1. I feel that I am doing you a disservice if I state assumptions without giving any idea of how to investigate their validity.
  2. I feel that I am doing you a disservice if I present you with a sanitized view of Statistics; a view in which there are no controversies and no confusion about how to analyze data.
  3. In my experience, this is one of the topics in Statistics that non-statisticians find very interesting. People are especially interested in the occurrence of long runs of successes (or failures).

Let me now define some ideas rather formally. I plan to observe a total of n dichotomous trials. I want to investigate whether the dichotomous trials satisfy the assumptions of Bernoulli trials. I decide to pursue this by conducting a test of hypotheses. My null hypothesis is that the trials are Bernoulli trials. This is in line with the principle of Occam’s Razor. Also, usually a researcher wants to have Bernoulli trials because Bernoulli trials allow the computation of many answers. Of course, there are some researchers—and I often find myself in this camp—who sometimes are hoping not to have Bernoulli trials because sometimes it is nice to live in a world that is a bit more complicated.

Please allow me a brief digression. Many textbooks state that the alternative always represents what the researcher is trying to prove. Often times, this is a valid view of the hypotheses, but not always. For example, in this current section, I must assume that the null is Bernoulli trials because otherwise there is no way to find a sampling distribution and so on; it doesn’t really matter what I prefer to be true! Wait a minute. You might be thinking—even though we don’t yet have a test statistic:

We need to be able to determine the sampling distribution of the test statistic under the assumption that the null hypothesis is true. This is hopeless! The null hypothesis does not specify the value of p for the Bernoulli trials; thus, it will be impossible to calculate a unique set of probabilities!

If these are your thoughts, then you are correct. Well, almost correct. The trick is that we use a conditional test statistic. Let me explain. Let’s look at the Super Bowl example again. The total number of trials in the data set is n = 47. Before collecting data I didn’t know what the value of X, the total number of successes in the 47 trials, would be. After I collect the data I know that the observed value of X is x = 25. The trick is that I condition on X eventually being 25. Let me make the following points.

  1. Most (nearly all?) statisticians feel fine about this conditioning; here is why. Given that X = 25, and therefore that the number of failures, n − X = 47 − X, is 22, what have I learned? I have learned that over the course of the data collection, neither conference had a huge advantage over the other. But, and this is the key point, knowing that X = 25 gives me no information about whether p is changing or whether the trials are independent. In other words, knowing that X = 25 gives me no information about whether the assumptions of Bernoulli trials are reasonable.
  2. This point is a little more esoteric. I have always exhorted you to remember: probabilities are calculated before we collect data. Conditioning on X = 25 and then calculating probabilities, as I will soon do, looks a lot like I am violating my directive. But I am not. Actually, before collecting data I can imagine 48 different computations, one for each of the 48 possible values of X (0, 1, 2, ..., 47). If I were to perform all of these computations, after I collect data I would find that my computations conditional on X = x would be irrelevant for all x ≠ 25. Why should I perform computations that I won’t use?

I will now explain why conditioning is so useful mathematically for our problem. Consider the Super Bowl data again. Conditional on knowing X = 25, we know that the data will consist of an arrangement of 25 1’s and 22 0’s. The number of such arrangements is:

\[ \frac{47!}{25!\,22!} = 1.483 \times 10^{13}, \]

almost 15 trillion. It can be shown that, on the assumption that the null hypothesis is true, these arrangements are equally likely to occur, regardless of the value of p! Thus, if we choose our test statistic to be a function of the arrangement of 1’s and 0’s, then we can compute its sampling distribution without knowing the value of p.
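The claim that the arrangements are equally likely, whatever the value of p, can be checked by brute force on a small case. The sketch below (Python, my choice; conditional_probs is a hypothetical helper) uses n = 5 trials with x = 3 successes rather than the Super Bowl data, because 10 arrangements can be listed and 15 trillion cannot. It checks examples; it is not a proof.

```python
from itertools import combinations
from math import comb

print(comb(47, 25))   # number of arrangements of 25 ones and 22 zeros


def conditional_probs(n: int, x: int, p: float) -> list[float]:
    """P(arrangement | X = x) for every arrangement of x ones among n trials."""
    q = 1.0 - p
    p_total = comb(n, x) * p**x * q ** (n - x)    # P(X = x), Equation 11.1
    probs = []
    for ones in combinations(range(n), x):        # positions of the successes
        prob = 1.0
        for trial in range(n):                    # multiplication rule
            prob *= p if trial in ones else q
        probs.append(prob / p_total)
    return probs


for p in (0.3, 0.5, 0.9):
    probs = conditional_probs(5, 3, p)
    print(p, len(probs), min(probs), max(probs))
```

For each p, the smallest and largest conditional probabilities should coincide at 1/10, which is 1 divided by the number of arrangements.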

ulation experiment for any of these test statistics; indeed, you might have noticed how similar the current situation is to selecting assignments for a CRD at random. (If you don’t see the connection, no worries.) There is a fancy math approximation for the sampling distribution of R and I will discuss it briefly in the following subsection.

11.4.1 The Runs Test

The null hypothesis is that the trials are Bernoulli trials. The test statistic is R. Pause for a moment. What is missing? Correct; I have not specified the alternative hypothesis. For the math level I want to maintain in this course, a careful presentation of the alternative is not possible. Instead, I will proceed by examples. But first, I want to give you the formulas for the mean and standard deviation of the sampling distribution of R.

Result 11.2 (The mean and standard deviation of the sampling distribution of R.) Conditional on the number of successes, x, and number of failures, n − x, in a sequence of n Bernoulli trials, the mean and standard deviation of the number of runs, R, are given by the equations below. First compute

\[ c = 2x(n - x); \tag{11.4} \]

then

\[ \mu = 1 + \frac{c}{n}; \tag{11.5} \]

and

\[ \sigma = \sqrt{\frac{c(c - n)}{n^{2}(n - 1)}}. \]

Recall my artificial small example, with n = 5, x = 3 and n − x = 2. First, I calculate c = 2(3)(2) = 12. The mean and standard deviation are:

\[ \mu = 1 + 12/5 = 3.4 \quad \text{and} \quad \sigma = \sqrt{\frac{12(7)}{5^{2}(4)}} \approx 0.917. \]

(If you look at the 10 possible arrangements in Table 11.5, and sum the corresponding 10 values of r, you will find that the sum is 34 and the mean is 34/10 = 3.4, in agreement with the above use of Equation 11.5.)

For my Super Bowl data, n = 47, x = 25 and n − x = 22. Thus, c = 2(25)(22) = 1100. The mean and standard deviation are:

\[ \mu = 1 + 1100/47 = 24.404 \quad \text{and} \quad \sigma = \sqrt{\frac{1100(1053)}{47^{2}(46)}} \approx 3.376. \]

Note that the observed number of runs, 18, is smaller than the mean number under the null hypothesis. This will be relevant very soon.
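Result 11.2 is simple enough to code, and for the small example it can be checked against a complete listing of the arrangements. The following is a sketch in Python (my choice of language; the function names are hypothetical) that does both.

```python
from itertools import combinations, groupby
from math import sqrt
from statistics import mean, pstdev


def runs_mean_sd(n: int, x: int) -> tuple[float, float]:
    """Null mean and standard deviation of R, given x successes in n trials."""
    c = 2 * x * (n - x)                               # Equation 11.4
    mu = 1 + c / n                                    # Equation 11.5
    sigma = sqrt(c * (c - n) / (n**2 * (n - 1)))
    return mu, sigma


def count_runs(arrangement) -> int:
    """Number of runs R in a sequence of 1's and 0's."""
    return sum(1 for _ in groupby(arrangement))


# Brute-force check on the small example: all arrangements of 3 ones in 5 trials.
all_r = []
for ones in combinations(range(5), 3):
    arrangement = [1 if i in ones else 0 for i in range(5)]
    all_r.append(count_runs(arrangement))

print(runs_mean_sd(5, 3), (mean(all_r), pstdev(all_r)))
print(runs_mean_sd(47, 25))   # Super Bowl data
```

The population standard deviation (pstdev) is the right comparison here because the 10 arrangements are the entire sampling distribution, not a sample from it.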

I need to talk about the alternative, but first I need to present an artificial example. Imagine that I have n = 50 dichotomous trials with x = n − x = 25. From Result 11.2, c = 2(25)(25) = 1250. The mean and standard deviation are:

\[ \mu = 1 + 1250/50 = 26 \quad \text{and} \quad \sigma = \sqrt{\frac{1250(1200)}{50^{2}(49)}} \approx 3.499. \]

Based on my intuition, there are two arrangements that clearly provide very strong evidence against Bernoulli trials. (I know. We are not supposed to talk about evidence against the null; please bear with me.) The first is a perfect alternating arrangement:

10101 01010 10101 01010 10101 01010 10101 01010 10101 01010

The second is 25 successes followed by 25 failures:

11111 11111 11111 11111 11111 00000 00000 00000 00000 00000

For the first of these arrangements, R = 50 and V = W = 1. For the second arrangement, R = 2 and V = W = 25. It is easy to see that the first arrangement gives the largest possible value for R—after all, there cannot be more runs than trials! Similarly, the first arrangement gives the smallest possible value for both V and W. Also, the second arrangement gives the smallest possible value of R and the largest possible value for both V and W. If we consider all possible arrangements it makes sense that large [small] values of R tend to be matched with small [large] values of both V and W. The important consequence of this tendency is that when I do talk about alternatives, the < alternative for R will correspond to the > alternative for V or W. Note that everything I say about the first arrangement is also true for the arrangement:

01010 10101 01010 10101 01010 10101 01010 10101 01010 10101

Similarly, everything I say about the second arrangement is also true for the arrangement:

00000 00000 00000 00000 00000 11111 11111 11111 11111 11111
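The three statistics R, V and W are easy to compute for any arrangement. As a minimal sketch (in Python, which is my assumption; run_statistics is a hypothetical helper), the function below splits a string of 1's and 0's into runs and reports the number of runs and the longest run of each type.

```python
from itertools import groupby


def run_statistics(arrangement: str) -> tuple[int, int, int]:
    """Return (R, V, W): number of runs, longest run of 1's, longest run of 0's."""
    trials = arrangement.replace(" ", "")             # spaces are only for display
    runs = [(value, len(list(group))) for value, group in groupby(trials)]
    r = len(runs)
    v = max((length for value, length in runs if value == "1"), default=0)
    w = max((length for value, length in runs if value == "0"), default=0)
    return r, v, w


alternating = "10101 01010 " * 5              # the perfect alternating arrangement
blocked = "11111 " * 5 + "00000 " * 5         # 25 successes, then 25 failures
print(run_statistics(alternating))
print(run_statistics(blocked))
```

Swapping the roles of 1 and 0 in either string leaves R unchanged and exchanges V and W.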

Let’s suppose that I have convinced you that both of these arrangements—or, if you prefer all four—provide convincing evidence that we do not have Bernoulli trials. I now look at the question: Which assumption of Bernoulli trials is being violated? Here are two possible interpretations of the data in the second arrangement which, recall, consists of 25 successes followed by 25 failures:

  1. There is almost, but not quite, perfect positive dependence. A success [failure] is almost always followed by a success [failure]. In fact, the only wrong prediction in the "predict the response to remain the same" paradigm occurs at trials 25 to 26.
  2. The value of p is 1 for the first 25 trials and is 0 for the last 25 trials.

Thus, we don’t have Bernoulli trials, but is it because of dependence or because p changes? Note that I am talking about what we know from the data; it’s possible that your knowledge of the science behind the study will lead you to discard one of my two explanations.

Table 11.7: The partial approximate sampling distribution of V, the longest run of successes, in 47 Bernoulli trials, conditional on a total of 25 successes.

v:                   9       10      11      12      13      14
Rel. Freq. (V ≥ v):  0.0355  0.0134  0.0049  0.0017  0.0004  0.

Let me note that I have been unable to find a website that allows us to enter our data for a runs test. As always, please let me know if you find one. Minitab will perform a runs test. It uses the Normal curve approximation, but it has two curious features:

  1. Minitab does not use the continuity correction. Thus, its P-value for ≠ is 0.0578, substantially smaller than the answer (0.0804) obtained with the continuity correction.
  2. Minitab gives the P-value only for the alternative ≠.

Regarding this second item: one could, of course, halve Minitab’s answer (0.0578) to obtain 0.0289 as the approximate P-value for the alternative supported by the data, in this case <. I suspect that Minitab’s creators are expressing the belief, which has merit, that when we check assumptions we should not be overly restrictive in our choice of alternative.
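In the absence of a website for the runs test, the Normal curve approximation can be sketched in a few lines of Python (my choice of language). Two assumptions are built in and should be read as such: the null mean and standard deviation come from Result 11.2, and the P-value for ≠ is taken to be twice the smaller one-sided P-value. The variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n, x, r = 47, 25, 18                    # Super Bowl data: trials, successes, runs
c = 2 * x * (n - x)
mu = 1 + c / n
sigma = sqrt(c * (c - n) / (n**2 * (n - 1)))
null = NormalDist(mu, sigma)

# Alternative <: P(R <= r), with and without the continuity correction.
p_less_cc = null.cdf(r + 0.5)
p_less_plain = null.cdf(r)

# ASSUMPTION: the two-sided P-value is twice the smaller one-sided P-value.
p_two_sided_cc = 2 * p_less_cc
p_two_sided_plain = 2 * p_less_plain
print(f"with correction:    {p_two_sided_cc:.4f}")
print(f"without correction: {p_two_sided_plain:.4f}")
```

The two printed values should land close to the 0.0804 and 0.0578 quoted above; small differences in the last digit can come from rounding the mean and standard deviation.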

11.4.2 The Test Statistics V and W

The presentation in this section will be very brief. The null hypothesis is that the trials are Bernoulli trials. I will consider two possible test statistics: V [W], the length of the longest run of successes [failures]. The only alternative I will explore is >, that the true model tends to give larger values of V [W] than the (null) Bernoulli trials model. The P-value for the alternative > and the test statistic V is

P(V ≥ v), (11.9)

where v is the observed value of V. The P-value for the alternative > and the test statistic W is

P(W ≥ w), (11.10)

where w is the observed value of W. I can obtain the exact sampling distributions of V and W only for very small values of n. I am unaware of the existence of any accurate fancy math approximation to either of these sampling distributions. Therefore, we will obtain approximations to these sampling distributions by using a computer simulation experiment.

I performed a simulation experiment with 10,000 reps on the Super Bowl data; my approximate distribution for V is in Table 11.7. Recall that the observed value of V for the Super Bowl data is v = 13. Thus, the approximate P-value for the alternative > is 0.0004, a very small number. I am quite convinced that the Super Bowl data did not come from Bernoulli trials. The observed value of W is 5 for the Super Bowl data; the AFC’s longest run of victories occurred in years 7–11. I anticipated that the P-value for W would not be small; thus, I performed a simulation experiment with only 1,000 reps. The relative frequency of occurrence of W ≥ 5 was 0.422; thus, this is my approximate P-value.

I end this section with a small piece of a large study. Katie Voigt was a starting shooting guard on the women’s basketball team at the University of Wisconsin–Madison. A few years later she was kind enough to collect a large amount of data and allow me to analyze it. For each of 20 days, after warming-up, Katie would attempt 100 shots from a favorite spot behind the three-point line. In the following example I will tell you about Katie’s data on day 7. The last Practice Problem will look at her data from day 3 and the last Homework Problem will look at her data from day 8.

Table 11.8: Katie’s data on day 7. She obtained a total of 68 successes and 32 failures. The observed values of the various test statistics are r = 42, v = 19 and w = 4.

SS F SS F S FFF SSSSS FF SSSSSSSS F S F SS F SS F SS FF SSSSS FF SS FF SS F SSS F S FFFF SSSSSSSSSSSSSSSSSSS FF SSS F SS FF S F SS F SS F S F

Example 11.2 (Day 7 of Katie’s Study of Shooting.) On day 7, Katie made 68 of her 100 attempts. The data are presented in Table 11.8. I do not want you to verify any of the numbers in the caption of the table. Indeed, the table’s construction is not conducive to verifying calculations. Rather, I want you to look at the data and form an impression of it.

I will now find the approximate P-values for Katie’s day 7 data, for each of our three test statistics. I will use the alternative < for R for illustration because a small value of R tends to go with large values of V and W, as I discussed earlier. First, I need to use Result 11.2 to find the (null) mean and standard deviation of R. I begin with c = 2x(n − x) = 2(68)(32) = 4,352. The mean and the standard deviation are:

\[ \mu = 1 + \frac{c}{n} = 1 + \frac{4352}{100} = 44.52 \quad \text{and} \quad \sigma = \sqrt{\frac{c(c - n)}{n^{2}(n - 1)}} = \sqrt{\frac{4352(4252)}{100^{2}(99)}} = 4.323. \]

For the alternative < the P-value equals P (R ≤ r) = P (R ≤ 42). I go to the website:

http://davidmlane.com/hyperstat/z_table.html

I enter Mean = 44.52 and Sd = 4.323. Next to the option Below I enter 42.5; remember the continuity correction. The site tells me that the approximate P-value is 0.3202. Thus, the runs test detects only weak evidence in support of the alternative. I performed two simulation experiments; with 10,000 reps for V and 1,000 reps for W. I obtained the following relative frequencies:

Rel. Freq. (V ≥ 19) = 0.0074 and Rel. Freq. (W ≥ 4) = 0.525

Thus, V is very sensitive to the evidence in the data, while W is not.
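For readers who want to try a simulation experiment of this kind, here is one way it might be sketched in Python. This is my own illustration, not the program used for the results above: each rep shuffles the observed successes and failures at random (which conditions on the total number of successes) and records whether the longest run is at least as long as the observed one. The run lengths are read off Table 11.8, and the seed and function names are hypothetical.

```python
import random
from itertools import groupby

# Run lengths read off Table 11.8; the runs alternate S, F, S, F, ...
RUN_LENGTHS = [2, 1, 2, 1, 1, 3, 5, 2, 8, 1, 1, 1, 2, 1, 2, 1, 2, 2, 5, 2, 2,
               2, 2, 1, 3, 1, 1, 4, 19, 2, 3, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1]
day7 = [1 if i % 2 == 0 else 0 for i, k in enumerate(RUN_LENGTHS) for _ in range(k)]


def longest_run(trials, value) -> int:
    """Length of the longest run of `value` in a sequence of 1's and 0's."""
    return max((len(list(g)) for v, g in groupby(trials) if v == value), default=0)


def simulated_p_value(trials, value, reps, seed=0) -> float:
    """Relative frequency, over random rearrangements, of a longest run at
    least as long as the observed one."""
    rng = random.Random(seed)           # hypothetical seed, for reproducibility
    observed = longest_run(trials, value)
    shuffled = list(trials)
    hits = 0
    for _ in range(reps):
        rng.shuffle(shuffled)           # a random arrangement with the same totals
        if longest_run(shuffled, value) >= observed:
            hits += 1
    return hits / reps


r = sum(1 for _ in groupby(day7))
v, w = longest_run(day7, 1), longest_run(day7, 0)
print(r, v, w)
print(simulated_p_value(day7, 1, reps=10_000))   # approximates P(V >= v)
print(simulated_p_value(day7, 0, reps=1_000))    # approximates P(W >= w)
```

Because the reps are random, the relative frequencies will not match the ones reported above exactly; they should be of the same general size, and a different seed will give slightly different answers.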