Annotations for Statistics 515: Clarifications and Guidance on Complex Concepts, Study notes of Data Analysis & Statistical Methods

This document, written by Brian Habing for the University of South Carolina's STAT 515 course, provides annotations and clarifications for concepts presented in the course textbook. The annotations cover topics such as permutations, combinations, the continuity correction, and the normal approximation to the binomial distribution. The document also offers guidance on which methods to use for specific statistical analyses and when to apply certain rules.

Uploaded on 09/17/2009

STAT 515 - Annotations to the Text
Brian Habing - University of South Carolina
Last Updated: December 31, 2003
In several places the textbook uses some shorthand notation that might seem confusing on first
glance. There are also a few places where it presents things in a more complicated fashion than
it needs to. The notes below are given by section number and will hopefully make reading the
text a bit easier once in a while. There are also seven supplements available on the course
web page that contain additional material. Chapters and sections with supplements are noted below
in the appropriate place.
Section 3.8: Notice that we can think of the permutations rule (page 156) and the combinations
rule (page 158) as both being special cases of the partitions rule (page 157). A permutation is a
partition where there are n groups of size 1 and one group of size N-n. This is the case where
each of the first n items selected is chosen for a different purpose and the order is important, but we
don’t care about the order of those not selected. A combination is a partition where one group is
of size n and the other is of size N-n. This is the case where the first n are interchangeable with
each other and the remaining N-n are also interchangeable.
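
The reduction described above can be checked numerically. The following is a minimal sketch, assuming the partitions rule has the usual form N!/(n1! n2! ... nk!); the helper name `partitions` and the values N = 10, n = 3 are illustrative and do not come from the textbook.

```python
from math import comb, factorial, perm

def partitions(*group_sizes):
    """Ways to split sum(group_sizes) distinct items into groups of the given sizes."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)  # divide out the orderings within each group
    return total

N, n = 10, 3  # illustrative values only

# Permutation: n groups of size 1 plus one group of size N - n.
as_permutation = partitions(*([1] * n), N - n)
# Combination: one group of size n and one group of size N - n.
as_combination = partitions(n, N - n)

print(as_permutation, perm(N, n))  # both count ordered selections of n from N
print(as_combination, comb(N, n))  # both count unordered selections of n from N
```

Each pair of printed numbers should agree, which is the sense in which both rules are special cases of the partitions rule.
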
Section 4.4: q=1-p is just shorthand. Don’t let the q throw you off.
Section 5.4: We will always use method 4 in the box on page 237, the normal probability plot
(also called the q-q plot). Exercise 5.45 on page 241 gives three sample q-q plots (with answers
in the back of the text).
Histograms (method 1) are bad because they are easily manipulated by choosing different class
intervals. Figuring out the percentages (method 2) is time-consuming. Comparing the IQR to s
(method 3) isn’t built into most packages and doesn’t give as much information as the q-q plot.
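
For readers who want to see what a package does when it draws a q-q plot, here is a rough sketch using only Python's standard library. The plotting position (i - 0.5)/n is one common convention and is an assumption here (packages differ), and the sample values are made up for illustration.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each sorted observation with the standard normal quantile it is plotted against."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common choice of plotting position (an assumption here).
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]

# Made-up sample, for illustration only.
sample = [6.3, 4.1, 7.4, 5.0, 9.0, 5.9, 6.6, 8.1, 5.6, 7.0]
points = normal_qq_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x:4.1f}")
```

Plotting the second column against the first gives the q-q plot. Points that fall close to a straight line suggest the data are approximately normal.
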
Section 5.5: The explanations in this section are a lot more complicated than they need to be.
The entire key to the section can be seen by simply reading the material on page 244 and at the
top of page 245.
First, notice in Figure 5.18 that they are looking for the probability that the binomial random
variable x will be less than or equal to 10. In order to get all of the histogram bars for 10 and less
they need to take everything from 10.5 and smaller. If you started at 10 you would miss half of
the 10 bar. If you started at 11 you would pick up half of the 11 bar as well. If they had asked
“<10” we would have needed to start at 9.5, if they had asked “>10” we would have needed to start at 10.5, and if they had asked “≥10” we would have needed to start at 9.5. This going up or down by 0.5 is the continuity correction, and the safest way to see which way you need to go is to draw the picture.
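
The four cases above can be collected into a small lookup. This is only a sketch to check your picture against; the function name `continuity_cutoff` and its return format are made up for illustration.

```python
def continuity_cutoff(relation, k):
    """Return (cutoff, side): where the normal-curve area starts for 'x relation k'.

    side is "below" when the area runs from the cutoff downward
    and "above" when it runs from the cutoff upward.
    """
    if relation == "<=":
        return k + 0.5, "below"  # include the whole k bar
    if relation == "<":
        return k - 0.5, "below"  # stop before the k bar
    if relation == ">":
        return k + 0.5, "above"  # start after the k bar
    if relation == ">=":
        return k - 0.5, "above"  # include the whole k bar
    raise ValueError(f"unknown relation: {relation!r}")

for rel in ("<=", "<", ">", ">="):
    print(rel, continuity_cutoff(rel, 10))
```
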

Second, notice that they are using $\mu = \mu_x = np$ and $\sigma = \sigma_x = \sqrt{np(1-p)}$ because we are looking at the binomial distribution (box on the bottom of page 194).

Third, an easier rule for seeing if n is large enough is to say that n is large enough for the normal approximation to work if both np ≥ 5 and n(1-p) ≥ 5. Technically this condition is weaker than the one using μ ± 3σ, but it seems to work fairly well in practice and will match a rule we will use in Chapter 13. The accuracy of the normal approximation to the binomial, even for large n, is a topic of current research by statisticians, and we will see in section 7.3 that some tricks can be done to make it work even better in certain circumstances.
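
To see what "weaker" means in practice, the two rules can be compared side by side. This is a sketch only: it assumes the textbook's μ ± 3σ rule requires the whole interval to lie between 0 and n, and the second (n, p) pair is a made-up illustration.

```python
from math import sqrt

def simple_rule(n, p):
    """The rule used in these notes: np >= 5 and n(1-p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def three_sigma_rule(n, p):
    """Assumed form of the textbook rule: mu - 3*sigma and mu + 3*sigma lie within [0, n]."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - 3 * sigma >= 0 and mu + 3 * sigma <= n

# (200, 0.06) is example 5.12; (60, 0.10) is a made-up case.
for n, p in [(200, 0.06), (60, 0.10)]:
    print(n, p, simple_rule(n, p), three_sigma_rule(n, p))
```

The first case passes both rules. The made-up second case passes the np ≥ 5 rule but fails the μ ± 3σ check as assumed here, because μ - 3σ falls below 0.
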

Applying this to example 5.12 we would do the following:

  1. np = 200(0.06) = 12 ≥ 5 and n(1-p) = 200(1-0.06) = 200(0.94) = 188 ≥ 5, so the sample size is large enough for the normal approximation to the binomial to be reasonable. Notice that we’ve already found μ = np = 12. Just using the formula for σ gives us

$$\sigma = \sqrt{np(1-p)} = \sqrt{200(0.06)(1-0.06)} = \sqrt{11.28} \approx 3.359$$

  2. If we want P(x ≥ 20) then we want to include the 20 bar, so we need to start at 19.5. (The uncolored-in area in Figure 5.20... there is no reason to switch it around like the book does).

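$$P(x \ge 20) = P(x \ge 19.5) = P\left(\frac{x - np}{\sqrt{np(1-p)}} \ge \frac{19.5 - np}{\sqrt{np(1-p)}}\right) \approx P\left(z \ge \frac{19.5 - 12}{3.359}\right) \approx P(z \ge 2.23)$$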

This is now just a probability to look up on the normal table, and we get 0.5-0.4871=0.0129.
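
The arithmetic in this example can be reproduced without a table. The sketch below uses `NormalDist` from Python's standard library in place of the normal table; rounding z to two decimals first is done only to mimic the table lookup.

```python
from math import sqrt
from statistics import NormalDist

n, p = 200, 0.06               # values from example 5.12
mu = n * p                     # 12
sigma = sqrt(n * p * (1 - p))  # sqrt(11.28), about 3.359

cutoff = 19.5                  # P(x >= 20) with the continuity correction
z = (cutoff - mu) / sigma

std_normal = NormalDist()
table_style = 1 - std_normal.cdf(round(z, 2))  # z rounded as in a table lookup
unrounded = 1 - std_normal.cdf(z)              # z left unrounded

print(round(sigma, 3), round(z, 2))
print(round(table_style, 4), round(unrounded, 4))
```

The table-style value should agree with the 0.0129 found above. The unrounded value differs slightly in the fourth decimal place because z is not rounded to 2.23 first.
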

Section 6.3: When reading the examples in this section it is important to note the blue box on the bottom of page 274 that says that $\mu_{\bar{x}} = \mu$ and that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

See the supplement: Chapter 6 - More on Sampling Distributions: t, chi-square and F

Section 7.1: As in section 6.3, note that $\mu_{\bar{x}} = \mu$ and that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
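
As a quick illustration of the second formula, the sketch below computes the standard deviation of the sample mean for two sample sizes. The population standard deviation of 10 and the sample sizes are hypothetical.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100):
    print(n, sd_of_sample_mean(sigma, n))
```

Quadrupling the sample size from 25 to 100 halves the standard deviation of the sample mean, from 2.0 to 1.0.
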

Section 8.5: In the boxes on page 309 notice first that $\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{p(1-p)/n}$ and, unlike for confidence intervals, we actually have p. Second, notice that we can again use the np ≥ 5 and n(1-p) ≥ 5 rule discussed above instead of the $\hat{p} \pm 3\sigma_{\hat{p}}$ rule.
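
The difference between the test and the confidence interval comes down to which value goes into the same formula. A small sketch, with made-up data (58 successes in 100 trials) and a made-up hypothesized value of 0.5:

```python
from math import sqrt

def sd_of_p_hat(p, n):
    """sqrt(p(1-p)/n), evaluated at whatever value of p is supplied."""
    return sqrt(p * (1 - p) / n)

n, x = 100, 58  # hypothetical sample: 58 successes in 100 trials
p_null = 0.5    # hypothetical value of p stated in the null hypothesis
p_hat = x / n

sd_for_test = sd_of_p_hat(p_null, n)  # test: p comes from the hypothesis
sd_for_ci = sd_of_p_hat(p_hat, n)     # confidence interval: p is estimated by p-hat

# Sample-size check for the test, using the hypothesized p.
large_enough = n * p_null >= 5 and n * (1 - p_null) >= 5

print(sd_for_test, sd_for_ci, large_enough)
```
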

See the supplement: Section 8.6 - Power Curves

Section 9.1: Just like in sections 7.1 and 8.2 we will never use the large sample formulas with z and σ discussed on pages 380-385. Instead use the t formula on page 386 if the variances are equal. If the variances are not equal use the box on page 390.

Section 9.2: Again, do not use the large sample formula in the box on page 401.

Section 9.3: Note that

$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

When we are making a confidence interval we know nothing about the values of $p_1$ and $p_2$ except what we have in the sample. In this case we just substitute in $\hat{p}_1$ and $\hat{p}_2$. When we are making a test of the hypothesis that the two populations have equal percentages, however, it doesn’t make sense to put in two different values (because we are assuming they are the same!). In this case we substitute in $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ for both $p_1$ and $p_2$.
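
The two substitutions can be compared directly. The following is a sketch with made-up counts (30 of 100 in the first sample, 45 of 100 in the second); the function names are illustrative.

```python
from math import sqrt

def sd_unpooled(x1, n1, x2, n2):
    """Confidence interval version: plug in the two sample proportions separately."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def sd_pooled(x1, n1, x2, n2):
    """Test of equal proportions: plug the same pooled estimate in for both p1 and p2."""
    p = (x1 + x2) / (n1 + n2)
    return sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)

# Hypothetical counts, for illustration only.
x1, n1, x2, n2 = 30, 100, 45, 100
print(round(sd_unpooled(x1, n1, x2, n2), 4))
print(round(sd_pooled(x1, n1, x2, n2), 4))
```

The two values are usually close but not identical, which is why it matters to pick the one that matches the procedure being carried out.
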

See the supplements:
Section 10.2 - The ANOVA Table
Section 11.3 - Checking the Regression Assumptions
Section 11.5 - The ANOVA Table for Regression
Section 13.3 - Chi-Square Test for Homogeneity