
ISYE 2028 A and B

Lecture 12

Confidence Intervals and Hypothesis Testing, cont'd.

Dr. Kobi Abayomi

March 25, 2009

We have looked at hypothesis testing generally, but we have used only the specific example of a test for the population mean. For instance, if X ∼ (μ, σ²) is a random variable [model], and we collect some data with sample mean x̄ = (1/n) ∑ᵢⁿ xᵢ, then the hypotheses

H₀ : μ = μ₀ vs. Hₐ : μ ≠ μ₀

are the ones we use in a two-sided test of the population mean. You will recall that we use the sampling distribution x̄ ∼ N(μ, σ²/n) to construct the test statistic

Z = (x̄ − μ₀)/√(σ²/n)

which has the standard normal distribution N(0, 1).

This setup is often sufficient: the Z statistic is the deviation of the data from the null hypothesis, over its standard deviation. In words,

Z ≡ (obs − exp)/S.D.(obs)

is the statistic we want to use to test the proportion of people who vote for Pedro, or the mean income of Njoroges in Kisumu; that is, whether the sample mean is representative of our population mean.
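As a quick illustration, here is a minimal sketch of the two-sided Z test in R. All of the numbers are hypothetical:

xbar <- 10.4                        # hypothetical sample mean
mu0 <- 10                           # null value
sigma2 <- 4                         # variance, assumed known
n <- 25
z <- (xbar - mu0)/sqrt(sigma2/n)    # observed Z statistic
2*pnorm(abs(z), lower.tail=FALSE)   # two-sided p-value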

Situations often arise where the sample mean cannot sufficiently describe, or test for, important hypothetical differences in populations. We must appeal to other distributions, and to other quantifications of difference, to test other hypotheses. A useful alternative is...

1 The Chi-Squared Distribution and Associated Hypothesis Tests

Recall this example from Lecture 10: say we are interested in the fairness of a die. Here is the observed distribution after 120 tosses:

Die Face      1    2    3    4    5    6
Obs. Count   30   17   15   23   24   21

The appropriate test statistic here is the Chi-square.

1.1 The Chi-Square test for Goodness of Fit

Formally, here, we are going to test

H₀ : The die is fair vs. Hₐ : The die is not fair

In general, the hypotheses are

H₀ : πᵢ = nᵢ/n for all i vs. Hₐ : πᵢ ≠ nᵢ/n for at least one i

Remember here, our observed test statistic is

χ²ₒ = (25 − 20)²/20 + ··· + (16 − 20)²/20 = 18.00.

The number of degrees of freedom is n − 1, here 6 − 1 = 5. Notice that the total number of observations is fixed; that is how we calculate the expected frequency. Once the total is set, we lose a degree of freedom.

From the table, χ²_{.95, 5} = 11.07. Here χ²ₒ > χ²_{.95, 5} = 11.07, so we reject the null hypothesis. We conclude the die is unfair.¹

¹From the table in the back of the book, which you should familiarize yourself with.
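The same mechanics in R, as a sketch using the counts printed in the table above (chisq.test computes the expected counts under equal cell probabilities itself):

obs <- c(30, 17, 15, 23, 24, 21)   # observed counts from the table
expected <- rep(sum(obs)/6, 6)     # 20 per face under H0
sum((obs - expected)^2/expected)   # the observed chi-square statistic
qchisq(.95, df = 5)                # critical value, 11.07
chisq.test(obs)                    # the built-in equivalent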

1.2 The Chi-Square test for independence, the Two-Way layout

The Chi-Square test is useful for the contingency table, or two-way, setup. Remember the contingency table from Lecture 10: a variable on the rows, a variable on the columns; each cell holds the observed count for each bivariate value of the variables.

We used this example:

                     Fashion Level
Class Level    Low   Middle   High   Total
Graduate         6        4      1      11
PhD              5        1      2       8
Pre-K           30       25     75     130
Total           41       30     78     149

The formal hypotheses are

H₀ : πᵢⱼ = nᵢⱼ/n.. for all i, j vs. Hₐ : πᵢⱼ ≠ nᵢⱼ/n.. for at least one (i, j) pair

For our data here we calculated

χ²ₒ = (6 − 11.89)²/11.89 + ··· + (75 − 82.65)²/82.65 = 14.92.

The degrees of freedom are (3 − 1) × (3 − 1) = 4, and χ²_{.95, 4} = 9.49. Since χ²ₒ > χ²_{.95, 4}, we reject the null hypothesis and conclude that class level and fashion are not independent.
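In R, a sketch of the same test (chisq.test derives the expected counts from the table's margins):

fashion <- matrix(c( 6,  4,  1,
                     5,  1,  2,
                    30, 25, 75),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Graduate", "PhD", "Pre-K"),
                                  c("Low", "Middle", "High")))
chisq.test(fashion)   # df = (3-1)*(3-1) = 4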

1.3 The T Distribution

Remember: if we cannot assume that we know the variance, we appeal to the t distribution as the sampling distribution for the sample mean. We often have too few samples to apply the central limit theorem to the sampling distribution; in these situations we construct the t statistic as well.

Formally, the two-sided hypothesis test is still one of location of the true mean:

H₀ : μ = μ₀, σ² unknown vs. Hₐ : μ ≠ μ₀, σ² unknown

A confidence interval here is

x̄ ± t_{α/2, df} · √(s²/n)

with the associated margin of error

ME = t_{α/2, df} · √(s²/n)

and the appropriate number of samples for a fixed 1 − α confidence level

n = t²_{α/2, df} s² / ME²

For the hypothesis testing setup of H₀ : μ = μ₀ vs. Hₐ : μ ≠ μ₀, our observed test statistic is

tₒ = (x̄ − μ₀)/(s/√n)
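In R, t.test carries out this test and reports the matching confidence interval; a sketch on simulated data (the sample and the null value μ₀ = 5 are hypothetical):

set.seed(1)
x <- rnorm(12, mean = 5.5, sd = 2)   # hypothetical small sample
t.test(x, mu = 5)                    # two-sided t test of H0: mu = 5, with 95% CI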

2 Samples: Independent or Dependent? 1 or 2 or many?

In general, always remember: (1) the sampling distribution, which yields (2) the confidence interval, which is immediately analogous to (3) the test statistic. Everything is a variation on this theme, just in a slightly different scenario.

2.1 Scenario 1: Two sample proportions

Say we wish to gain inference on the support for election reform in California and Georgia. Let p₁ ≡ the proportion who support it in Georgia and p₂ ≡ the proportion who support it in California. We estimate these in the usual way, p̂₁ = x₁/n₁ and p̂₂ = x₂/n₂: the sample proportions of voters who supported the reform over total voters, for each state.

We know from the sampling distribution of p̂: E(p̂₁) = p₁, E(p̂₂) = p₂, and Var(p̂₁) = p₁q₁/n₁, Var(p̂₂) = p₂q₂/n₂.

The difference p̂₁ − p̂₂ is distributed

p̂₁ − p̂₂ ∼ N(p₁ − p₂, p₁q₁/n₁ + p₂q₂/n₂)

This is the sampling distribution for the difference in proportions. The appropriate rescaled statistic is

Z = (p̂₁ − p̂₂ − (p₁ − p₂))/S.D.(p̂₁ − p̂₂)

and it will have a standard normal distribution.

Thus, a confidence interval for the difference in two proportions is

p̂₁ − p̂₂ ± Z_{α/2} √(p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂)

For the two-tailed hypothesis test

H₀ : p₁ = p₂ vs. Hₐ : p₁ ≠ p₂

we exploit the fact that p₁ = p₂ implies p₁ − p₂ = 0, and write

p̂_pooled = p̂ₚ = (x₁ + x₂)/(n₁ + n₂)

to pool the estimate of the population proportion, since, under the null, p₁ = p₂.

Then our test statistic is

zₒ = (p̂₁ − p̂₂)/√(p̂ₚ q̂ₚ (1/n₁ + 1/n₂))
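A sketch of the pooled two-proportion test in R; the vote counts are made up for illustration:

x1 <- 620; n1 <- 1000                  # hypothetical Georgia: supporters / sample size
x2 <- 555; n2 <- 1000                  # hypothetical California
phat1 <- x1/n1; phat2 <- x2/n2
pp <- (x1 + x2)/(n1 + n2)              # pooled proportion under H0
z0 <- (phat1 - phat2)/sqrt(pp*(1 - pp)*(1/n1 + 1/n2))
2*pnorm(abs(z0), lower.tail = FALSE)   # two-sided p-value
# prop.test(c(x1, x2), c(n1, n2)) runs the equivalent chi-square version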

2.2 Scenario 2: Two samples, in general

In general, if we have data coming from two samples X₁ ∼ (μ₁, σ₁²) and X₂ ∼ (μ₂, σ₂²) and we cannot assume knowledge of the variances, we get a sampling distribution for the difference in the population means μ₁ − μ₂ as

x̄₁ − x̄₂ ∼ (μ₁ − μ₂, s₁²/n₁ + s₂²/n₂)

which we approximate with a t distribution with n₁ + n₂ − 2 degrees of freedom.²

²The exact calculation of the degrees of freedom here is more involved; using n₁ + n₂ − 2 is a good approximation.

Thus the confidence interval is

x̄₁ − x̄₂ ± t_{α/2, n₁+n₂−2} · √(s₁²/n₁ + s₂²/n₂)

The two-sided hypothesis test for a difference in the population means

H₀ : μ₁ − μ₂ = Δ₀ vs. Hₐ : μ₁ − μ₂ ≠ Δ₀

would use this test statistic:

t₀ = (x̄₁ − x̄₂ − Δ₀)/√(s₁²/n₁ + s₂²/n₂)

Of course one sided tests are the usual variations on this.

If you are willing to assume that σ₁ = σ₂, then you can pool the variance estimates with

Sₚ² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2)

and use this test statistic:

t₀ = (x̄₁ − x̄₂ − Δ₀)/√(Sₚ²(1/n₁ + 1/n₂))
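In R both versions are one call; a sketch on simulated data (everything hypothetical). Note that t.test's default is the Welch test, the "more involved" degrees-of-freedom calculation mentioned in footnote 2, while var.equal=TRUE gives the pooled version above:

set.seed(2)
x1 <- rnorm(20, mean = 10, sd = 2)   # hypothetical sample 1
x2 <- rnorm(25, mean =  9, sd = 2)   # hypothetical sample 2
t.test(x1, x2, var.equal = TRUE)     # pooled test, df = n1 + n2 - 2
t.test(x1, x2)                       # Welch test, approximate df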

2.3 Scenario 3: Two samples, "dependent"

In many cases it is not reasonable to assume that your two samples arrived independently. We call data paired when it is natural to think of each sample as one coordinate of a bivariate observation: like errors while playing piano with the right hand versus the left hand. In these cases we believe that the samples come from one element, perhaps, but two separate samplings.

Let

D = X₁ − X₂

thus

dᵢ = xᵢ₁ − xᵢ₂

and

D̄ = ((X₁₁ − X₁₂) + ··· + (Xₙ₁ − Xₙ₂))/n

Here we have taken the differences in each observation, and then computed the average difference. A sampling distribution for D̄ is

D̄ ∼ (μ₁ − μ₂, S_D²/n)

where

S_D² = (1/(n − 1)) ∑ᵢⁿ (dᵢ − d̄)²

We again approximate with the t distribution. Here the degrees of freedom are the number of pairs minus 1: df = n − 1.

The confidence interval for paired differences of the population mean is then

d̄ ± t_{α/2, n−1} · √(s_d²/n)

And the hypothesis test for paired differences of the population mean, also known as a paired t-test,

H₀ : Δ = Δ₀ vs. Hₐ : Δ ≠ Δ₀

uses this test statistic:

tₒ = (d̄ − Δ₀)/(S_d/√n)
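A paired t test in R; the error counts below are made up to mirror the piano example:

right <- c(12, 9, 14, 10, 8, 11)     # hypothetical errors, right hand
left  <- c(15, 11, 16, 12, 11, 13)   # hypothetical errors, left hand
t.test(right, left, paired = TRUE)   # paired t test on d = right - left
# identical to t.test(right - left, mu = 0); df = number of pairs - 1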

3 Beyond the sample mean: S²

Thus far all of our confidence intervals and hypothesis tests have been restricted to tests of the mean (tests of location), μ. We have used the sample mean x̄ as the natural estimator. Now we introduce tests and intervals based upon the variance (tests of scale).

3.1 A confidence interval for σ^2

We have to accept as fact³ that for a random sample of size n from a normal distribution with parameters μ, σ²,

(n − 1)S²/σ² ∼ χ²(n − 1)    (1)

i.e. chi-squared with n − 1 degrees of freedom. We use this fact to set up a confidence interval, now, for σ², using the estimator S².

³The proof involves techniques not introduced in this class, but look at Lectures 7-9 and you'll get the flavor.

Since P(χ²_{1−α/2, n−1} < (n − 1)S²/σ² < χ²_{α/2, n−1}) = 1 − α, a 1 − α percent confidence interval is (for α fixed)⁴

[ (n − 1)s²/χ²_{α/2, n−1} , (n − 1)s²/χ²_{1−α/2, n−1} ]    (2)

⁴Notice that χ²_{1−α/2, n−1} ≠ −χ²_{α/2, n−1}: the chi-squared distribution is not symmetric, and neither is the associated confidence interval.

An intelligent reader like you understands that the interval for σ is just the square root of that for σ^2.
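A sketch of interval (2) in R, on a hypothetical normal sample. qchisq takes lower-tail probabilities, so χ²_{α/2, n−1} in the upper-tail notation above is qchisq(1 − α/2, n − 1):

set.seed(3)
x <- rnorm(20, mean = 0, sd = 3)   # hypothetical sample, true sigma = 3
n <- length(x); s2 <- var(x)
lower <- (n - 1)*s2/qchisq(.975, n - 1)
upper <- (n - 1)*s2/qchisq(.025, n - 1)
c(lower, upper)                    # 95% CI for sigma^2
sqrt(c(lower, upper))              # the corresponding interval for sigma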

3.2 Ratio of variances

Remember from Lecture 10: if X₁, ..., Xₘ is distributed N(μ₁, σ₁²) and Y₁, ..., Yₙ is distributed N(μ₂, σ₂²), then the ratio

F = (S₁²/σ₁²)/(S₂²/σ₂²)    (3)

has what we call an F distribution with numerator degrees of freedom m − 1 and denominator degrees of freedom n − 1. From what we just learned in the previous section, F is the ratio of two chi-squared variables, each divided by its degrees of freedom. Call them U ∼ χ²(m − 1) and V ∼ χ²(n − 1): if U = (m − 1)S₁²/σ₁² then U ∼ χ²(m − 1), and if V = (n − 1)S₂²/σ₂² then V ∼ χ²(n − 1). Then

F = (U/(m − 1))/(V/(n − 1)) = [((m − 1)S₁²/σ₁²)/(m − 1)] / [((n − 1)S₂²/σ₂²)/(n − 1)]

which just simplifies to (3).⁵

⁵It turns out that E(F) = ν₂/(ν₂ − 2) and Var(F) = 2ν₂²(ν₁ + ν₂ − 2)/(ν₁(ν₂ − 2)²(ν₂ − 4)), where U ∼ χ²(ν₁), V ∼ χ²(ν₂), F = (U/ν₁)/(V/ν₂), and U is independent of V.

Remember this identity for the F distribution:

F_{1−α, ν₁, ν₂} = 1/F_{α, ν₂, ν₁}    (4)

You'll notice you have to use this fact when looking up values in the F table in some books.
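You can check identity (4) numerically in R; note that qf takes lower-tail probabilities, so the upper-tail value F_{α, ν₂, ν₁} is qf(1 − α, ν₂, ν₁):

qf(.05, 5, 10)     # F_{.95, 5, 10}: lower 5% point of F(5, 10)
1/qf(.95, 10, 5)   # 1/F_{.05, 10, 5}: the same number, per identity (4)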

Lastly, we can construct a confidence interval for the ratio of two variances using this fact (we've seen this reasoning before): P(F_{1−α/2, ν₁, ν₂} < F < F_{α/2, ν₁, ν₂}) = 1 − α. Rewriting so that we get a statement about σ₂²/σ₁²,

P(F_{1−α/2, ν₁, ν₂} · S₂²/S₁² < σ₂²/σ₁² < F_{α/2, ν₁, ν₂} · S₂²/S₁²) = 1 − α.

This yields

( F_{1−α/2, ν₁, ν₂} · s₂²/s₁² , F_{α/2, ν₁, ν₂} · s₂²/s₁² )    (5)

as a 1 − α percent confidence interval for the ratio σ₂²/σ₁².
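A sketch of interval (5) in R on hypothetical samples, again translating the upper-tail notation into qf's lower-tail probabilities:

set.seed(4)
x <- rnorm(15, sd = 2); y <- rnorm(12, sd = 3)   # hypothetical samples
s1sq <- var(x); s2sq <- var(y)
nu1 <- length(x) - 1; nu2 <- length(y) - 1
c(qf(.025, nu1, nu2)*s2sq/s1sq,                  # F_{1-alpha/2, nu1, nu2} * s2^2/s1^2
  qf(.975, nu1, nu2)*s2sq/s1sq)                  # F_{alpha/2, nu1, nu2} * s2^2/s1^2
# var.test(x, y) gives the related interval for sigma1^2/sigma2^2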

4 R example continued from lecture 11

4.1 part b

Here we are comparing costs of accidents in the non-ABS year 1991 and the ABS year 1992. We can treat the cost as a continuous non-proportion variable. Remember the data from lecture 4.

A hypothesis test:

H₀ : Δμ = μ_NoABS − μ_ABS = 0

vs.

H₁ : Δμ = μ_NoABS − μ_ABS > 0

The variances are unknown, so we know we need to use a t-test. But can we assume they are equal and use a pooled variance estimator?

First things first: this data has missing values:

mean(data)
# Cost1991 Cost1992
# 2074.952       NA

# we could also use data[37:42,]

mean(data,na.rm=T)
# Cost1991 Cost1992
# 2074.952    1714.

# here we have removed the missing values

var(data,na.rm=T)
#           Cost1991   Cost1992
# Cost1991  441529.    -7008.193
# Cost1992  -7008.193  390409.

# this is the covariance matrix; for now we only need the diagonal elements

We should do an F-test for equality of variances (I’ll skip the hypothesis notation for this intermediate test) to know which form of the t-test to apply.

var(data,na.rm=T)[1,1]/var(data,na.rm=T)[2,2]
# [1] 1.
# our observed value of the f-statistic

pf(var(data,na.rm=T)[1,1]/var(data,na.rm=T)[2,2],41,37,lower.tail=FALSE)
# [1] 0.
# the p-value for our (inherently) two-tailed test

We can assume that the variances are equal, so our test statistic is

t = (x̄_NoABS − x̄_ABS − 0)/√(sₚ²(1/n_NoABS + 1/n_ABS))

where

sₚ² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2)

The calculations in R:

mean(data[,1],na.rm=T)-mean(data[,2],na.rm=T)
# [1] 360.
# the difference in the sample means

s1squared<-var(data,na.rm=T)[1,1]
s2squared<-var(data,na.rm=T)[2,2]
spsquared<-((42-1)*s1squared+(38-1)*s2squared)/(42+38-2)
spsquared
# [1] 417280.
# the sample variances, and pooled sample variance

tstat<-(mean(data[,1],na.rm=T)-mean(data[,2],na.rm=T))/sqrt(spsquared*(1/42+1/38))
tstat
# [1] 2.
# the calculated value for the t-statistic

pt(2.49,df=(42+38-2),lower.tail=FALSE)
# [1] 0.
# the p-value for the observed t-statistic
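For reference, the built-in call does all of this in one step (a sketch; t.test drops the NA values itself, and var.equal=TRUE gives the pooled test above):

t.test(data[,1], data[,2], var.equal=TRUE, alternative="greater")
# pooled two-sample t test of H0: mu_NoABS - mu_ABS = 0 vs. the one-sided H1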

The difference in sample variances between non-ABS and ABS equipped cars was statistically insignificant at the .05 level. The test for equality of mean repair costs was statistically significant at the same level, so we reject the null hypothesis. Translated into the narrative: there is enough evidence, at the .05 level, to conclude that the cost of repairs for non-ABS equipped cars is higher than that for ABS cars. We could perhaps conjecture that ABS helps the driver lessen the severity of an accident.

4.2 part c

In R the confidence limits are:

mean(data[,1],na.rm=T)-mean(data[,2],na.rm=T)-qt(.975,(42+38-2))*sqrt(spsquared*(1/42+1/38))
# [1] 72.

mean(data[,1],na.rm=T)-mean(data[,2],na.rm=T)+qt(.975,(42+38-2))*sqrt(spsquared*(1/42+1/38))
# [1] 648.
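Equivalently (a sketch), the pooled t.test reports the same two-sided 95% limits directly:

t.test(data[,1], data[,2], var.equal=TRUE)$conf.int
# the same lower and upper confidence limits in one call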

5 Exercises

  • Look at the chart on page 351. I wouldn't commit it to memory (you can always derive the appropriate test statistics by reasoning!), but it is a nice summary.
  • Do exercises 10.30, 10.35, 10.42, 10.52 on pages 358-361.
  • Do exercise 10.72(b) and exercise 10.67, both on page 370.
  • Do exercise 10.73 on page 370.
  • Exercises 10.79, 10.83, 10.90, 10.93 on pages 383-384.
  • Exercises 10.106-10.109 on page 386.