Hypothesis Testing: Assessing Claims about Population Parameters - Prof. Mary Kathryn Cowl, Study notes of Data Analysis & Statistical Methods

These notes introduce hypothesis testing, a statistical method used to assess claims about population parameters based on sample data. They review the concept of statistical inference, its goals, and the role of hypothesis testing in making decisions about populations. They also cover the terminology of hypothesis tests, including null and alternative hypotheses and p-values, and give examples of one-sided and two-sided hypothesis tests using z and t statistics. Finally, they discuss Type I and Type II errors.

Uploaded on 03/11/2009

koofers-user-l7u
koofers-user-l7u šŸ‡ŗšŸ‡ø

5

(1)

10 documents

1 / 6

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
22S:105
Statistical Methods and Computing
Introduction to Hypothesis Testing
Lecture 14
Mar. 10 and 14, 2008
Kate Cowles
374 SH, 335-0727

Introduction to Hypothesis Testing

Recall that statistical inference is using data contained in a sample to draw conclusions or make decisions about the entire population from which the sample is taken.

Two main goals of statistical inference

  • estimation of unknown population parameters
  • testing specific hypotheses about unknown population parameters

The purpose of hypothesis testing is to “assess the evidence provided by data about some claim concerning a population.”*

* Moore, D.S. The Basic Practice of Statistics

Example:

I claim that my husband’s resting pulse rate is 45 beats per minute. This is very low and would be typical of either a highly trained athlete or a sick individual.

To test my claim, you wish to measure his resting heart rate on 5 different occasions.

Here, the “population” of interest is all possible measurements of my husband’s resting pulse rate. My claim may be interpreted as saying that the mean μ of this “population” of values is 45 beats per minute.

Suppose the measurements you get are:

42 52 43 48 47

The sample mean is x̄ = 46.4. Does this provide evidence against my claim?

We will consider this question by asking what would happen if my claim were true and we repeated the sample of 5 measurements many times.

Suppose first that we knew that the standard deviation of measurements of my husband’s resting heart rate was σ = 4 beats per minute.

  • If the claim that μ = 45 is true, the sampling distribution of x̄ from 5 measurements is normal with mean μ = 45 and standard deviation $\sigma/\sqrt{n} = 4/\sqrt{5} \approx 1.79$.
  • We can judge whether any observed x̄ is surprising by finding it on this distribution.
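The idea of "repeating the sample of 5 measurements many times" can be tried out directly. What follows is a minimal simulation sketch, assuming the claim μ = 45 is true and that individual measurements are normally distributed with σ = 4. The seed, the number of repetitions, and all variable names are illustrative choices and are not part of the original notes.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

mu_claimed = 45.0   # mean under the claim
sigma = 4.0         # standard deviation assumed known in the notes
n = 5               # measurements per sample
reps = 20000        # number of repeated samples (illustrative)

# Draw many samples of size n and record each sample mean.
sample_means = []
for _ in range(reps):
    sample = [random.gauss(mu_claimed, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = sigma / math.sqrt(n)  # sigma / sqrt(n)

# How often is a simulated sample mean at least as large as the observed 46.4?
observed_mean = 46.4
share_at_least_observed = sum(m >= observed_mean for m in sample_means) / reps

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"sd of sample means:   {sd_of_means:.2f} (theory: {theoretical_sd:.2f})")
print(f"share of means >= {observed_mean}: {share_at_least_observed:.3f}")
```

The simulated standard deviation of the sample means should land close to σ/√n, and the share of simulated means at or above 46.4 gives a rough sense of how unsurprising the observed value is when the claim is true.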

Terminology of hypothesis tests

The null hypothesis is the statement being tested.

  • The test is intended to assess the strength of evidence against the null hypothesis.
  • It is usually a statement of “no effect,” “no difference,” “nothing going on.”
  • The null hypothesis is commonly symbolized as H0.
  • H0 is a statement about an unknown population parameter.
  • Example: H0 : μ = 45

The alternative hypothesis is the claim for which we are trying to find evidence.

  • symbolized Ha

In the example about my husband’s heart rate, your alternative hypothesis probably was

Ha : μ > 45

The p-value of the test is the probability, computed assuming that H0 is true, that the observed outcome would take a value as extreme as, or more extreme than, what we actually observed.

  • Small p-values are evidence against the null hypothesis.
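As an illustration of the p-value definition, the following sketch computes a one-sided p-value for the pulse-rate example. It assumes the known σ = 4 stated earlier in the notes and the alternative Ha : μ > 45. The variable names are hypothetical, and Python's standard-library `statistics.NormalDist` stands in for a normal table.

```python
import math
from statistics import NormalDist

measurements = [42, 52, 43, 48, 47]  # the five pulse readings from the notes
mu_0 = 45.0                          # value of the mean under H0
sigma = 4.0                          # standard deviation assumed known

n = len(measurements)
x_bar = sum(measurements) / n                  # sample mean
z = (x_bar - mu_0) / (sigma / math.sqrt(n))    # standardized sample mean

# One-sided test (Ha: mu > 45): "more extreme" means a larger sample mean,
# so the p-value is the area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(f"x_bar = {x_bar:.1f}, z = {z:.3f}, one-sided p-value = {p_value:.3f}")
```

A p-value of this size (a bit above .2) is not small, so under these assumptions the five measurements provide little evidence against the claim that μ = 45.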

The result of a hypothesis test is a decision. The possible outcomes are called

  • Rejecting the null hypothesis
  • Not rejecting the null hypothesis

Before we carry out the test, we must decide how strong we will require the evidence to be in order for us to reject H0. We specify this in terms of a significance level.

  • The significance level is how small we will require the p-value to be in order to reject H0.
  • symbol is α
  • conventional choices are α = .05 and α = .01

Two-sided hypothesis tests

Example: We wish to compare fasting serum cholesterol levels in persons over 21 living in a group of islands in the South Pacific with typical levels found in the U.S.

We know that levels in adults over 21 in the U.S. are approximately normally distributed with

  • mean 190 mg/dl
  • standard deviation 40 mg/dl.

We have no idea what the relative levels of serum cholesterol are on the islands as compared with the U.S.

We will assume that the levels on the islands are normally distributed with

  • unknown mean μ
  • known standard deviation 40 mg/dl

The hypotheses for our two-sided test are:

H0 : μ = 190
Ha : μ ≠ 190

Before we look at our data, we will decide on the significance level α for our test. Let us choose α = .05.

We then perform blood tests on 100 adults from the islands and find that the sample mean level is x̄ = 181.5 mg/dl.

To carry out our hypothesis test, we note that, if H0 is true, the sampling distribution of x̄ is normal with mean and standard deviation

$$\mu = 190, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{100}} = 4$$

We will standardize the value of x̄ that we observed to find out how likely we would have been to get a value as extreme as what we got, or more extreme, if H0 were true.

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{181.5 - 190}{40/\sqrt{100}} = -2.125$$

We must find out what area under the standard normal curve lies

  • to the left of −2.125
  • and to the right of 2.125.

The answer is .017 + .017 = .034.

This is the p-value for the test. Since p < .05, we reject the null hypothesis and conclude that mean serum cholesterol levels among adult residents of the Pacific Islands differ from those among adults in the U.S.
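The two-sided z test above can be reproduced in a few lines. This is a sketch only, using the numbers from the example. The variable names are illustrative, and the standard-library `statistics.NormalDist` replaces the table lookup, so the result is slightly more precise than the rounded table value.

```python
import math
from statistics import NormalDist

x_bar = 181.5   # sample mean from the 100 islanders (mg/dl)
mu_0 = 190.0    # mean under H0 (mg/dl)
sigma = 40.0    # standard deviation assumed known (mg/dl)
n = 100         # sample size
alpha = 0.05    # significance level chosen before looking at the data

z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-sided test: add the area to the left of -|z| and to the right of +|z|.
# By symmetry this is twice the lower-tail area at -|z|.
p_value = 2 * NormalDist().cdf(-abs(z))

reject_h0 = p_value < alpha
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Doubling one tail is what makes the test two-sided: a sample mean far above 190 would count as evidence against H0 just as much as one far below it.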

One sample t-tests

If we don’t know the population standard deviation, then we

  • estimate it with the sample standard deviation s
  • compute a t statistic rather than a z statistic
  • compare to a t distribution with the appropriate degrees of freedom

Example: Suppose we do not assume that we know σ for serum cholesterol levels among residents of the Pacific Islands.

From the sample of 100 adults, we compute s = 38.1 mg/dl.

We then compute

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{181.5 - 190}{38.1/\sqrt{100}} = -2.231$$

We try to use Table C to find the area to the left of −2.231 and to the right of 2.231 under a t curve with 99 degrees of freedom.

The closest we can come is that under a t curve with 100 degrees of freedom, the area in one tail would be between .01 and .02.

Thus we conclude that the p-value is somewhere between .02 and .04.

SAS can do a much better job for us! It would provide a p-value of .0279.

Thus, if we had chosen α = .05, we would reject the null hypothesis.
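For readers without SAS, the same p-value can be approximated with a short script. The sketch below is not the method used in the notes: it integrates the t density numerically with Simpson's rule, using only the Python standard library. The helper names `t_pdf` and `two_sided_t_pvalue` and the step count are hypothetical choices made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_t_pvalue(t_stat, df, steps=2000):
    """Area outside [-|t|, |t|] under the t curve (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    # Simpson's rule on [0, |t|]: endpoint weights 1, then alternating 4 and 2.
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The curve is symmetric, so the central area is twice the area on [0, |t|].
    return 1.0 - 2.0 * area_zero_to_t

x_bar, mu_0 = 181.5, 190.0   # sample mean and null value (mg/dl)
s, n = 38.1, 100             # sample standard deviation and sample size

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = two_sided_t_pvalue(t_stat, df=n - 1)

print(f"t = {t_stat:.3f}, df = {n - 1}, two-sided p-value = {p_value:.4f}")
```

The result should fall inside the .02 to .04 range read from the table and agree closely with the value reported by SAS. In practice a statistics package would be the usual choice; the hand-written integration is only there to keep the example self-contained.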