Confidence Intervals: Estimating Parameters with Margin of Error, Lecture notes of Medicine

Formulas and examples for calculating confidence intervals for population means and proportions. Confidence intervals are used to estimate parameters with a specified level of confidence. The notes give formulas for confidence intervals for population means when the standard deviation is known and when it is unknown, as well as for population proportions. They also include examples of how to use a calculator to find confidence intervals. Useful for students in statistics, research, or data analysis courses.

Typology: Lecture notes

2021/2022

Uploaded on 08/05/2022

dirk88

Confidence Intervals
A confidence interval is an interval whose purpose is to estimate a parameter (a number that could, in
theory, be calculated from the population, if measurements were available for the whole population).
A confidence interval has three elements. First there is the interval itself, something like (123, 456).
Second is the confidence level, something like 95%. Third there is the parameter being estimated,
something like the population mean, µ, or the population proportion, p. In order to have a meaningful
statement, you need all three elements: (123, 456) is a 95% confidence interval for µ.
Formulas:
General formula for confidence intervals: estimate ± margin of error
z* is 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.576 for 99% confidence

CI for a population mean (σ is known, and n > 30 or the variable is normally distributed in the population):
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
(TI-83: STAT → TESTS → 7:ZInterval)

CI for a population mean (σ is unknown, and n > 30 or the variable is normally distributed in the population):
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
(TI-83: STAT → TESTS → 8:TInterval)

CI for a population proportion (when $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$), where $\hat{p} = \frac{x}{n}$:
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
(TI-83: STAT → TESTS → A:1-PropZInterval)

If you don’t know $\hat{p}$, use $\hat{p} = \frac{1}{2}$ (conservative approach).

Minimum required sample size for a desired margin of error m and confidence level:
When it is a mean problem:
$$n = \left(\frac{z^* \sigma}{m}\right)^2$$
When it is a proportion problem:
$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p})$$
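The formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical (they are not from the notes or from the TI-83), and z* is computed from the standard normal distribution instead of being read from a table.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(x, n, confidence=0.95):
    """CI for a proportion: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_mean(sigma, m, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= m (always rounded up)."""
    return math.ceil((z_star(confidence) * sigma / m) ** 2)


def sample_size_proportion(m, p_hat=0.5, confidence=0.95):
    """Smallest n for margin m; p_hat = 0.5 is the conservative choice."""
    return math.ceil((z_star(confidence) / m) ** 2 * p_hat * (1 - p_hat))


# The critical values quoted in the notes, recovered from the normal curve.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

The t-interval is left out on purpose: t* depends on the degrees of freedom (n − 1) and is usually taken from a table or from the calculator.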

Examples:

1. You wish to estimate, with 95% confidence, the proportion of computers that need repairs or have problems by the time the product is three years old. Your estimate must be accurate within 3% of the true proportion.

a. If no preliminary estimate is available, find the minimum sample size required.

If no preliminary estimate is available, use the conservative choice: $\hat{p} = 0.5$. The margin of error is m = 3% = 0.03. The numbers in this example correspond to the rounded value z* = 2 for 95% confidence.

$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{2}{0.03}\right)^2 \cdot 0.5\,(1 - 0.5) \approx 1111.1$$

Thus we need at least 1112 computers to sample. (Remember: ALWAYS round up!)

b. Now suppose a prior study involving fewer than 100 computers found that 19% of these computers needed repairs or had problems by the time the product was three years old. Find the minimum sample size needed.

Now $\hat{p} = 0.19$.

$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{2}{0.03}\right)^2 \cdot 0.19\,(1 - 0.19) = 684$$

This is a whole number, thus the minimum sample size we need is 684.
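A short sketch of both parts of this example, assuming the rounded critical value z* = 2 (the value that reproduces 1112 and 684). Exact fractions are used here as a design choice: in part b the result is exactly 684, and floating-point noise could otherwise push the round-up step to 685. The helper name `min_sample_size` is hypothetical.

```python
import math
from fractions import Fraction

z_star = Fraction(2)      # assumption: rounded z* for 95% confidence
m = Fraction("0.03")      # desired margin of error


def min_sample_size(p_hat):
    """Return the exact value of (z*/m)^2 * p(1 - p) and its round-up."""
    exact = (z_star / m) ** 2 * p_hat * (1 - p_hat)
    return exact, math.ceil(exact)


for part, p_hat in (("a", Fraction("0.5")), ("b", Fraction("0.19"))):
    exact, n = min_sample_size(p_hat)
    print(f"part {part}: exact value {float(exact):.2f}, sample size {n}")
```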

2. A college administrator would like to determine how much time students spend on homework assignments during a typical week. A questionnaire is sent to a sample of n = 100 students and their responses indicate a mean of 7.4 hours per week and a standard deviation of 3 hours.

(a) What is the point estimate of the mean amount of homework for the entire student population (i.e., what is the point estimate for μ, the unknown population mean)?

The point estimate for the population mean is the sample mean. In this case it’s 7.4 hours.

(b) Now make an interval estimate of the population mean so that you are 95% confident that the “true” mean is in your interval (i.e., compute the 95% confidence interval).

Conditions: random sample? We don’t really know. n > 30, so we can assume by the CLT that the shape of the sampling distribution of the sample means is approximately normal.

$\bar{x}$ = 7.4 hours and s = 3 hours. The population s.d. is unknown (we only know the sample s.d.), so we need to use the t-interval.

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 2.045) or the calculator: 8:TInterval

The 95% t-interval is (13.98, 17.44).

This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 13.98 inches and 17.44 inches. That is, based on this sample, if we could measure the circumference of ALL of the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 13.98 inches and 17.44 inches. Also, it means that if we took many, many samples of size 30 of live aspen trees and calculated a 95% confidence interval for each sample, about 95% of them would contain the real, actual mean circumference and about 5% would miss it. But, of course, we don’t know which 5% would miss it.

The next sample had 100 trees with a mean of 15.58 inches. Again find a 95% confidence interval for the mean circumference of aspen trees from these data.

Conditions: σ is unknown, and n > 30 and the circumferences are normally distributed, so we can use the t-interval. $\bar{x}$ = 15.71, s = 4.63, n = 100

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.984) or the calculator: 8:TInterval

The 95% t-interval is (14.79, 16.63).

This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 14.79 inches and 16.63 inches. That is, based on this sample, if we could measure the circumference of ALL the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 14.79 inches and 16.63 inches.

The last sample had 300 trees with a mean of 15.59 inches. Find a 95% confidence interval from these data.

Conditions: σ is unknown, and n > 30 and the circumferences are normally distributed, so we can use the t-interval. $\bar{x}$ = 15.71, s = 4.63, n = 300

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ (t* = 1.96) or the calculator: 8:TInterval

The 95% t-interval is (15.18, 16.24).

This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 15.18 inches and 16.24 inches. That is, based on this sample, if we could measure the circumference of ALL the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 15.18 inches and 16.24 inches.

Find the length of each interval of parts (a), (b) and (c). Comment on how these lengths change as the sample size increases.

The length of the CI with n = 30 is 17.44 – 13.98 = 3.46. The length of the CI with n = 100 is 16.63 – 14.79 = 1.84. The length of the CI with n = 300 is 16.24 – 15.18 = 1.06.

The length of the interval gets smaller as the sample size increases.
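The shrinking lengths can be checked with a small sketch that applies the t-interval formula to the three sample sizes. It assumes the values used in the calculations above ($\bar{x}$ = 15.71, s = 4.63) and the t* values quoted for each sample size; the function name `t_interval` is hypothetical. For n = 300 the rounded t* = 1.96 gives a slightly narrower interval than the calculator result quoted in the notes, because the calculator uses the exact t* for 299 degrees of freedom.

```python
import math

x_bar, s = 15.71, 4.63                              # sample mean and sample s.d.
t_star_by_n = {30: 2.045, 100: 1.984, 300: 1.96}    # t* values quoted in the notes


def t_interval(n, t_star):
    """Return (low, high) for x_bar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


lengths = {}
for n, t_star in t_star_by_n.items():
    low, high = t_interval(n, t_star)
    lengths[n] = high - low
    print(f"n = {n:3d}: ({low:.2f}, {high:.2f}), length {lengths[n]:.2f}")
```

Because the margin of error is proportional to $1/\sqrt{n}$, roughly quadrupling the sample size is needed to halve the length of the interval.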

4. In an article exploring blood serum levels of vitamins and lung cancer risks (The New England Journal of Medicine), the mean serum level of vitamin E in the control group was 11.9 mg/liter. There were 196 patients in the control group. (These patients were free of all cancer, except possible skin cancer, in the subsequent 8 years.) Assume that the standard deviation σ = 4.30 mg/liter.

a. Find a 95% confidence interval for the mean serum level of vitamin E in all persons similar to the control group.

Conditions: Random sample? We don’t really know, but let’s assume they picked the subjects randomly. σ is known, so we can use the z-interval. $\bar{x}$ = 11.9, σ = 4.30, n = 196

Using either $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ (z* = 1.96) or the calculator: 7:ZInterval

The 95% z-interval is (11.3, 12.5).

This means that we are 95% confident that the mean serum level of vitamin E in ALL cancer-free patients is between 11.3 mg/liter and 12.5 mg/liter. That is, based on this sample, if we could measure the serum level of vitamin E in ALL cancer-free patients (except possible skin cancer in the subsequent 8 years), then we are 95% confident that the mean of all the measurements would be between 11.3 mg/liter and 12.5 mg/liter.
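As a check on the arithmetic, here is a minimal sketch of the z-interval for these data. It assumes z* is taken from the standard normal distribution (about 1.96) instead of a table.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 11.9, 4.30, 196          # values given in the example
z_star = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence

margin = z_star * sigma / math.sqrt(n)     # margin of error
low, high = x_bar - margin, x_bar + margin
print(f"margin = {margin:.3f}, interval = ({low:.1f}, {high:.1f})")
```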

b. If you wanted to estimate the mean serum level of vitamin E, with 90% confidence, and a margin of error of no more than 0.25 mg/liter, how large a sample would you need?

For the minimum sample size we need we can use the formula: $n = \left(\frac{z^* \sigma}{m}\right)^2$
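A sketch of how this formula could be evaluated for part b, assuming the same σ = 4.30 mg/liter as in part a and z* for 90% confidence (about 1.645). The variable names are illustrative only. The last two lines confirm that the rounded-up n meets the margin requirement while one fewer observation does not.

```python
import math
from statistics import NormalDist

sigma, m, confidence = 4.30, 0.25, 0.90
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645

n_exact = (z_star * sigma / m) ** 2
n_min = math.ceil(n_exact)                                # always round up
print(f"exact value {n_exact:.2f}, minimum sample size {n_min}")

margin_at_n_min = z_star * sigma / math.sqrt(n_min)
margin_one_fewer = z_star * sigma / math.sqrt(n_min - 1)
```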

Checking conditions for CI: random sample, $n\hat{p} = 560 > 10$ and $n(1 - \hat{p}) = 1000(1 - 0.56) = 440 > 10$

Conditions are satisfied. We use: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Thus, using the formula above (with z* = 1.96), or using the A:1-PropZInt menu on the calculator, we get (0.529, 0.591).

That is, based on the results from our sample of size 1000, we are 95% confident that the proportion of ALL voters who favored Proposition 1 is between 52.9% and 59.1%.

Notice that the sample size of 1000 gives a much narrower confidence interval than the sample size of 100. In fact, with the larger sample, we can be quite confident (about 95% of the time anyway) that a majority of the voters favored Proposition 1, since the smaller endpoint of the sample’s 95% confidence interval, 0.529, is greater than one-half. Bear in mind, however, that the larger sample may be more costly and time consuming than the smaller one. Now, how confident are you that Proposition 1 passed or failed?
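The interval for the sample of 1000 voters can be reproduced with a short sketch. It assumes x = 560 voters in favor (from $n\hat{p} = 560$) and computes z* from the normal distribution; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 560, 1000
p_hat = x / n

# Both expected counts should be at least 10 for the normal approximation.
conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

z_star = NormalDist().inv_cdf(0.975)                 # about 1.96
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
print(f"conditions ok: {conditions_ok}, interval = ({low:.3f}, {high:.3f})")

# A lower endpoint above one-half points to a majority in favor.
majority_supported = low > 0.5
```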

I’d bet a small amount of money that I am right.

c. Forget the previous parts now. Assume that you didn’t take any samples yet. What sample size do you need if you want the margin of error to be at most 3% with 95% confidence but you have no estimate of p?

Because you don’t have an estimate of p, use $\hat{p} = 0.5$. We want the margin of error to be at most 3%, that is m = 0.03.

$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{1.96}{0.03}\right)^2 \cdot 0.5\,(1 - 0.5) \approx 1067.1$$

Thus, to get a margin of error of at most 3%, we need at least 1068 voters in our sample.

d. Now let’s assume you did a pilot sample, in which 56 out of 100 voters said they favor Proposition 1. What sample size do you need if you want the margin of error to be at most 3% with 95% confidence now?

Now we have an estimate of p from the pilot study, so we use $\hat{p} = 0.56$. We want the margin of error to be at most 3%, that is m = 0.03.

$$n = \left(\frac{z^*}{m}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{1.96}{0.03}\right)^2 \cdot 0.56\,(1 - 0.56) \approx 1051.7$$

Thus, to get a margin of error of at most 3%, we need at least 1052 voters in our sample.
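Parts c and d differ only in the value of $\hat{p}$. The sketch below evaluates the sample-size formula for several values of $\hat{p}$ (those other than 0.5 and 0.56 are illustrative and not from the notes) to show why 0.5 is called the conservative choice: the product $\hat{p}(1 - \hat{p})$ is largest there, so no other estimate can demand a larger sample.

```python
import math

z_star, m = 1.96, 0.03      # 95% confidence, 3% margin of error


def min_sample_size(p_hat):
    """Round up (z*/m)^2 * p_hat * (1 - p_hat) to the next whole number."""
    return math.ceil((z_star / m) ** 2 * p_hat * (1 - p_hat))


sizes = {p: min_sample_size(p) for p in (0.1, 0.3, 0.5, 0.56, 0.7, 0.9)}
for p_hat, n in sizes.items():
    print(f"p_hat = {p_hat:.2f}: n = {n}")
```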

1. Sometimes a 95% confidence interval is not enough. For example, in testing new medical drugs or procedures, a 99% confidence interval may be required before the new drug or procedure is approved for general use. For example, a new drug for migraines might induce insomnia (difficulty falling asleep) in some patients. If this side effect happens in too many patients, the drug might not be approved. More precisely, if it could happen in more than 5% of all the patients, it won’t be approved. In a random sample of 632 migraine patients who took the new pill, 19 of them experienced insomnia. Based on this sample result, what would be your recommendation: should the new drug be approved or not?

We want to estimate the proportion of ALL migraine patients who would experience insomnia. The sample proportion, $\hat{p}$, is 19/632 ≈ 0.03 = 3%.

We want to calculate the 99% confidence interval based on this sample result. Let’s check the conditions first: Random sample, $n\hat{p} = 19 > 10$ and $n(1 - \hat{p}) = 613 > 10$

Conditions are satisfied. We use: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Thus, using the formula above (with z* = 2.575), or using the A:1-PropZInt menu on the calculator, we get (0.0126, 0.0476).

Thus, based on this sample result, we are 99% confident that if we could test every migraine patient who would take this pill, the proportion of them who would experience insomnia would be between about 1.26% and 4.76%. The whole interval lies below the 5% limit; therefore, we can recommend the approval of the new drug.
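The decision can be written out as a short sketch: compute the 99% interval and compare its upper endpoint with the 5% limit. It assumes z* = 2.575 as quoted above; the name `approve` is a hypothetical label for the recommendation, not part of any procedure described in the notes.

```python
import math

x, n = 19, 632
z_star = 2.575              # 99% confidence, value quoted in the notes
limit = 0.05                # highest acceptable insomnia rate

p_hat = x / n
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
print(f"99% interval: ({low:.4f}, {high:.4f})")

# Recommend approval only if even the upper endpoint stays below the limit.
approve = high < limit
print("recommend approval" if approve else "do not recommend approval")
```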

1. The Gallup Poll survey organization conducted telephone interviews with a randomly selected national sample of 1,003 adults, 18 years and older, on Mar. 3-5, 2003. In the survey they found that 281 adults said that the nation’s energy situation is “very serious”. Find 95% and 99% confidence intervals for the unknown proportion of Americans who felt that the nation’s energy situation is very serious.

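This is a proportion problem: $\hat{p} = \frac{x}{n} = \frac{281}{1003} \approx 0.28$

Conditions: random sample, checked,

$n\hat{p} = 1003 \cdot 0.28 \approx 281 > 10$, $n(1 - \hat{p}) = 1003\,(1 - 0.28) \approx 722 > 10$

95% confidence interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ (z* = 1.96)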

Or using the calculator: STAT → TESTS → A:1-PropZInt, x = 281, n = 1003, C-level: 0.95

The 95% confidence interval is: (0.253, 0.308)

We are 95% confident that the proportion of ALL adults in the U.S. who feel that the nation’s energy situation is very serious is somewhere between 25.3% and 30.8%. That is, if we could ask EVERY adult in the U.S. what they think about the nation’s energy situation, we are 95% confident that 25.3%-30.8% of them would think that the energy situation is very serious.