An overview of statistical inference using t-distributions, including the calculation of confidence intervals and hypothesis tests for population means. It covers the use of the t-distribution when the population standard deviation is unknown and the concept of degrees of freedom. Examples are given for calculating confidence intervals and hypothesis tests for the population mean yield of tomatoes and the vitamin C content loss in wheat soy blend. The document also discusses the appropriateness of using t-procedures for different sample sizes and in the presence of outliers or skewness.
Typology: Study notes
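Confidence interval for the mean when the population standard deviation $\sigma$ is known:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

Confidence interval for the mean when $\sigma$ is unknown and the sample standard deviation $s$ is used instead (t-distribution with $n - 1$ degrees of freedom):

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$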
Using the t-distribution:
o The t-distributions are symmetric and bell-shaped; they are just a bit wider than the normal distribution.
o If you have a 2-sided test, multiply P(t > |t*|) by 2 to get the area in both tails.
o The normal table showed lower tails only, while the t-table shows upper tails, so the t-table is backwards by comparison.
Finding t* on the table: Start at the bottom line to get the right column for your confidence level, and then work up to the correct row for your degrees of freedom.
What happens if your degrees of freedom isn’t on the table, for example df = 79? Always round DOWN to the next lowest degrees of freedom to be conservative.
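The rounding-down rule can be sketched as a small lookup. The list of degrees-of-freedom rows below is an assumption about a typical printed t-table (rows 1 to 30, then 40, 50, 60, 80, 100, 1000); check it against the rows on your own table.

```python
from bisect import bisect_right

# Degrees-of-freedom rows assumed to appear on the printed t-table.
# This list is an assumption for illustration, not taken from the notes.
TABLE_DF = list(range(1, 31)) + [40, 50, 60, 80, 100, 1000]


def conservative_df(df: int) -> int:
    """Return the largest tabled df that does not exceed the requested df."""
    if df < TABLE_DF[0]:
        raise ValueError("df must be at least 1")
    # bisect_right counts how many tabled values are <= df.
    return TABLE_DF[bisect_right(TABLE_DF, df) - 1]


print(conservative_df(79))  # row to use when df = 79 is not on the table
print(conservative_df(25))  # a df that is on the table is returned unchanged
```

Rounding down is conservative because a smaller df gives a larger t*, which makes confidence intervals slightly wider and P-values slightly larger.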
Example of confidence interval for tomatoes: An agricultural expert performs a study to measure yield on a tomato field. Studying 10 plots of land, she finds the mean yield is 34 bushels with a sample standard deviation of 12.75. Find a 95% confidence interval for the unknown population mean yield of tomatoes.
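As a check on the hand calculation, here is a minimal sketch of the tomato interval. The critical value t* = 2.262 is the standard table entry for 95% confidence with df = 9; it is typed in by hand rather than computed, and the variable names are illustrative.

```python
import math

n = 10          # plots of land sampled
x_bar = 34.0    # sample mean yield (bushels)
s = 12.75       # sample standard deviation
t_star = 2.262  # table value: 95% confidence, df = n - 1 = 9

standard_error = s / math.sqrt(n)
margin = t_star * standard_error
ci_low, ci_high = x_bar - margin, x_bar + margin

print(f"standard error = {standard_error:.3f}")
print(f"95% CI for the mean yield: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval works out to roughly 24.9 to 43.1 bushels.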
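Example of hypothesis test for tomatoes: Using the same tomato data, conduct a hypothesis test to determine whether the population mean yield of tomatoes is less than 42 bushels. State your conclusion in terms of the story. Also draw a picture of the t curve with the number and symbol for the mean you hypothesized ($\mu_0$), the standard error ($s/\sqrt{n}$), and the test statistic ($t_0$). Also shade the appropriate part of the curve which shows the P-value.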
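A minimal sketch of the tomato test, assuming H0: mu = 42 versus Ha: mu < 42. Because the notes work from a printed table, the P-value is bracketed between table entries rather than computed exactly; the critical values are the standard upper-tail entries for df = 9.

```python
import math

n, x_bar, s = 10, 34.0, 12.75
mu_0 = 42.0  # H0: mu = 42 versus Ha: mu < 42

t_0 = (x_bar - mu_0) / (s / math.sqrt(n))

# Upper-tail probabilities and critical values for df = 9 (standard t-table).
upper_tail_table = {0.10: 1.383, 0.05: 1.833, 0.025: 2.262, 0.01: 2.821}

# t_0 is negative here, so by symmetry P(t < t_0) = P(t > |t_0|).
# Tail areas whose critical value |t_0| reaches bound the P-value from above;
# tail areas whose critical value |t_0| falls short of bound it from below.
p_upper = min((p for p, c in upper_tail_table.items() if abs(t_0) >= c), default=1.0)
p_lower = max((p for p, c in upper_tail_table.items() if abs(t_0) < c), default=0.0)

print(f"t_0 = {t_0:.3f}, df = {n - 1}")
print(f"{p_lower} < P-value < {p_upper}")
```

With a P-value between 0.025 and 0.05, the data give moderately strong evidence at the 5% level that the mean yield is below 42 bushels.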
Example (Exercise 7.37): How accurate are radon detectors of a type sold to homeowners? To answer this question, university researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter of radon. The detector readings were as follows:
a) Is there convincing evidence that the mean reading of all detectors of this type differs from the true value of 105? Carry out a test in detail and write a brief conclusion. (SPSS tells us the mean and standard deviation of this data are 104.13 and 9.40, respectively.)
b) Find a 90% confidence interval for the population mean.
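Both parts of the radon example can be worked from the summary statistics alone. This sketch uses the rounded mean and standard deviation quoted above, so its results differ slightly in the third decimal from SPSS, which uses the unrounded data. The critical value 1.796 is the standard table entry for 90% confidence with df = 11.

```python
import math

n = 12
x_bar = 104.13  # sample mean reported by SPSS (rounded)
s = 9.40        # sample standard deviation reported by SPSS (rounded)
mu_0 = 105.0    # H0: mu = 105 versus Ha: mu != 105

se = s / math.sqrt(n)
t_0 = (x_bar - mu_0) / se

t_star = 1.796  # table value: 90% confidence, df = 11
ci = (x_bar - t_star * se, x_bar + t_star * se)

print(f"t_0 = {t_0:.3f} with df = {n - 1}")
print(f"90% CI for the mean reading: ({ci[0]:.2f}, {ci[1]:.2f})")
```

A test statistic this close to zero gives no evidence that the mean reading differs from 105, and the interval comfortably contains 105.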
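Now re-do the above example using SPSS completely.
For the confidence interval: go to Analyze → Descriptive Statistics → Explore. Click on “Statistics” and change the CI to 90%. Then hit “OK.”
For the hypothesis test: go to Analyze → Compare Means → One-Sample T Test. Change the “test value” to 105 (since that is the value in our H0), change “options” to 90%, and hit “OK.” (This will give you the output below.)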
One-Sample Test (Test Value = 105)

                                                                        90% Confidence Interval
                                                                        of the Difference
                          t       df   Sig. (2-tailed)   Mean Difference   Lower     Upper
radon detector readings   -.319   11   .755              -.8667            -5.739    4.
Using this SPSS output, what would your t-curve with shaded P-value look like if you had hypotheses of:
$H_0: \mu = 105$ versus $H_a: \mu \neq 105$
$H_0: \mu = 105$ versus $H_a: \mu < 105$
$H_0: \mu = 105$ versus $H_a: \mu > 105$
You must choose your hypotheses BEFORE you examine the data. When in doubt, do a two-sided test.
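SPSS reports only the two-sided P-value, so a one-sided P-value has to be derived from it. The helper below is a hypothetical sketch of that conversion (the function name and the "less"/"greater" labels are illustrative), applied to the radon output with t = -.319 and Sig. (2-tailed) = .755.

```python
def one_sided_p(t_stat: float, p_two_sided: float, alternative: str) -> float:
    """Convert a two-sided P-value into a one-sided P-value.

    alternative is "less" for Ha: mu < mu_0 or "greater" for Ha: mu > mu_0.
    """
    half = p_two_sided / 2
    if alternative == "less":
        # The shaded area is the lower tail, to the left of t_stat.
        return half if t_stat < 0 else 1 - half
    if alternative == "greater":
        # The shaded area is the upper tail, to the right of t_stat.
        return half if t_stat > 0 else 1 - half
    raise ValueError('alternative must be "less" or "greater"')


t_stat, sig_two_tailed = -0.319, 0.755
p_less = one_sided_p(t_stat, sig_two_tailed, "less")
p_greater = one_sided_p(t_stat, sig_two_tailed, "greater")
print(f"Ha: mu < 105 gives P = {p_less:.4f}")
print(f"Ha: mu > 105 gives P = {p_greater:.4f}")
```

When the test statistic falls on the side named by the alternative, the one-sided P-value is half the two-sided value; otherwise it is one minus that half.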
How do you know when it is appropriate to use the t procedures?
Very important! Always look at your data first. Histograms and normal quantile plots (pgs. 80-83 in your book) will help you see the general shape of your data.
Larger samples (n) improve the accuracy of the t distribution.
Some guidelines for inference on a single mean:
n < 15: Use t procedures if the data are close to normal. If the data are clearly non-normal or outliers are present, do not use t.
n ≥ 15: Use t procedures except in case of outliers or strong skewness.
n ≥ 40: Use t procedures even if data skewed.
Normal quantile plots:
In SPSS, go to Graphs → Q-Q. Move your variable into “variable” column and hit “OK.”
[Figure: Normal Q-Q Plot of Radon Detector Reading. Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]
Look to see how closely the data points (dots) follow the diagonal line. The line will always be a 45-degree line. Only the data points will change. The closer they follow the line, the more normally distributed the data is.
What happens if the t procedure is not appropriate? What if you have outliers or skewness with a smaller sample size ( n < 40)?
o Was the data recorded correctly? Is there any reason why that data might be invalid (an equipment malfunction, a person lying in their response, etc.)? If there is a good reason why that point could be disregarded, try taking it out and compare the new confidence interval or hypothesis test results to the old ones.
o If you don’t have a valid reason for disregarding the outlier, you have to leave the outlier in and not use the t procedures.
o If the skewness is not too extreme, the t procedures are still appropriate if the sample size is bigger than 15.
o If the skewness is extreme or if the sample size is less than 15, you can use nonparametric procedures. One type of nonparametric test is similar to the t procedures except it uses the median instead of the mean. Another possibility would be to transform the data, possibly using logarithms. A statistician should be consulted if you have data which doesn’t fit the t procedures requirements. We won’t cover nonparametric procedures or transformations for non-normal data in this course, but your book has supplementary chapters (14 and 15) on these topics online if you need them later in your own research. They are also discussed on pages 465-470 of your book.
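The sample-size guidelines for a single mean can be summarized as a small decision function. This is a sketch, not a substitute for looking at the data: the function name and the skewness labels are invented here, and treating outliers as acceptable once n ≥ 40 is an assumption read from the "n < 40" wording above.

```python
def t_procedures_ok(n: int, has_outliers: bool, skew: str) -> bool:
    """Return True when the guidelines allow one-sample t procedures.

    skew must be "none", "moderate", or "strong" (labels invented for this sketch).
    """
    if skew not in ("none", "moderate", "strong"):
        raise ValueError("skew must be 'none', 'moderate', or 'strong'")
    if n >= 40:
        # Large sample: acceptable even for skewed data.
        # Accepting outliers here is an assumption, not stated in the notes.
        return True
    if n >= 15:
        # Medium sample: acceptable unless outliers or strong skewness.
        return not has_outliers and skew != "strong"
    # Small sample: acceptable only when the data look close to normal.
    return not has_outliers and skew == "none"


print(t_procedures_ok(12, has_outliers=False, skew="none"))
print(t_procedures_ok(20, has_outliers=True, skew="none"))
print(t_procedures_ok(50, has_outliers=False, skew="strong"))
```

When the function returns False, the options described above apply: check whether the outlier is a recording error, consider a transformation, or use a nonparametric procedure.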
What do you do when you have 2 lists of data instead of 1?
First decide whether you have 1 sample with 2 measurements OR 2 independent samples with one measurement each.
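Confidence interval for matched pairs, where $\bar{d}$ and $s_d$ are the mean and standard deviation of the $n$ differences:

$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$$

t test statistic:

$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$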
Example of Matched Pairs (Exercise 7.31): Researchers are interested in whether Vitamin C is lost when wheat soy blend (CSB) is cooked as gruel. Samples of gruel were collected, and the vitamin C content was measured (in mg per 100 grams of gruel) before and after cooking. Here are the results:
Sample         1    2    3    4    5    Mean   St. Dev.
Before         73   79   86   88   78   80.8   6.
After          20   27   29   36   17   25.8   7.
Before-After
a) Set up an appropriate hypothesis test for the population mean difference and carry it out for these data. State your conclusions in a sentence.
b) Find a 90% confidence interval for the population mean vitamin C content loss.
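A minimal sketch of the matched pairs calculation for the vitamin C data, assuming H0: mu_d = 0 versus Ha: mu_d > 0 for the Before - After differences. The critical value 2.132 is the standard table entry for 90% confidence with df = 4.

```python
import math
import statistics

before = [73, 79, 86, 88, 78]
after = [20, 27, 29, 36, 17]

# Matched pairs: analyze the Before - After differences as one sample.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
se = s_d / math.sqrt(n)

t_0 = (d_bar - 0) / se  # H0: mu_d = 0

t_star = 2.132  # table value: 90% confidence, df = 4
ci = (d_bar - t_star * se, d_bar + t_star * se)

print(f"differences = {diffs}, mean = {d_bar}, s_d = {s_d:.3f}")
print(f"t_0 = {t_0:.2f} with df = {n - 1}")
print(f"90% CI for the mean loss: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The test statistic is far beyond any entry in the df = 4 row of the table, so the evidence that cooking reduces vitamin C content is very strong.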
Example of Matched Pairs : In an effort to determine whether sensitivity training for nurses would improve the quality of nursing provided at an area hospital, the following study was conducted. Eight different nurses were selected and their nursing skills were given a score from 1-10. After this initial screening, a training program was administered, and then the same nurses were rated again. Below is a table of their pre- and post-training scores, along with the difference in the score. Conduct a test to determine whether the training could on average improve the quality of nursing provided in the population.
Individual   Pre-training score   Post-training score
1            2.56                 4.
2            3.22                 5.
3            3.45                 4.
4            5.55                 7.
5            5.63                 7.
6            7.89                 9.
7            7.66                 5.
8            6.20                 6.
a. What are your hypotheses?
b. What is the test statistic?
c. What is the P -value?
d. What is your conclusion in terms of the story?
e. What is the 95% confidence interval of the population mean difference in nursing scores?
If you need to change the confidence level, go to “Options.” SPSS always computes the difference as (left column of data) minus (right column of data). If this bothers you, just be careful how you enter the data into the program.
Paired Samples Statistics

                               Mean     N   Std. Deviation   Std. Error Mean
Pair 1   Post-training score   6.3212   8   1.82086          .
         Pre-training score    5.2700   8   2.01808          .
Data entered as written above with pre-training in left column and post-training in right column:
Paired Samples Test

                                      Paired Differences
                                                                                    95% Confidence Interval
                                                                                    of the Difference
                                      Mean       Std. Deviation   Std. Error Mean   Lower      Upper     t        df   Sig. (2-tailed)
Pair 1   pretraining - posttraining   -1.05125   1.47417          .52120            -2.28369   .18119    -2.017   7    .
Data entered backwards from how it is written above with post-training in left column and pre-training in right column:
Paired Samples Test

                                                     Paired Differences
                                                                                                  95% Confidence Interval
                                                                                                  of the Difference
                                                     Mean      Std. Deviation   Std. Error Mean   Lower     Upper     t       df   Sig. (2-tailed)
Pair 1   Post-training score - Pre-training score   1.05125   1.47417          .52120            -.18119   2.28369   2.017   7    .
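The two SPSS tables differ only in sign. The sketch below rebuilds the standard error, test statistic, and interval from the reported mean and standard deviation of the differences, then flips the order of subtraction. The critical value 2.365 is the rounded table entry for 95% confidence with df = 7, so the interval endpoints agree with SPSS only to about three decimals.

```python
import math

n = 8
mean_diff = 1.05125  # post-training minus pre-training, from the SPSS output
sd_diff = 1.47417    # standard deviation of the differences

se = sd_diff / math.sqrt(n)
t_post_minus_pre = mean_diff / se
t_pre_minus_post = -mean_diff / se  # reversing the subtraction flips the sign

t_star = 2.365  # table value: 95% confidence, df = 7
ci_post_minus_pre = (mean_diff - t_star * se, mean_diff + t_star * se)
# Reversing the subtraction negates both endpoints and swaps their order.
ci_pre_minus_post = (-ci_post_minus_pre[1], -ci_post_minus_pre[0])

print(f"SE = {se:.5f}")
print(f"t (post - pre) = {t_post_minus_pre:.3f}, t (pre - post) = {t_pre_minus_post:.3f}")
print(f"95% CI (post - pre): ({ci_post_minus_pre[0]:.3f}, {ci_post_minus_pre[1]:.3f})")
print(f"95% CI (pre - post): ({ci_pre_minus_post[0]:.3f}, {ci_pre_minus_post[1]:.3f})")
```

Either order leads to the same conclusion as long as the alternative hypothesis is written to match the direction of subtraction.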
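Confidence interval for the difference in two population means:

$$(\bar{x}_A - \bar{x}_B) \pm t^* \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

where $t^*$ comes from the t distribution with $df = \min(n_A - 1,\ n_B - 1)$.

t test statistic:

$$t = \frac{(\bar{x}_A - \bar{x}_B) - 0}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$

where $t$ follows the t distribution with $df = \min(n_A - 1,\ n_B - 1)$.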
***Equal sample sizes are recommended, but not required.
Example of 2-Sample Comparison of Means : A group of 15 college seniors are selected to participate in a manual dexterity skill test against a group of 20 industrial workers. Skills are assessed by scores obtained on a test taken by both groups. Conduct a hypothesis test to determine whether the industrial workers had significantly better average manual dexterity skills than the students. Descriptive statistics are listed below. Also construct a 95% confidence interval for this problem.
group      n    x̄       s    df
students   15   35.12   4.
workers    20   37.32   3.
Example of 2-Sample Comparison of Means (Exercise 7.84) : The SSHA is a psychological test designed to measure the motivation, study habits, and attitudes towards learning of college students. These factors, along with ability, are important in explaining success in school. A selective private college gives the SSHA to an SRS of both male and female first-year students. The data for the women are as follows:
Here are the scores for the men:
108 140 114 91 180 115 126 92 169 146 109 132 75 88 113 151 70 115 187 104
a) Test whether the population mean SSHA score for men is different than the population mean score for women. State your hypotheses, carry out the test using SPSS, obtain a P -value, and give your conclusions.
When you enter your data into SPSS, have 2 variables: gender (type: string) and score (numeric). In the gender column, state whether a score is from a man or a woman, and then go to Analyze → Compare Means → Independent-Samples T Test. Move score into “Test Variable(s)” box. Move gender into “Grouping Variable” box, and then click “Define Groups” and enter “woman” and “man” as group 1 and group 2, then hit “Continue”. We will need a 90% confidence interval in part c, so go to “Options” to change it.
Group Statistics

        gender   N    Mean     Std. Deviation   Std. Error Mean
score   woman    18   141.06   26.436           6.
        man      20   121.25   32.852           7.
Independent Samples Test

                                      Levene's Test for
                                      Equality of Variances   t-test for Equality of Means
                                                                                                                                   90% Confidence Interval
                                                                                                                                   of the Difference
                                      F      Sig.             t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
score   Equal variances assumed       .862   .359             2.032   36       .050              19.806            9.745                   3.353   36.
        Equal variances not assumed                           2.056   35.587   .047              19.806            9.633                   3.538   36.

What do we do with this “Equal variances assumed” and “Equal variances not assumed”? Always go with the bottom row, “Equal variances not assumed.” This is the more conservative approach.
b) Most studies have found that the population mean SSHA score for men is lower than the population mean score in a comparable group of women. Test this supposition here.
c) Give a 90% confidence interval for the difference in population means of SSHA scores of male and female first-year students at this college.
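The bottom row of the SPSS output can be checked by hand from the Group Statistics table. This sketch computes the unpooled standard error and test statistic, then a 90% interval using the conservative df = min(n_A - 1, n_B - 1) = 17 with the table value t* = 1.740. Because SPSS uses a larger approximate df, its interval is slightly narrower than the one computed here.

```python
import math

# Summary statistics from the Group Statistics table.
n_w, mean_w, sd_w = 18, 141.06, 26.436  # women
n_m, mean_m, sd_m = 20, 121.25, 32.852  # men

diff = mean_w - mean_m
se = math.sqrt(sd_w**2 / n_w + sd_m**2 / n_m)  # unpooled standard error
t_0 = diff / se

df_conservative = min(n_w - 1, n_m - 1)
t_star = 1.740  # table value: 90% confidence, df = 17
ci = (diff - t_star * se, diff + t_star * se)

print(f"difference = {diff:.2f}, SE = {se:.3f}, t_0 = {t_0:.3f}")
print(f"conservative df = {df_conservative}")
print(f"90% CI (women - men): ({ci[0]:.2f}, {ci[1]:.2f})")
```

For part b, the alternative is one-sided (men lower than women) and the observed difference is in that direction, so the one-sided P-value is half of the two-sided value reported by SPSS.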
To summarize Chapters 6 and 7:
Z vs. t?
Z if you know the population standard deviation. t if you only know the sample standard deviation.
Matched pairs vs. 2-sample comparison of means?
Matched pairs if all subjects are in one group and receive both treatments. Two-sample comparison of means if you have 2 distinct groups of subjects.
Other notes:
If you have a small sample, be careful! If you don’t have enough observations to do boxplots or normal quantile plots, you might have trouble looking for outliers, too. If the effect is large, you can still probably see it, but you might miss small effects.
For the 2-samples degrees of freedom in a t -test, we are taking df = min (nA-1, nB-1). There is also a software approximation for the degrees of freedom that does a better job than the “minimum” way.
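The software approximation is commonly the Welch-Satterthwaite formula, which is what the "Equal variances not assumed" row in SPSS reports. The function below is a sketch of that formula (the function and argument names are illustrative), applied to the SSHA group statistics.

```python
def welch_df(s_a: float, n_a: int, s_b: float, n_b: int) -> float:
    """Approximate degrees of freedom for the unpooled two-sample t statistic."""
    v_a = s_a**2 / n_a  # squared standard error contributed by group A
    v_b = s_b**2 / n_b  # squared standard error contributed by group B
    return (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))


# SSHA example: women (s = 26.436, n = 18) and men (s = 32.852, n = 20).
df_software = welch_df(26.436, 18, 32.852, 20)
df_minimum = min(18 - 1, 20 - 1)
print(f"software df = {df_software:.3f}, conservative df = {df_minimum}")
```

The approximate df is never smaller than the "minimum" df, which is why the minimum rule is the more conservative of the two.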
The pooled 2-sample t procedures won’t be covered now, but if you need to compare more than 2 groups, you would need them (Chapter 12).