









Material Type: Notes; Professor: Zhao; Class: Elementary Statistical Methods; Subject: STAT-Statistics; University: Purdue University - Main Campus; Term: Spring 2007;
Typology: Study notes










Big picture: what do we learn in this chapter?
Individuals vs. Variables
Categorical vs. Quantitative Variables
Graphs:
Describing distributions
2 components in describing data or information:
Distribution of a variable : describes what values a variable takes and how often it takes those values
If you have more than one variable in your problem, you should look at each variable by itself before you look at relationships between the variables.
Example: Identify whether the following questions would give you categorical or quantitative data. If it is categorical, state the possible answers.
a) What letter grade did you get in your Calculus class last semester?
b) What was your score on the last exam?
c) What is your GPA?
d) Did you vote for John Kerry?
e) Who did you vote for in the last election?
f) How many votes did George Bush get?
g) How many red M&Ms are in this bag?
h) Is this a red M&M?
i) What color is the M&M you just ate?
j) Which type of M&Ms has more red ones, peanut or plain?
Quantitative Variables:
To create a stem plot by hand,
0 | 7
1 |
2 | 0 4 5 5 8 8
3 | 0 2 5
4 | 2 3 4 5 6 7 8 8
5 | 0 1
6 |
7 | 2 5 7 8 9
8 | 3 7 8
A split stem plot just has more stems. There are several ways to split the stems.
0 | 7
1 |
1 |
2 | 0 4
2 | 5 5 8 8
3 | 0 2
3 | 5
4 | 2 3 4
4 | 5 6 7 8 8
5 | 0 1
5 |
6 |
6 |
7 | 2
7 | 5 7 8 9
8 | 3
8 | 7 8
Why do we need split stem plots? Sometimes it is easier to see the shape of the data with more stems. Sometimes a regular stem plot is better. If you're not sure, try it both ways and see if a pattern appears.
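The regular stem plot above can also be built programmatically. The following is a minimal sketch, assuming the 28 values are the ones read off the stem plot (stem = tens digit, leaf = ones digit); the function name `stem_plot` is made up for illustration.

```python
from collections import defaultdict

# Values read off the stem plot (stem = tens digit, leaf = ones digit).
data = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35,
        42, 43, 44, 45, 46, 47, 48, 48, 50, 51,
        72, 75, 77, 78, 79, 83, 87, 88]

def stem_plot(values):
    """Return the lines of a stem plot with tens as stems and ones as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # List every stem from the smallest to the largest, even the empty ones,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

for line in stem_plot(data):
    print(line)
```

Keeping the empty stems (1 and 6 here) matters: dropping them would hide the gaps in the distribution.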
To create a histogram by hand:
Interval range     Count in that range
7 ≤ x ≤ 16         1
16 < x ≤ 25        4
25 < x ≤ 34        4
34 < x ≤ 43        3
43 < x ≤ 52        8
52 < x ≤ 61        0
61 < x ≤ 70        0
70 < x ≤ 79        5
79 < x ≤ 88        3
Total              28
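The interval counts for the histogram can be checked with a short script. This is only a sketch: it assumes the data are the 28 values from the stem plot, that each interval includes its upper endpoint, and that the first interval also includes its lower endpoint so the minimum (7) is counted. The name `interval_counts` is hypothetical.

```python
# Values read off the stem plot earlier in the notes.
data = [7, 20, 24, 25, 25, 28, 28, 30, 32, 35,
        42, 43, 44, 45, 46, 47, 48, 48, 50, 51,
        72, 75, 77, 78, 79, 83, 87, 88]

# Interval endpoints: 7, 16, 25, ..., 88 (each interval is 9 units wide).
edges = list(range(7, 89, 9))

def interval_counts(values, edges):
    """Count values in each interval (lo, hi]; the first interval is [lo, hi]."""
    counts = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        if i == 0:
            n = sum(1 for v in values if lo <= v <= hi)
        else:
            n = sum(1 for v in values if lo < v <= hi)
        counts.append(n)
    return counts

counts = interval_counts(data, edges)
for lo, hi, n in zip(edges, edges[1:], counts):
    print(f"{lo} to {hi}: {n}")
print("Total:", sum(counts))
```

The counts become the bar heights of the histogram; the individual values are no longer visible, which is one way a histogram differs from a stem plot.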
How is a histogram different from a stem plot?
How is a histogram different from a bar graph?
Find the mean and median of the following 8 numbers in Dataset B:
1 2 4 6 8 9 12 13
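As a check on the hand calculation, here is a small sketch using Python's standard `statistics` module (the variable name `dataset_b` is just for illustration). With 8 observations, the median is the average of the two middle values, 6 and 8.

```python
from statistics import mean, median

dataset_b = [1, 2, 4, 6, 8, 9, 12, 13]

# Mean: add up the observations and divide by how many there are (55 / 8).
print(mean(dataset_b))    # 6.875

# Median: n is even, so average the two middle values, (6 + 8) / 2.
print(median(dataset_b))  # 7.0
```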
b) Variance : s^2 , average of the square of deviations of observations from the mean (dividing by n - 1)
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
c) Standard Deviation : s , square root of the variance, common way for measuring how far observations are from the mean
Example of finding the standard deviation by hand:
0 2 4
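The by-hand steps for the numbers 0, 2, 4 can be sketched as follows. This assumes the usual sample convention of dividing the sum of squared deviations by n - 1; the variable names are illustrative only.

```python
from math import sqrt

data = [0, 2, 4]
n = len(data)

# Step 1: the mean.
x_bar = sum(data) / n                              # 2.0

# Step 2: the deviation of each observation from the mean, squared.
squared_devs = [(x - x_bar) ** 2 for x in data]    # [4.0, 0.0, 4.0]

# Step 3: variance = sum of squared deviations divided by n - 1.
variance = sum(squared_devs) / (n - 1)             # 8 / 2 = 4.0

# Step 4: standard deviation = square root of the variance.
std_dev = sqrt(variance)                           # 2.0

print(x_bar, squared_devs, variance, std_dev)
```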
d) p-th percentile : value such that p% of the observations fall at or below it
Median = M = 50th percentile
First Quartile = Q1 = 25th percentile
Third Quartile = Q3 = 75th percentile
How do you find quartiles? Think of them as "mini-medians." Leave the median out, and then find the median of what is left over on the left side (Q1) and what is left over on the right side (Q3).
Find the 1st and 3rd quartiles of the following 7 numbers in Dataset A:
-20 1 23 25 32.5 33 67
Find the 1st and 3rd quartiles of the following 8 numbers in Dataset B:
1 2 4 6 8 9 12 13
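The "mini-medians" method can be sketched in a few lines. The function name `quartiles` is made up for illustration; it leaves the median out when n is odd and simply splits the data in half when n is even.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the 'mini-medians' method from these notes."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # When n is odd, the median is one of the observations and is left out.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

dataset_a = [-20, 1, 23, 25, 32.5, 33, 67]
dataset_b = [1, 2, 4, 6, 8, 9, 12, 13]

print(quartiles(dataset_a))  # (1, 25, 33)
print(quartiles(dataset_b))  # (3.0, 7.0, 10.5)
```

Software often uses other quartile definitions, so library functions may give slightly different answers for the same data.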
e) 5-Number Summary : Min Q1 M Q3 Max
f) Interquartile Range (IQR) = Q3 - Q1
Call an observation a suspected outlier if it is > Q3 + 1.5 IQR or < Q1 - 1.5 IQR
g) Boxplots : Use the 5-number summary
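Items e) through g) can be tied together in a short sketch: compute the 5-number summary (the numbers a boxplot is drawn from), then apply the 1.5 IQR rule. The function names are hypothetical, the quartiles use the "mini-medians" method, and the value 200 added at the end is an invented observation used only to show the rule flagging something.

```python
from statistics import median

def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max) with quartiles found as 'mini-medians'."""
    xs = sorted(values)
    half = len(xs) // 2
    q1 = median(xs[:half])
    # Skip the middle observation when the number of values is odd.
    q3 = median(xs[half + len(xs) % 2:])
    return xs[0], q1, median(xs), q3, xs[-1]

def suspected_outliers(values):
    """Return the observations beyond Q1 - 1.5 IQR or Q3 + 1.5 IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

dataset_a = [-20, 1, 23, 25, 32.5, 33, 67]
print(five_number_summary(dataset_a))      # (-20, 1, 25, 33, 67)
print(suspected_outliers(dataset_a))       # []

# Hypothetical extra observation, not part of Dataset A.
print(suspected_outliers(dataset_a + [200]))  # [200]
```

For Dataset A the IQR is 33 - 1 = 32, so the fences are 1 - 48 = -47 and 33 + 48 = 81, and no observation falls outside them.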
"The Median vs. the Mean in the Age of Average" by Mike Pesca on NPR's Day-to-Day 7/19/06: http://www.npr.org/templates/story/story.php?storyId=
Do you always have to do all of this by hand? NO! Statistical software packages like SPSS can make life much easier for you, but it's a good idea to know how to do these by hand so you can make sense of your output. Also, on the exam, you won't have access to a computer.
Read over your SPSS manual (part of the HW) and get comfortable with using it. You will have a chance to practice on the HW for this week, and you will work on it in lab on Friday.
Enter your data, then Analyze--> Descriptive Statistics--> Explore. Follow the instructions on p. 48 of the SPSS manual.
The output from SPSS for the internet time problem looks like:
Descriptives: Time spent on the web

                                    Statistic   Std. Error
Mean                                54.77       5.
95% Confidence Interval for Mean
    Lower Bound
    Upper Bound
5% Trimmed Mean
Median
Variance
Std. Deviation
Minimum                             7
Maximum                             151
Range                               144
Interquartile Range                 48
Skewness                            1.314       .
Kurtosis                            1.977       .
[Histogram: x-axis "Time spent on the web" (0 to 150), y-axis "Frequency" (0 to 10). Mean = 54.77, Std. Dev. = 32., N = 30]
Time spent on the web Stem-and-Leaf Plot

Frequency    Stem &  Leaf
  1.00        0 .  0
  9.00        0 .  222222333
 10.00        0 .  4444444455
  5.00        0 .  77777
  3.00        0 .  888
   .00        1 .
  1.00        1 .  3
  1.00 Extremes    (>=

Stem width: 100
Each leaf: 1 case(s)
Notice on the boxplot, it is easy to identify the potential outlier. This would be your indication that the 5-number summary would be the best way to describe your data. (You could also try calculating the mean and standard deviation without the outlier for comparison.)
SPSS can also give you the Quartiles (listed under "Percentiles"), but these are not necessarily the same answers as what you would get by hand. The "weighted average" and "Tukey's Hinges" are not the same method we use. For this class, whenever we ask you to calculate the Quartiles, we want you to do them by hand.
Features of bell-shaped distributions (from Section 1.3)
A z-score tells us how many standard deviations away from the mean an observation is.
z = \frac{x - \mu}{\sigma}, where \mu is the mean and \sigma is the standard deviation
This is also called getting a standardized value.
Why is standardization useful? For comparing apples to oranges.
Example: (p. 88, Problem 1.99) Jacob scores 16 on the ACT. Emily scores 670 on the SAT. Assuming that both tests measure scholastic aptitude, who has the higher score? The SAT scores for 1.4 million students in a recent graduating class were roughly normal with a mean of 1026 and standard deviation of 209. The ACT scores for more than 1 million students in the same class were roughly normal with mean of 20.8 and standard deviation of 4.8.
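The comparison in this example comes down to two z-scores. A minimal sketch, using the means and standard deviations given in the problem (the function name `z_score` is just for illustration):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Jacob: ACT score of 16, ACT mean 20.8, standard deviation 4.8.
jacob = z_score(16, 20.8, 4.8)     # about -1.00

# Emily: SAT score of 670, SAT mean 1026, standard deviation 209.
emily = z_score(670, 1026, 209)    # about -1.70

print(round(jacob, 2), round(emily, 2))
print("Higher standardized score:", "Jacob" if jacob > emily else "Emily")
```

Both scores are below average, but Jacob is about 1 standard deviation below the ACT mean while Emily is about 1.7 standard deviations below the SAT mean, so Jacob has the higher standardized score.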
How else can we use standardization? If the distribution of observations has a bell-shape, then these standardized values have some special properties. One of these is the 68-95-99.7 rule.
The most famous bell-shaped distribution is the Normal distribution. We will spend a whole week talking about it for Section 1.3, and it will be important to everything we do for the rest of the semester.
[Figure: x-axis "Standard deviations away from the mean (z-score)", centered at the mean]
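The 68-95-99.7 rule can be checked numerically for the Normal distribution. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1,
# so values on this scale are already z-scores.
std_normal = NormalDist()

def share_within(k):
    """Proportion of observations with z-score between -k and k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The three proportions come out to roughly 68%, 95%, and 99.7%, which is where the rule gets its name.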