










Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Class: Introductory Applied Statistics for the Life Sciences; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Spring 2006;
Typology: Study notes
1 / 18
This page cannot be seen from the preview
Don't miss anything!











Cécile Ané
Stat 371
Spring 2006
(^1) Categorical data
(^2) Numerical data Displays Numerical summaries
A AB B O NA's
0
10
20
30
40
Blood type, 2005 survey
Milk data: milk yields (lbs/day) were collected from a particular herd on a given day. Data: 44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44.
stem(milk) The decimal point is 1 digit(s) to the right of the | 1 | 9 2 | 36 3 | 024779 4 | 1446 5 | 5
Look for the minimum and maximum, then decide on a precision and round off all data at the same precision. Last digit: leaf, any digit before: stem. 1 observation=1 leaf Leaves may then be ordered. Provides complete information. Pretty easy to do.
Rotate the histogram: like stem-leaf No single histogram. Lots of them! Rules are somewhat arbitrary. Histograms are useful with larger datasets, stem-leaf displays with smaller data sets.
Milk yield data: 44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44. y 1 = 44, y 2 = 55,... , y 14 = 44.
¯ y = ( 44 + 55 + · · · + 44 )/ 14 = ( y 1 + y 2 + · · · + y 14 )/ 14
=
n ( y 1 + y 2 + · · · + yn )
n
∑^ n
i = 1
yi
mean(milk) [1] 36.
Here y ¯ = 36 .2 lbs/day
Mode: most common value. More interesting for discrete data, with small # possible values and large # observations. Example: # of brothers. 0 | 0000000000000000000000000000000 1 | 00000000000000000000000000000000000000000000000000 2 | 000000000000000 3 | 00000 4 | 5 | 6 | 7 | 0
Mode = 1 (brother).
25 % percentile = 0.25 quantile = value such that 1/ observations are below and 3/4 are above. p quantile: value such that (about) a proportion p of observations are below and about 1 − p are above. Median is a special case (why?) Example: 6 9 11 17 19 23 26 26
First quartile Q 1 : median of those values below the median Third quartile Q 3 : median of those values above the median
Milk yield:
19 23 26 30 32 34 37
fivenum(milk) summary(milk) boxplot(milk)
20 25 30 35 40 45 50 55
Min Q1 Median Q3 Max
Milk yield data
Range: 36
IQR: 14
boxplot(Height ~ Sex)
No fence: whiskers extend to minimum and maximum With fences (Modified boxplot): Observations outside fences are drawn as points. Whiskers cannot go beyond fences. Fence = 1.5 IQR Milk example: IQR = 14. Fences are 1. 5 ∗ 14 = 21 below Q 1 and above Q 3 , i.e 30 − 21 = 9 and 44 + 21 = 65. Here smallest data point (min) = 19 > 9 and largest (max) = 55 < 65: no outlier.
Recall y 1 = first observation,... , yn = last observation. Deviation from the mean: yi − y ¯. Ex: first cow has deviation 44 − 36. 2 = + 7 .8, cow with data 19 has deviation 19 − 36. 2 = − 17 .2. Variance : s^2 ≥ 0 always!
s^2 =
n − 1
∑^ n
i = 1
( yi − y ¯)^2 =
n − 1
( y 1 − y ¯)^2 + · · · + ( yn − ¯ y )^2
Preferred formula for hand calculation:
s^2 =
n − 1
( (^) n ∑
i = 1
y i^2 − ny ¯^2
Here we get
s^2 =
= 95. 26 lb^2
Standard deviation: s =
variance =
s^2 is now in original units. s is the typical deviation. Here, s = 9 .8 lbs.
mean(milk) [1] 95. sd(milk) [1] 9.
20 30 40 50
l l l l l l l
l l l l
l l l
sd=9.
16.6 26.4 36.2 46.0 55.
sd=9.
The empirical rule: for most “mound-shaped” distributions, about 68% of observations lie within 1 standard deviation of the mean, (here: 9/14 = 64%) about 95% lie within 2 s.d. of the mean (here: 100%) about 99% lie within 3 s.d. of the mean (here: 100%)