Material Type: Notes; Professor: Dinov; Class: Introduction to Statistical Methods for Life and Health Sciences; Subject: Statistics; University: University of California - Los Angeles; Term: Fall 2004;
STAT 13, UCLA, Ivo Dinov Slide 1
• Teaching Assistants: Chris Barr & Ming Zheng
STAT 13, UCLA, Ivo Dinov Slide 2
Univariate Data
• Types of variables
• Presentation of data
• Simple plots
• Numerical summaries
• Repeated and grouped data
• Qualitative variables
Slide 3 STAT 13, UCLA, Ivo Dinov
TABLE 2.1.1 Data on Male Heart Attack Patients
A subset of the data collected at a Hospital is summarized in this table. Each patient has measurements recorded for a number of variables – ID, Ejection factor (ventricular output), blood systolic/diastolic pressure, etc.
Slide 4 STAT 13, UCLA, Ivo Dinov
TABLE 2.1.1 Data on Male Heart Attack Patients
 ID  EJEC  SYS-VOL  DIA-VOL  OCCLU  STEN  TIME  OUTCOME  AGE  SMOKE  BETA  CHOL(a)  SURG
390    72       36      131      0     0   143        0   49      2     2       59     0
279    52       74      155     37    63   143        0   54      2     2       68     1
391    62       52      137     33    47    16        2   56      2     2       52     0
201    50      165      329     33    30   143        0   42      2     2       39     0
202    50       47       95      0   100   143        0   46      2     2       74     1
 69    27      124      170     77    23   143        0   57      2     2       NA     2
310    60       86      215      7    50    40        0   51      2     2       58     0
392    72       37      132     40    10     9        5   56      2     2       75     0
311    60       65      163      0    40   142        0   45      2     2       72     0
393    63       52      140      0    10   142        0   46      2     2       90     0
 70    29      117      164     50     0   142        0   48      2     2       72     0
203    48       69      133      0    27   142        0   54      2     2       NA     0
394    59       54      133     30    13   142        0   39      2     1       NA     0
204    50       67      135     37    63   141        0   49      2     2       86     2
280    53       65      138      0    33   140        0   58      2     1       49     0
 55    17      184      221     57    13     5        1   50      2     2       70     2
 79    37       88      140     37    47   118        5   58      2     2       NA     0
205    45      106      193     33    43   140        0   47      1     1       38     1
206    43       85      150      0    50    23        5   51      2     2       61     0
312    60       59      149      7    37   139        0   43      2     1       56     0
 80    38      103      168     47    43   100        1   55      2     2       62     1
281    57       53      124      0    57   140        0   58      2     1       93     0
207    44       68      121     27    60   139        0   55      2     2       63     1
282    51       53      109      0    77   139        0   41      2     2       45     4
396    63       58      157      0    73   139        0   51      2     2       60     0
208    49       81      157     13    13   139        0   49      2     2       60     0
209    48       58      112      0     0    72        1   56      2     2       57     0
283    58       71      167     27     0   138        0   45      2     1       46     0
210    42       92      159      0     0   139        0   57      2     2       58     0
397    68       50      156      0   100   138        0   51      2     1       NA     0
211    43      146      259     47    33     3        1   56      2     2       70     0
398    67       43      130      0    70   138        0   49      2     2       NA     3
284    52       70      146      0    23   137        0   47      1     2       NA     0
399    63       73      195     27     0   136        0   36      1     1       61     0
285    54       62      133     33    23   137        0   38      2     2       NA     0
 71    37       93      148     47     0   137        0   59      2     2       NA     0
286    51       65      133     43     7   136        0   54      2     2       NA     0
212    42       95      163     40    10   109        3   57      2     2       NA     4
400    66       49      144     10    50    65        1   52      2     2       55     0
287    54       66      145      7    40   136        0   47      2     2       62     0
 81    39      144      237     13    87   136        0   39      2     2       56     3
813    63       52      141      0    47    43        3   48      2     2       NA     0
 68    30      219      314     33    45    76        1   53      1     2       NA     0
288    59       39       94      0     0   135        0   47      1     2       63     0
407    67       39      117      0    73    53        1   57      2     2       62     2
(a) NA = Not Available (missing data code).
Slide 5 STAT 13, UCLA, Ivo Dinov
• Quantitative variables are measurements and counts
   - Variables with few repeated values are treated as continuous.
   - Variables with many repeated values are treated as discrete.
• Qualitative variables (a.k.a. factors or class-variables) describe group membership
Types of variable
Slide 6 STAT 13, UCLA, Ivo Dinov
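[Figure: tree diagram of variable types. Quantitative (measurements and counts) splits into continuous (few repeated values) and discrete (many repeated values). Qualitative (define groups) splits into variables with no idea of order and variables that fall in natural order.]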
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Distinguishing between types of variable
Slide 7 STAT 13, UCLA, Ivo Dinov
Questions …
• What is the difference between quantitative and qualitative variables?
• What is the difference between a discrete variable and a continuous variable?
• Name two ways in which observations on qualitative variables can be stored on a computer. (strings/indexes)
• When would you treat a discrete random variable as though it were a continuous random variable?
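The strings/indexes answer above can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from Table 2.1.1; the point is only that both storage forms carry the same information.

```python
# Hypothetical group labels for six observations (illustrative only).
labels = ["never", "current", "never", "former", "current", "never"]

# Option 1: store each observation as a string (the list above).

# Option 2: store a small integer index per observation plus a lookup table.
levels = sorted(set(labels))                   # ['current', 'former', 'never']
index_of = {name: i for i, name in enumerate(levels)}
codes = [index_of[name] for name in labels]    # one integer per observation

# Decoding the indexes recovers the original strings.
decoded = [levels[c] for c in codes]
print(codes)
print(decoded == labels)
```

The index values are labels only; arithmetic on them (for example, averaging) has no meaning for a qualitative variable.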
Slide 8 STAT 13, UCLA, Ivo Dinov
Storing and Reporting Numbers
• Round numbers for presentation
• Maintain complete accuracy in numbers to be used in calculations. If you need to round-off, this should be the very last operation …
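A small sketch of why rounding should come last. The three values are made up for illustration; the same calculation gives different answers depending on when the rounding happens.

```python
# Hypothetical measurements (illustrative only).
values = [0.44, 0.44, 0.44]

# Rounding as the very last operation: sum at full accuracy, then round.
round_last = round(sum(values), 1)

# Rounding first: each value is rounded before it enters the calculation.
round_first = round(sum(round(v, 1) for v in values), 1)

print(round_last, round_first)
```

Here the early rounding discards 0.04 from every value, and the loss accumulates in the total.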
Slide 9 STAT 13, UCLA, Ivo Dinov
Table before simplification
Country        1970    1975    1980    1985    1990
Belgium       42.01   42.17   34.18   34.18   30.
Canada        22.59   21.95   20.98   20.11   14.
France       100.91  100.93   81.85   81.85   81.
Italy         82.48   82.48   66.67   66.67   66.
Japan         15.22   21.11   24.23   24.33   24.
Netherlands   51.06   54.33   43.94   43.94   43.
Switzerland   78.03   83.2    83.28   83.28   83.
U.K.          38.52   21.03   18.84   19.03   18.
U.S.A.       316.34  274.71  264.32  262.65  261.
Units: millions of troy ounces. Source: The World Almanac and Book of Facts.
Slide 10 STAT 13, UCLA, Ivo Dinov
Country       1970  1975  1980  1985  1990  Average
US             320   270   260   260   260      280
Switzerland     78    83    83    83    83       82
France         100   100    82    82    82       89
Italy           82    82    67    67    67       73
Netherlands     51    54    44    44    44       47
Belgium         42    42    34    34    30       37
Japan           15    21    24    24    24       22
UK              39    21    19    19    19       23
Canada          23    22    21    20    15       20
Average         83    78    71    71    70
Units: millions of troy ounces.
Table after simplification
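The simplification can be sketched for the 1970 column. This assumes the simplified table was obtained by rounding to two significant figures, which is consistent with the 1970 entries; `round_sig` is a helper introduced here for illustration, not something named on the slides.

```python
import math

def round_sig(x, sig=2):
    """Round x to `sig` significant figures (x must be non-zero)."""
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

# 1970 column of the table before simplification.
col_1970 = {
    "Belgium": 42.01, "Canada": 22.59, "France": 100.91, "Italy": 82.48,
    "Japan": 15.22, "Netherlands": 51.06, "Switzerland": 78.03,
    "U.K.": 38.52, "U.S.A.": 316.34,
}

# Rounded values for presentation.
rounded = {country: int(round_sig(v)) for country, v in col_1970.items()}

# Column average: computed from the full-accuracy values, rounded only at the end.
average = int(round_sig(sum(col_1970.values()) / len(col_1970)))

print(rounded)
print(average)
```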
Slide 11 STAT 13, UCLA, Ivo Dinov
[Figure: the same numbers drawn as different graphs. Recoverable labels: 21% S. Africa; categories S. Africa, U.S., USSR, Austr., Can., China, Rest.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Different graphs of the same set of numbers
Slide 12 STAT 13, UCLA, Ivo Dinov
Questions …
• For what two purposes are tables of numbers presented? (convey information about trends in the data, detailed analysis)
• When should you round numbers, and when should you preserve full accuracy?
• How should you arrange the numbers you are most interested in comparing? (Arrange numbers you want to compare in row/column averages.)
• Should a table be left to tell its own story?
Slide 13 STAT 13, UCLA, Ivo Dinov
Figure 2.3.1 Dot plot.
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
cluster gap outlier
Figure 2.3.2 Dot plot showing special features.
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
The dot plot
Atypical obs.
Slide 14 STAT 13, UCLA, Ivo Dinov
Example of exploiting gaps and clusters
Figure 2.3.3 Grading of a university course.
[Figure: dot plot with grade labels F, D, C-, C, C+, B-, B, B+, A-, A, A+.]
Slide 15 STAT 13, UCLA, Ivo Dinov
(a) Unbroken scale
(b) Broken scale
Figure 2.3.4 Dot plot with and without a scale break.
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Scale breaks
Slide 16 STAT 13, UCLA, Ivo Dinov
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
A labeled dot plot
Slide 17 STAT 13, UCLA, Ivo Dinov
Example of a stem-and-leaf plot
Stem-plot of the 45 observations of the Ejection variable in the Heart Attack data table.
Units: 7 | 2 =
Stem | Leaves
   1 | 7
   2 | 7 9
   3 | 0 7 7 8 9
   4 | 2 2 3 3 4 5 8 8 9
   5 | 0 0 0 1 1 2 2 3 4 4 7 8 9 9
   6 | 0 0 0 2 3 3 3 3 6 7 7 8
   7 | 2 2
Callout on the slide: values 52, 54 and their frequencies.
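A minimal sketch of how such a plot can be built, using the EJEC column of Table 2.1.1 in table order. The function name and the output layout are choices made here for illustration.

```python
from collections import defaultdict

# EJEC column of Table 2.1.1, in table order (45 observations).
ejec = [72, 52, 62, 50, 50, 27, 60, 72, 60, 63, 29, 48, 59, 50, 53,
        17, 37, 45, 43, 60, 38, 57, 44, 51, 63, 49, 48, 58, 42, 68,
        43, 67, 52, 63, 54, 37, 51, 42, 66, 54, 39, 63, 30, 59, 67]

def stem_and_leaf(values):
    """Return one text line per stem: tens digit | sorted units digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    lines = []
    # Walk every stem in range so that empty stems still get a line.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

lines = stem_and_leaf(ejec)
print("\n".join(lines))
```

Because the leaves are sorted within each stem, the plot doubles as a sorted listing of the data.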
Slide 18 STAT 13, UCLA, Ivo Dinov
Traffic death-rates data
TABLE 2.3.1 Traffic Death-Rates (per 100,000 Population) for 30 Countries
17.4 Australia       20.1 Austria         19.9 Belgium
12.5 Bulgaria        15.8 Canada          10.1 Czechoslovakia
13.0 Denmark         11.6 Finland         20.0 France
12.0 E. Germany      13.1 W. Germany      21.1 Greece
 5.4 Hong Kong       17.1 Hungary         15.3 Ireland
10.3 Israel          10.4 Japan           26.8 Kuwait
11.3 Netherlands     20.1 New Zealand     10.5 Norway
14.6 Poland          25.6 Portugal        12.6 Singapore
 9.8 Sweden          15.7 Switzerland     18.6 United States
12.1 N. Ireland      12.0 Scotland        10.1 England & Wales
Data for 1983, 1984 or 1985 depending on the country (prior to reunification of Germany).
Source: Hutchinson [1987, page 3].
Slide 19 STAT 13, UCLA, Ivo Dinov
FIGURE 2.3.7 Two stem-and-leaf plots for the traffic deaths data

(a) Units: 17 | 4 = 17.4 deaths per 100,000
 5 | 4
 6 |
 7 |
 8 |
 9 | 8
10 | 1 1 3 4 5
11 | 3 6
12 | 0 0 1 5 6
13 | 0 1
14 | 6
15 | 3 7 8
16 |
17 | 1 4
18 | 6
19 | 9
20 | 0 1 1
21 | 1
22 |
23 |
24 |
25 | 6
26 | 8

(b) Round-off and collapse to 12 stems. Units: 1 | 7 = 17 deaths per 100,000
0 | 5
0 |
0 |
1 | 0 0 0 0 0 1 1
1 | 2 2 2 2 3 3 3
1 | 5 5
1 | 6 6 7 7
1 | 9
2 | 0 0 0 0 1
2 |
2 |
2 | 6 7
Slide 20 STAT 13, UCLA, Ivo Dinov
TABLE 2.3.2 Coyote Lengths Data (cm)
Females
 93.0   97.0   92.0  101.6   93.0   84.5  102.5   97.8   91.0   98.0   93.5   91.
 90.2   91.5   80.0   86.4   91.4   83.5   88.0   71.0   81.3   88.5   86.5   90.
 84.0   89.5   84.0   85.0   87.0   88.0   86.5   96.0   87.0   93.5   93.5   90.
 85.0   97.0   86.0   73.
Males
 97.0   95.0   96.0   91.0   95.0   84.5   88.0   96.0   96.0   87.0   95.0  100.
101.0   96.0   93.0   92.5   95.0   98.5   88.0   81.3   91.4   88.9   86.4  101.
 83.8  104.1   88.9   92.0   91.0   90.0   85.0   93.5   78.0  100.5  103.0   91.
105.0   86.0   95.5   86.5   90.5   80.0   80.
Coyotes captured in Nova Scotia, Canada. Data courtesy of Dr Vera Eastwood.
Slide 21 STAT 13, UCLA, Ivo Dinov
Figure 2.3.8 Histogram of the female coyote-lengths data: (a) Histogram, (b) Stem-and-leaf plot rotated. Horizontal axis: length (cm). From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
TABLE 2.3.3 Frequency Table for Female Coyote Lengths
Class Interval   Tally             Frequency   Stem-and-leaf plot
70-75⁻           ||                        2    7 | 1 4
75-80⁻                                     0
80-85⁻           |||| |                    6    8 | 0 1 4 4 4
85-90⁻           |||| |||| ||             12    8 | 5 5 5 6 6 7 7 7 7 8 8 9
90-95⁻           |||| |||| |||            13    9 | 0 0 0 0 1 1 2 2 2 3 3 4 4 4
95-100⁻          ||||                      5    9 | 6 7 7 8 8
100-105⁻         ||                        2   10 | 2 3
Total                                     40
Slide 22 STAT 13, UCLA, Ivo Dinov
Figure 2.3.9 Histograms and density trace of female coyote-lengths data. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
[Figure panels, each with horizontal axis Length (cm):
(a) Original histogram (interval width = 5)
(b) Change class-interval width (interval width = 3): histogram bin-size change
(c) Same widths, different boundaries (interval width = 5): histogram bin-boundary change
(d) Density trace (window width = 5)]
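The effect of bin width and bin boundaries can be sketched numerically. This example uses the traffic death-rates of Table 2.3.1 rather than the coyote lengths, and the starting points and widths below are arbitrary choices made for illustration.

```python
import math

# Traffic death-rates (per 100,000) from Table 2.3.1, in table order.
rates = [17.4, 20.1, 19.9, 12.5, 15.8, 10.1, 13.0, 11.6, 20.0, 12.0,
         13.1, 21.1, 5.4, 17.1, 15.3, 10.3, 10.4, 26.8, 11.3, 20.1,
         10.5, 14.6, 25.6, 12.6, 9.8, 15.7, 18.6, 12.1, 12.0, 10.1]

def bin_counts(values, start, width, n_bins):
    """Count values in intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = math.floor((v - start) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Same width, different boundaries.
width5_from_5 = bin_counts(rates, start=5.0, width=5.0, n_bins=5)
width5_from_2_5 = bin_counts(rates, start=2.5, width=5.0, n_bins=5)

# Same starting point, narrower bins.
width2_5_from_5 = bin_counts(rates, start=5.0, width=2.5, n_bins=9)

print(width5_from_5)
print(width5_from_2_5)
print(width2_5_from_5)
```

With boundaries at 5, 10, 15, ... half of the countries fall in a single bin; shifting the boundaries by 2.5 splits that peak across two bins, even though the data are unchanged.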
Slide 23 STAT 13, UCLA, Ivo Dinov
Questions …
• What advantages does a stem-and-leaf plot have over a histogram? (S&L plots retain info on individual values, quick to … attractive and more understandable.)
• The shape of a histogram can be quite drastically altered by choosing different class-interval boundaries. What type of plot does not have this problem? (density trace) What other factor affects the shape of a histogram? (bin-size)
• What was another reason given for plotting data on a variable, apart from interest in how the data on that variable behaves? (shows features, clusters/gaps, outliers; as well as trends)
Slide 24 STAT 13, UCLA, Ivo Dinov
Interpreting Stem-plots and Histograms
(a) Unimodal (b) Bimodal (c) Trimodal
(d) Symmetric (e) Positively skewed (long upper tail) (f) Negatively skewed (long lower tail)
(g) Symmetric (h) Bimodal with gap (i) Exponential shape
Slide 25 STAT 13, UCLA, Ivo Dinov
(j) Spike in pattern (k) Outliers (l) Truncation plus outlier
Figure 2.3.10 Features to look for in histograms and stem-and-leaf plots.
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Interpreting Stem-plots and Histograms
Slide 26 STAT 13, UCLA, Ivo Dinov
Fascinations with histograms –
Histogram of heights using the actual people
Subjects are university genetics students, females in white
and males in dark tops.
Slide 27 STAT 13, UCLA, Ivo Dinov
Questions …
• What does it mean for a histogram or stem-and-leaf plot to be bimodal? What do we suspect when we see a bimodal plot?
• What are outliers, and how do they show up in these plots? What should we try to do when we see them?
• What do we mean by symmetry and positive and negative skewness?
• What shape do we call exponential?
• Should we be suspicious of abrupt changes? Why?
Slide 28 STAT 13, UCLA, Ivo Dinov
Descriptive statistics from computer programs like STATA
STATA Output
Variable    N     Mean   Median   TrMean   StDev   SE Mean
age        45   50.133   51.000   50.366   6.092   0.
Variable   Minimum   Maximum       Q1   Q
age         36.000    59.000   46.500   56.
Annotations on the slide: StDev = standard deviation; Q1 = lower quartile; the last column is the upper quartile.
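Several of the printed figures can be reproduced from the AGE column of Table 2.1.1. This is a sketch using Python's standard `statistics` module, not the program that produced the output above.

```python
import statistics

# AGE column of Table 2.1.1 (45 male heart attack patients), in table order.
age = [49, 54, 56, 42, 46, 57, 51, 56, 45, 46, 48, 54, 39, 49, 58,
       50, 58, 47, 51, 43, 55, 58, 55, 41, 51, 49, 56, 45, 57, 51,
       56, 49, 47, 36, 38, 59, 54, 57, 52, 47, 39, 48, 53, 47, 57]

n = len(age)
mean = statistics.mean(age)
median = statistics.median(age)
stdev = statistics.stdev(age)      # sample standard deviation (divides by n - 1)

print(f"N={n} Mean={mean:.3f} Median={median:.3f} StDev={stdev:.3f}")
print(f"Minimum={min(age)} Maximum={max(age)}")
```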
Slide 29 STAT 13, UCLA, Ivo Dinov
Descriptive statistics …
• The sample mean is denoted by $\bar{x}$.
The sample mean = (Sum of the observations) / (Number of observations), that is, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Slide 30 STAT 13, UCLA, Ivo Dinov
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
The sample mean is where the dot plot balances
Slide 31 STAT 13, UCLA, Ivo Dinov
For n observations, $\{x_1, x_2, x_3, \dots, x_n\}$, suppose we order the observations min-to-max to get
$\{x_{(1)}, x_{(2)}, x_{(3)}, \dots, x_{(n)}\}$.
Then the sample median is the observation in position $(n+1)/2$ of the ordered list.
If $(n+1)/2$ is not a whole number, the median is the average of the two observations on either side.
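The position rule above can be written out directly. This is a sketch with made-up data; `sample_median` is a name chosen here for illustration.

```python
def sample_median(observations):
    """Median via the (n + 1)/2 position rule, using 1-based positions."""
    ordered = sorted(observations)
    n = len(ordered)
    position = (n + 1) / 2              # ends in .5 when n is even
    lower = int(position)               # whole part of the position
    if position == lower:               # whole number: take that observation
        return ordered[lower - 1]
    # Otherwise average the two observations on either side.
    return (ordered[lower - 1] + ordered[lower]) / 2

odd = sample_median([7, 1, 5])          # n = 3, position 2
even = sample_median([7, 1, 5, 3])      # n = 4, position 2.5
print(odd, even)
```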
Slide 32 STAT 13, UCLA, Ivo Dinov
[Grey disks in (b) are the ``ghosts'' of the points that were moved.] From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Slide 33 STAT 13, UCLA, Ivo Dinov
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 1999.
Slide 34 STAT 13, UCLA, Ivo Dinov
• How is the sample mean related to the dot plot?
• If the index (n+1)/2 is not a whole number (e.g., 23.5), how do we obtain the sample median?
• Why is the sample median usually preferred to the sample mean for skewed data? Why is it preferred for “dirty” data?
• Under what circumstances may quoting a single center (be it mean or median) not make sense? (multi-modal)
• What can we say about the sample mean of a qualitative variable? (meaningless)
Slide 35 STAT 13, UCLA, Ivo Dinov
The first quartile ($Q_1$) is the median of all the observations whose position is strictly below the position of the median, and the third quartile ($Q_3$) is the median of those above.
Slide 36 STAT 13, UCLA, Ivo Dinov
The five-number summary = (Min, $Q_1$, Med, $Q_3$, Max)
Slide 37 STAT 13, UCLA, Ivo Dinov
Inter-quartile Range
$\mathrm{IQR} = Q_3 - Q_1$
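A sketch of the quartile rule, the five-number summary and the inter-quartile range, applied to the AGE column of Table 2.1.1. The helper names are chosen here for illustration. Other software may use different quartile conventions and give slightly different values.

```python
# AGE column of Table 2.1.1, in table order.
age = [49, 54, 56, 42, 46, 57, 51, 56, 45, 46, 48, 54, 39, 49, 58,
       50, 58, 47, 51, 43, 55, 58, 55, 41, 51, 49, 56, 45, 57, 51,
       56, 49, 47, 36, 38, 59, 54, 57, 52, 47, 39, 48, 53, 47, 57]

def median_of_sorted(xs):
    """Median of an already sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(observations):
    """(Min, Q1, Med, Q3, Max) with quartiles as medians of the two halves."""
    xs = sorted(observations)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # positions strictly below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]    # positions strictly above the median
    return (xs[0], median_of_sorted(lower), median_of_sorted(xs),
            median_of_sorted(upper), xs[-1])

summary = five_number_summary(age)
iqr = summary[3] - summary[1]
print(summary, iqr)
```

For the age data this rule gives a lower quartile of 46.5, matching the Q1 value of 46.500 in the program output shown earlier.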
Slide 38 STAT 13, UCLA, Ivo Dinov
$Q_1$  Median  $Q_3$
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Box plot compared to dot plot
Slide 39 STAT 13, UCLA, Ivo Dinov
(pull back until hit observation) (pull back until hit observation)
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Construction of a box plot
Slide 40 STAT 13, UCLA, Ivo Dinov
Comparing 3 plots of the same data
Slide 41 STAT 13, UCLA, Ivo Dinov
Words on a Randomly Chosen Page
3  2  2  4  4  4  3  9  9  3
6  2  3  2  3  4  6  5  3  4
2  3  4  5  2  9  5  8  3  2
4  5  2  4  1  4  2  5  2  5
3  6  9  6  3  2  3  4  4  4
2  2  4  2  3  7  4  2  6  4
2  5  9  2  3  7 11  2  3  6
4  4  7  6  6 10  4  3  5  7
7  7  5 10  3  2  3  9  4  5
5  4  4  3  5  2  5  2  4  2
Frequency Table
Slide 42 STAT 13, UCLA, Ivo Dinov
Mean from a frequency table
$\bar{x} = \frac{1}{n}\,(\text{Sum of all observations}) = \frac{1}{n}\,\text{Sum of}\,(\text{value} \times \text{frequency of occurrence})$
Example: {2, 4, 2, 4, 2}
Value   Frequency   Value × Frequency
2       3           6
4       2           8
Total   5           14
Mean = 14/5
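The frequency-table calculation for the example {2, 4, 2, 4, 2} can be sketched as follows. `Counter` is used here only as a convenient way to build the frequency table.

```python
from collections import Counter

data = [2, 4, 2, 4, 2]                      # the example on the slide
freq = Counter(data)                        # value -> frequency of occurrence

n = sum(freq.values())                      # number of observations
total = sum(value * f for value, f in freq.items())   # sum of value x frequency
mean = total / n

print(dict(freq), n, total, mean)
```

The result agrees with summing the raw observations and dividing by n, since each value is simply counted as many times as it occurs.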
Slide 43 STAT 13, UCLA, Ivo Dinov
Frequency Table for the Occurrence of Fish Species in Ocean Strata
Columns: No. of strata in which species occur | Frequency (No. of species) | Percentage of species | Cumulative Percentage
Total: n = 330, 100
Source: Haedrich and Merrett [1988]
Slide 44 STAT 13, UCLA, Ivo Dinov
Figure 2.5.1 Bar graph for species data. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Slide 45 STAT 13, UCLA, Ivo Dinov
Labeled bar graphs to convey size
[Figure axes: Gross Rents ($ per ft²) by City]
Figure 2.6.2 Cost of commercial rents around the world. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.