




Material Type: Notes; Professor: Dinov; Class: Introduction to Statistical Methods for Life and Health Sciences; Subject: Statistics; University: University of California - Los Angeles; Term: Fall 2004;





STAT 13, UCLA, Ivo Dinov Slide 1
• Teaching Assistants: Chris Barr & Ming Zheng
STAT 13, UCLA, Ivo Dinov Slide 2
Univariate Data
• Types of variables
• Presentation of data
• Simple plots
• Numerical summaries
• Repeated and grouped data
• Qualitative variables
Slide 3 STAT 13, UCLA, Ivo Dinov
TABLE 2.1.1 Data on Male Heart Attack Patients
A subset of the data collected at a hospital is summarized in this table. Each patient has measurements recorded for a number of variables: ID, ejection factor (ventricular output), blood systolic/diastolic pressure, etc.
Slide 4 STAT 13, UCLA, Ivo Dinov
TABLE 2.1.1 Data on Male Heart Attack Patients
 ID  EJEC  SYSVOL  DIAVOL  OCCLU  STEN  TIME  OUTCOME  AGE  SMOKE  BETA  CHOLa  SURG
390    72      36     131      0     0   143        0   49      2     2     59     0
279    52      74     155     37    63   143        0   54      2     2     68     1
391    62      52     137     33    47    16        2   56      2     2     52     0
201    50     165     329     33    30   143        0   42      2     2     39     0
202    50      47      95      0   100   143        0   46      2     2     74     1
 69    27     124     170     77    23   143        0   57      2     2     NA     2
310    60      86     215      7    50    40        0   51      2     2     58     0
392    72      37     132     40    10     9        5   56      2     2     75     0
311    60      65     163      0    40   142        0   45      2     2     72     0
393    63      52     140      0    10   142        0   46      2     2     90     0
 70    29     117     164     50     0   142        0   48      2     2     72     0
203    48      69     133      0    27   142        0   54      2     2     NA     0
394    59      54     133     30    13   142        0   39      2     1     NA     0
204    50      67     135     37    63   141        0   49      2     2     86     2
280    53      65     138      0    33   140        0   58      2     1     49     0
 55    17     184     221     57    13     5        1   50      2     2     70     2
 79    37      88     140     37    47   118        5   58      2     2     NA     0
205    45     106     193     33    43   140        0   47      1     1     38     1
206    43      85     150      0    50    23        5   51      2     2     61     0
312    60      59     149      7    37   139        0   43      2     1     56     0
 80    38     103     168     47    43   100        1   55      2     2     62     1
281    57      53     124      0    57   140        0   58      2     1     93     0
207    44      68     121     27    60   139        0   55      2     2     63     1
282    51      53     109      0    77   139        0   41      2     2     45     4
396    63      58     157      0    73   139        0   51      2     2     60     0
208    49      81     157     13    13   139        0   49      2     2     60     0
209    48      58     112      0     0    72        1   56      2     2     57     0
283    58      71     167     27     0   138        0   45      2     1     46     0
210    42      92     159      0     0   139        0   57      2     2     58     0
397    68      50     156      0   100   138        0   51      2     1     NA     0
211    43     146     259     47    33     3        1   56      2     2     70     0
398    67      43     130      0    70   138        0   49      2     2     NA     3
284    52      70     146      0    23   137        0   47      1     2     NA     0
399    63      73     195     27     0   136        0   36      1     1     61     0
285    54      62     133     33    23   137        0   38      2     2     NA     0
 71    37      93     148     47     0   137        0   59      2     2     NA     0
286    51      65     133     43     7   136        0   54      2     2     NA     0
212    42      95     163     40    10   109        3   57      2     2     NA     4
400    66      49     144     10    50    65        1   52      2     2     55     0
287    54      66     145      7    40   136        0   47      2     2     62     0
 81    39     144     237     13    87   136        0   39      2     2     56     3
813    63      52     141      0    47    43        3   48      2     2     NA     0
 68    30     219     314     33    45    76        1   53      1     2     NA     0
288    59      39      94      0     0   135        0   47      1     2     63     0
407    67      39     117      0    73    53        1   57      2     2     62     2
a NA = Not Available (missing data code).
Slide 5 STAT 13, UCLA, Ivo Dinov
Types of variable
• Quantitative variables are measurements and counts.
  Variables with few repeated values are treated as continuous.
  Variables with many repeated values are treated as discrete.
• Qualitative variables (a.k.a. factors or class-variables) describe group membership.
Slide 6 STAT 13, UCLA, Ivo Dinov
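Distinguishing between types of variable
[Figure: tree diagram. Quantitative variables (measurements and counts) split into continuous (few repeated values) and discrete (many repeated values). Qualitative variables (define groups) split into those with no idea of order and those that fall in natural order.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.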
Slide 7 STAT 13, UCLA, Ivo Dinov
Questions …
• What is the difference between quantitative and qualitative variables?
• What is the difference between a discrete variable and a continuous variable?
• Name two ways in which observations on qualitative variables can be stored on a computer. (strings/indexes)
• When would you treat a discrete random variable as though it were a continuous random variable?
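The strings/indexes answer above can be sketched in Python. This is only an illustration: the smoking-status labels are hypothetical values, not taken from Table 2.1.1, and the names `levels` and `index_of` are made up for the example.

```python
# Hypothetical labels for six patients (illustrative values only).
labels = ["non-smoker", "smoker", "smoker", "ex-smoker", "non-smoker", "smoker"]

# Option 1: store each observation as a string.
as_strings = list(labels)

# Option 2: store an integer index per observation plus a lookup list of levels.
levels = sorted(set(labels))                          # distinct group names
index_of = {name: i for i, name in enumerate(levels)}
as_indexes = [index_of[name] for name in labels]

# Decoding the indexes through the lookup list recovers the original labels.
decoded = [levels[i] for i in as_indexes]

print(levels)
print(as_indexes)
```

The index form is compact, but the numbers only name groups; arithmetic on them has no meaning.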
Slide 8 STAT 13, UCLA, Ivo Dinov
Storing and Reporting Numbers
• Round numbers for presentation.
• Maintain complete accuracy in numbers to be used in calculations. If you need to round off, this should be the very last operation …
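A small Python sketch of why rounding should come last. The measurements are hypothetical values chosen to make the effect visible; they are not from the slides.

```python
# Hypothetical measurements (illustrative values only).
measurements = [2.4, 2.4, 2.4, 2.4, 2.4]

# Rounding each value first throws away 0.4 five times over.
rounded_early = sum(round(x) for x in measurements)

# Rounding as the very last operation keeps the accumulated fractions.
rounded_last = round(sum(measurements))

print(rounded_early, rounded_last)
```

The two totals differ by 2, although every individual rounding step looked harmless.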
Slide 9 STAT 13, UCLA, Ivo Dinov
Table before simplification
Country        1970    1975    1980    1985   1990
Belgium       42.01   42.17   34.18   34.18   30.
Canada        22.59   21.95   20.98   20.11   14.
France       100.91  100.93   81.85   81.85   81.
Italy         82.48   82.48   66.67   66.67   66.
Japan         15.22   21.11   24.23   24.33   24.
Netherlands   51.06   54.33   43.94   43.94   43.
Switzerland   78.03   83.2    83.28   83.28   83.
U.K.          38.52   21.03   18.84   19.03   18.
U.S.A.       316.34  274.71  264.32  262.65  261.
Units: millions of troy ounces. Source: The World Almanac and Book of Facts.
Slide 10 STAT 13, UCLA, Ivo Dinov
Table after simplification
Country      1970  1975  1980  1985  1990  Average
US            320   270   260   260   260      280
Switzerland    78    83    83    83    83       82
France        100   100    82    82    82       89
Italy          82    82    67    67    67       73
Netherlands    51    54    44    44    44       47
Belgium        42    42    34    34    30       37
Japan          15    21    24    24    24       22
UK             39    21    19    19    19       23
Canada         23    22    21    20    15       20
Average        83    78    71    71    70
Units: millions of troy ounces.
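The simplification appears to combine three steps: round to two significant figures, add averages, and order the rows by size. A minimal Python sketch follows, under two assumptions: the helper `round_sig` is a made-up name, and only the 1970-1985 columns of three countries are used, so the averages here cover four years and will not match the five-year averages on the slide.

```python
import math

def round_sig(x, sig=2):
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

# 1970-1985 values for three countries, copied from the table before simplification.
raw = {
    "Belgium": [42.01, 42.17, 34.18, 34.18],
    "U.S.A.": [316.34, 274.71, 264.32, 262.65],
    "Switzerland": [78.03, 83.2, 83.28, 83.28],
}

rows = []
for country, values in raw.items():
    average = sum(values) / len(values)   # average the unrounded values
    rows.append((country, [round_sig(v) for v in values], round_sig(average)))

# Put the largest holdings first so similar rows sit next to each other.
rows.sort(key=lambda row: row[2], reverse=True)

for country, values, average in rows:
    print(f"{country:<12}" + "".join(f"{v:>6.0f}" for v in values) + f"{average:>8.0f}")
```

Note that the average is computed before any rounding, in line with the advice to make rounding the last operation.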
Slide 11 STAT 13, UCLA, Ivo Dinov
Different graphs of the same set of numbers
[Figure: the same data drawn as two different graphs. Categories: S. Africa, U.S., USSR, Austr., Can., China, Rest; one label reads "21% S. Africa".]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Slide 12 STAT 13, UCLA, Ivo Dinov
Questions …
• For what two purposes are tables of numbers presented? (convey information about trends in the data, detailed analysis)
• When should you round numbers, and when should you preserve full accuracy?
• How should you arrange the numbers you are most interested in comparing? (Arrange numbers you want to compare in row/column averages.)
• Should a table be left to tell its own story?
Slide 19 STAT 13, UCLA, Ivo Dinov
FIGURE 2.3.7 Two stem-and-leaf plots for the traffic deaths data
(a) Original plot. Units: 17 | 4 = 17.4 deaths per 100,
(b) After round-off, collapsed to 12 stems. Units: 1 | 7 = 17 deaths per 100,
Slide 20 STAT 13, UCLA, Ivo Dinov
TABLE 2.3.2 Coyote Lengths Data (cm)
Females
 93.0   97.0   92.0  101.6   93.0   84.5  102.5   97.8   91.0   98.0   93.5   91.
 90.2   91.5   80.0   86.4   91.4   83.5   88.0   71.0   81.3   88.5   86.5   90.
 84.0   89.5   84.0   85.0   87.0   88.0   86.5   96.0   87.0   93.5   93.5   90.
 85.0   97.0   86.0   73.
Males
 97.0   95.0   96.0   91.0   95.0   84.5   88.0   96.0   96.0   87.0   95.0  100.
101.0   96.0   93.0   92.5   95.0   98.5   88.0   81.3   91.4   88.9   86.4  101.
 83.8  104.1   88.9   92.0   91.0   90.0   85.0   93.5   78.0  100.5  103.0   91.
105.0   86.0   95.5   86.5   90.5   80.0   80.
Coyotes captured in Nova Scotia, Canada. Data courtesy of Dr Vera Eastwood.
Slide 21 STAT 13, UCLA, Ivo Dinov
Figure 2.3.8 Histogram of the female coyote-lengths data. Panels: (a) Histogram, (b) Stem-and-leaf plot rotated. Axis: Body length (cm). From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
TABLE 2.3.3 Frequency Table for Female Coyote Lengths
Class Interval  Tally            Frequency  Stem-and-leaf plot
70-75           ||                       2   7 | 1 4
75-80                                    0
80-85           |||| |                   6   8 | 0 1 4 4 4
85-90           |||| |||| ||            12   8 | 5 5 5 6 6 7 7 7 7 8 8 9
90-95           |||| |||| |||           13   9 | 0 0 0 0 1 1 2 2 2 3 3 4 4 4
95-100          ||||                     5   9 | 6 7 7 8 8
100-105         ||                       2  10 | 2 3
Total                                   40
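A stem-and-leaf plot like the one in Table 2.3.3 can be sketched in a few lines of Python. Assumptions: values are rounded half up to whole centimetres before splitting into stem (tens) and leaf (units), only the first eleven female lengths from Table 2.3.2 are used, and each stem gets one line rather than the split stems shown in the table. The function name `stem_and_leaf` is made up for the example.

```python
import math
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} after rounding each value to a whole number."""
    plot = defaultdict(list)
    for v in values:
        whole = math.floor(v + 0.5)      # round half up (assumed convention)
        stem, leaf = divmod(whole, 10)   # tens part and units digit
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

# First eleven female coyote lengths (cm) from Table 2.3.2.
lengths = [93.0, 97.0, 92.0, 101.6, 93.0, 84.5, 102.5, 97.8, 91.0, 98.0, 93.5]

plot = stem_and_leaf(lengths)
for stem, leaves in plot.items():
    print(f"{stem:>3} | " + " ".join(str(leaf) for leaf in leaves))
```

Unlike a tally, each leaf still shows the (rounded) individual value, which is the advantage of this display.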
Slide 22 STAT 13, UCLA, Ivo Dinov
Figure 2.3.9 Histograms and density trace of female coyote-lengths data. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Panels (horizontal axis: Length (cm), 70 to 100; vertical axis: 0 to 12):
(a) Original histogram (interval width = 5)
(b) Change class-interval width (interval width = 3): histogram bin-size change
(c) Same widths, different boundaries (interval width = 5): histogram bin-boundary change
(d) Density trace (window width = 5)
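The effect of bin boundaries and bin width can be checked numerically. The sketch below is an illustration only: `bin_counts` is a hypothetical helper, the data are the first eleven female lengths from Table 2.3.2, and the shifted starting point 72.5 is an arbitrary choice rather than the boundary used in the figure.

```python
def bin_counts(data, start, width, n_bins):
    """Count observations in n_bins intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:          # ignore values outside the covered range
            counts[i] += 1
    return counts

# First eleven female coyote lengths (cm) from Table 2.3.2.
lengths = [93.0, 97.0, 92.0, 101.6, 93.0, 84.5, 102.5, 97.8, 91.0, 98.0, 93.5]

original = bin_counts(lengths, start=70, width=5, n_bins=7)     # 70-75, ..., 100-105
shifted = bin_counts(lengths, start=72.5, width=5, n_bins=7)    # same width, new boundaries
narrower = bin_counts(lengths, start=70, width=3, n_bins=12)    # different bin size

print(original)
print(shifted)
print(narrower)
```

The same eleven numbers produce a differently shaped histogram once the boundaries move, which is the point of panels (a) and (c).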
Slide 23 STAT 13, UCLA, Ivo Dinov
Questions …
• What advantages does a stem-and-leaf plot have over a histogram? (S&L plots return info on individual values, quick to attractive and more understandable).
• The shape of a histogram can be quite drastically altered by choosing different class-interval boundaries. What type of plot does not have this problem? (density trace) What other factor affects the shape of a histogram? (bin-size)
• What was another reason given for plotting data on a variable, apart from interest in how the data on that variable behaves? (shows features, clusters/gaps, outliers; as well as trends)
Slide 24 STAT 13, UCLA, Ivo Dinov
Interpreting Stem-plots and Histograms
(a) Unimodal (b) Bimodal (c) Trimodal
(d) Symmetric (e) Positively skewed (long upper tail) (f) Negatively skewed (long lower tail)
(g) Symmetric (h) Bimodal with gap (i) Exponential shape
Slide 25 STAT 13, UCLA, Ivo Dinov
Interpreting Stem-plots and Histograms
(j) Spike in pattern (k) Outliers (l) Truncation plus outlier
Figure 2.3.10 Features to look for in histograms and stem-and-leaf plots.
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Slide 26 STAT 13, UCLA, Ivo Dinov
Fascinations with histograms: histogram of heights using the actual people
Subjects are university genetics students, females in white and males in dark tops.
Slide 27 STAT 13, UCLA, Ivo Dinov
Questions …
• What does it mean for a histogram or stem-and-leaf plot to be bimodal? What do we suspect when we see a bimodal plot?
• What are outliers, and how do they show up in these plots? What should we try to do when we see them?
• What do we mean by symmetry and positive and negative skewness?
• What shape do we call exponential?
• Should we be suspicious of abrupt changes? Why?
Slide 28 STAT 13, UCLA, Ivo Dinov
Descriptive statistics from computer programs like STATA
STATA Output
Variable    N    Mean  Median  TrMean  StDev  SE Mean
age        45  50.133  51.000  50.366  6.092  0.
Variable  Minimum  Maximum      Q1      Q3
age        36.000   59.000  46.500  56.
StDev = standard deviation; Q1 = lower quartile; Q3 = upper quartile.
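The output above can be checked with Python's standard `statistics` module. The 45 values are the AGE column of Table 2.1.1, which appears to be the variable summarized here. One assumption: the default "exclusive" method of `statistics.quantiles` is used for the quartiles, and other programs may use a different quartile rule.

```python
import statistics

# AGE column of Table 2.1.1 (45 patients), in table order.
ages = [49, 54, 56, 42, 46, 57, 51, 56, 45, 46, 48, 54, 39, 49, 58,
        50, 58, 47, 51, 43, 55, 58, 55, 41, 51, 49, 56, 45, 57, 51,
        56, 49, 47, 36, 38, 59, 54, 57, 52, 47, 39, 48, 53, 47, 57]

n = len(ages)
mean = statistics.mean(ages)
median = statistics.median(ages)
stdev = statistics.stdev(ages)                 # sample standard deviation (divides by n - 1)
q1, _, q3 = statistics.quantiles(ages, n=4)    # quartiles, default "exclusive" method
iqr = q3 - q1

print(f"N={n} Mean={mean:.3f} Median={median:.3f} StDev={stdev:.3f}")
print(f"Min={min(ages)} Max={max(ages)} Q1={q1} Q3={q3} IQR={iqr}")
```

With these inputs the N, mean, median, standard deviation, minimum, maximum and Q1 agree with the figures printed on the slide.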
Slide 29 STAT 13, UCLA, Ivo Dinov
Descriptive statistics …
• The sample mean is denoted by $\bar{x}$.
$\bar{x} = \dfrac{\text{Sum of the observations}}{\text{Number of observations}}$
Slide 30 STAT 13, UCLA, Ivo Dinov
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
The sample mean is where the dot plot balances
Slide 37 STAT 13, UCLA, Ivo Dinov
Inter-quartile Range
$\mathrm{IQR} = Q_3 - Q_1$
Slide 38 STAT 13, UCLA, Ivo Dinov
Box plot compared to dot plot
[Figure: box plot drawn above a dot plot of the same data, with Q1, Median and Q3 marked.]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Slide 39 STAT 13, UCLA, Ivo Dinov
Construction of a box plot
[Figure: box from Q1 to Q3 with whiskers on both sides; each whisker is labelled "(pull back until hit observation)".]
From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
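A sketch of the "pull back until hit observation" step in Python. Two assumptions that are not stated on the slide: the whiskers may reach at most 1.5 × IQR beyond the box (a common convention), and quartiles use the default method of `statistics.quantiles`. The data and the name `box_plot_summary` are hypothetical.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Quartiles, whisker ends and outliers for a box plot (k * IQR rule assumed)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Pull each whisker back from the fence to the nearest actual observation.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical data with one unusually large value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the upper fence sits at 16.5, so the whisker is pulled back to the observation 9 and the value 30 is drawn separately as an outlier.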
Slide 40 STAT 13, UCLA, Ivo Dinov
Comparing 3 plots of the same data
Slide 41 STAT 13, UCLA, Ivo Dinov
Words on a Randomly Chosen Page
3 2 2 4 4 4 3 9 9 3 6 2 3 2 3 4 6 5 3 4
2 3 4 5 2 9 5 8 3 2 4 5 2 4 1 4 2 5 2 5
3 6 9 6 3 2 3 4 4 4 2 2 4 2 3 7 4 2 6 4
2 5 9 2 3 7 11 2 3 6 4 4 7 6 6 10 4 3 5 7
7 7 5 10 3 2 3 9 4 5 5 4 4 3 5 2 5 2 4 2
Frequency Table
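The frequency table for these 100 values can be rebuilt with `collections.Counter`. A minimal sketch; the variable names are made up, and the numbers are copied from the list above.

```python
from collections import Counter

# The 100 values listed above, ten per row.
word_lengths = [
    3, 2, 2, 4, 4, 4, 3, 9, 9, 3,
    6, 2, 3, 2, 3, 4, 6, 5, 3, 4,
    2, 3, 4, 5, 2, 9, 5, 8, 3, 2,
    4, 5, 2, 4, 1, 4, 2, 5, 2, 5,
    3, 6, 9, 6, 3, 2, 3, 4, 4, 4,
    2, 2, 4, 2, 3, 7, 4, 2, 6, 4,
    2, 5, 9, 2, 3, 7, 11, 2, 3, 6,
    4, 4, 7, 6, 6, 10, 4, 3, 5, 7,
    7, 7, 5, 10, 3, 2, 3, 9, 4, 5,
    5, 4, 4, 3, 5, 2, 5, 2, 4, 2,
]

freq = Counter(word_lengths)     # value -> number of occurrences
n = sum(freq.values())

print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
print(f"Total  {n:>9}")
```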
Slide 42 STAT 13, UCLA, Ivo Dinov
Mean from a frequency table
$\bar{x} = \frac{1}{n}\,(\text{Sum of all observations}) = \frac{1}{n}\,\text{Sum of (value} \times \text{frequency of occurrence)}$
Example: {2, 4, 2, 4, 2}
Value  Frequency  Value × Frequency
2      3          6
4      2          8
Total  5          14
Mean = 14/5 = 2.8
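The example can be verified in Python by computing the mean both ways, from the frequency table and directly from the raw observations. A minimal sketch with made-up variable names:

```python
from collections import Counter

data = [2, 4, 2, 4, 2]

# Frequency table: value -> frequency of occurrence.
freq = Counter(data)
n = sum(freq.values())

# Mean as (1/n) * sum of (value * frequency).
weighted_sum = sum(value * count for value, count in freq.items())
mean_from_table = weighted_sum / n

# Mean as (1/n) * sum of all observations.
mean_direct = sum(data) / len(data)

print(weighted_sum, n, mean_from_table, mean_direct)
```

Both routes give the same answer because the frequency table only regroups the terms of the original sum.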
Slide 43 STAT 13, UCLA, Ivo Dinov
Frequency Table for the Occurrence of Fish Species in Ocean Strata
Columns: No. of strata in which species occur | Frequency (No. of species), f_j | Percentage of species, f_j / n | Cumulative Percentage
Total: n = 330, Percentage = 100
Source: Haedrich and Merrett [1988]
Slide 44 STAT 13, UCLA, Ivo Dinov
Figure 2.5.1 Bar graph for species data. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.
Slide 45 STAT 13, UCLA, Ivo Dinov
Labeled bar graphs to convey size
Figure 2.6.2 Cost of commercial rents around the world. Axes: City; Gross Rents ($ per ft²). From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.