





Average and standard deviation
In this class we will give the definition of average as a measure of the center of the distribution of the data. We will consider the relationship between the average and the histogram. We will also give the definition of the standard deviation as a measure of the spread of the distribution of the data. We will compare the notions of average and median, and we will see how the relationship between average and median determines the shape of the histogram. The concepts of longitudinal studies versus cross-sectional studies will also be discussed.
Average and spread in a histogram
A histogram provides a graphical description of the distribution of a sample of data. If we want to summarize the properties of such a distribution, we can measure the center and the spread of the distribution.
These two histograms correspond to samples with the same center. The spread of the sample on top is smaller than that of the sample on the bottom.
[Figure: two histograms, each titled "Histogram of n"; x-axis: n, y-axis: Density.]
AMS-5: Statistics
To obtain an estimate of the center of the distribution we can calculate an average. The average of a list of numbers equals their sum divided by how many numbers there are. Thus, if 18, 18, 21, 20, 19, 20, 20, 20, 19, 20 are the ages of 10 students in this class, the average is given by

$$\text{average} = \frac{18+18+21+20+19+20+20+20+19+20}{10} = \frac{195}{10} = 19.5.$$
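The rule "sum divided by how many there are" translates directly into code. The sketch below uses the ten ages from the example; the function name `average` is just an illustrative choice.

```python
def average(values):
    """Average of a list: the sum of the entries divided by how many entries there are."""
    return sum(values) / len(values)

# Ages of the 10 students from the example above.
ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]
print(average(ages))  # 19.5
```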
In the hospital data that we considered in the previous class, the data corresponded to the average length of stay of patients in each hospital in the survey. This means that the lengths of stay of all patients in a given hospital were added and the sum was divided by the number of patients in that hospital.
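One way to picture how a per-hospital average is built is to group individual patient records by hospital, add up the stays, and divide by the count. This is a minimal sketch; the patient records and hospital names below are hypothetical and are not the survey data from the previous class.

```python
from collections import defaultdict

def average_stay_by_hospital(records):
    """Average length of stay per hospital: total days divided by number of patients."""
    totals = defaultdict(float)  # sum of lengths of stay for each hospital
    counts = defaultdict(int)    # number of patients for each hospital
    for hospital, days in records:
        totals[hospital] += days
        counts[hospital] += 1
    return {h: totals[h] / counts[h] for h in totals}

# Hypothetical records: (hospital, length of stay in days) for individual patients.
stays = [("A", 3), ("A", 5), ("A", 4), ("B", 10), ("B", 2)]
print(average_stay_by_hospital(stays))  # {'A': 4.0, 'B': 6.0}
```

Note that hospital B's average of 6 days hides very different individual stays (10 and 2), which is exactly the information an average summarizes away.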
Longitudinal versus cross-sectional studies
Suppose the university conducts a survey of all students, faculty and staff on campus and records the height, age and weight of each person. Such a study is a cross section of the campus population. It gives a picture of the characteristics of the population at a given time. Suppose you group your sample in three age groups: below 30, between 30 and 50, and above 50. Suppose you observe that the average weight in the first group is smaller than in the second, and this in turn is smaller than the average weight in the third group.
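The cross-sectional comparison amounts to splitting one snapshot of people into age groups and averaging weight within each group. In this sketch the survey rows are hypothetical, and placing ages 30 and 50 in the middle group is an assumption about the boundaries.

```python
def age_group(age):
    # Boundary handling is an assumption: ages 30 and 50 go in the middle group.
    if age < 30:
        return "below 30"
    if age <= 50:
        return "30 to 50"
    return "above 50"

def average_weight_by_group(people):
    """Average weight within each age group of a single cross-sectional sample."""
    groups = {}
    for age, weight in people:
        groups.setdefault(age_group(age), []).append(weight)
    return {g: sum(ws) / len(ws) for g, ws in groups.items()}

# Hypothetical cross section: (age in years, weight in kg), all measured at one time.
survey = [(22, 60), (25, 64), (35, 70), (45, 74), (55, 78), (60, 80)]
print(average_weight_by_group(survey))
# {'below 30': 62.0, '30 to 50': 72.0, 'above 50': 79.0}
```

Each group average comes from different people, so the increasing pattern compares different individuals rather than following anyone as they age.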
You cannot conclude that age is responsible for an increase in weight. This is because the effect may be confounded with the fact that eating habits may have changed during the last decades, and this may have an effect on the average weight of the population.
Warning: You cannot draw conclusions about the effects of age from such a study, since you are comparing different people of (possibly) different ages. To draw conclusions about the effect of age you need to conduct a longitudinal study. That is, you follow the evolution of a person's weight in time, for each person in the sample.
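By contrast, a longitudinal analysis first measures the change within each person and only then averages across people. This is a minimal sketch with hypothetical follow-up data. For simplicity it uses only each person's first and last measurement and assumes those two measurements are taken at different ages.

```python
def average_change_per_decade(data):
    """Average, over people, of each person's own weight change per 10 years."""
    rates = []
    for measurements in data.values():
        # Simplification: compare only the first and last measurement of each person.
        (age0, w0), (age1, w1) = measurements[0], measurements[-1]
        rates.append(10 * (w1 - w0) / (age1 - age0))
    return sum(rates) / len(rates)

# Hypothetical longitudinal data: person ID -> list of (age, weight in kg) in time order.
followup = {
    "p1": [(25, 62), (35, 66), (45, 69)],
    "p2": [(30, 70), (40, 71), (50, 73)],
}
print(average_change_per_decade(followup))  # 2.5
```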
Average and median
[Figure: histogram of rainfall in Guarico, Venezuela; x-axis: mm, y-axis: Density; the mean and the median are marked.]
This histogram corresponds to the rainfall over periods of 10 days in an area of the central plains of Venezuela. The average, or mean, rainfall is … mm. We observe that only about … of the observations are above the average.
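A quick way to see the effect of a long right tail is to count the fraction of observations above the average. The rainfall values below are hypothetical (they are not the Guarico data). They only mimic the pattern of many small totals and a few very large ones.

```python
def fraction_above_average(values):
    """Fraction of the observations that lie strictly above the average."""
    avg = sum(values) / len(values)
    return sum(1 for v in values if v > avg) / len(values)

# Hypothetical 10-day rainfall totals in mm: mostly small, a few very large.
rain = [0, 2, 5, 8, 10, 15, 20, 40, 120, 230]
print(fraction_above_average(rain))          # 0.2
print(fraction_above_average([1, 2, 3, 4]))  # 0.5 for this symmetric list
```

The two large values pull the average up to 45 mm, so only 2 of the 10 hypothetical observations exceed it.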
Notice that this histogram is not symmetric with respect to the average. A symmetric histogram will look like this. In this case 50% of the data are above the average.
[Figure: symmetric histogram titled "Histogram of dat"; x-axis: dat, y-axis: Density.]
The median of a histogram is the value with half the area to the left and half to the right. In a symmetric histogram the median and the average coincide.
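For a list of numbers, the analogue of "half the area on each side" is the middle value after sorting. This sketch uses the common convention of averaging the two middle values when the list has an even number of entries. The example lists are made up for illustration.

```python
def median(values):
    """Middle value of the sorted list; average of the two middle values if the length is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

symmetric = [2, 4, 6, 8, 10]
print(median(symmetric), sum(symmetric) / len(symmetric))  # 6 6.0
print(median([1, 2, 3, 4]))  # 2.5
```

For the symmetric list the median and the average coincide, as stated above.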
The relationship between the average and the median determines the shape of the tails of a histogram:
Average bigger than median: long right tail.
Average about the same as median: symmetry.
Average smaller than median: long left tail.
The average is very sensitive to extreme observations, so when dealing with variables like income or rainfall that exhibit very long tails, it is preferable to use the median as a measure of centrality.
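The three rules above can be turned into a small classifier that compares the average with the median. What counts as "about the same" is not specified in the notes. The relative tolerance `tol`, measured as a fraction of the SD, is a hypothetical choice, and the example lists are made up.

```python
import statistics

def tail_shape(values, tol=0.1):
    """Classify the tails by comparing the average with the median.

    tol is a hypothetical tolerance: the average and median count as
    "about the same" when they differ by at most tol times the SD.
    """
    mean = statistics.mean(values)
    med = statistics.median(values)
    spread = statistics.pstdev(values)  # SD dividing by the number of entries
    if abs(mean - med) <= tol * spread:
        return "symmetric"
    return "long right tail" if mean > med else "long left tail"

print(tail_shape([20, 25, 30, 35, 40, 200]))  # long right tail (income-like)
print(tail_shape([2, 4, 6, 8, 10]))           # symmetric
print(tail_shape([0, 80, 85, 90, 95]))        # long left tail
```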
Calculating the SD
To calculate the standard deviation of a sample, follow these steps:
1. Calculate the average.
2. Calculate the list of deviations from the average by taking the difference between each datum and the average.
3. Calculate the r.m.s. (root-mean-square) size of the resulting list, that is, the square root of the average of the squared deviations.
In short, SD = r.m.s. deviation from average.
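Consider the list 20, 10, 15, 15. Then

$$\text{average} = \frac{20+10+15+15}{4} = \frac{60}{4} = 15.$$

The list of deviations is 5, −5, 0, 0. Then

$$\text{SD} = \sqrt{\frac{5^2 + (-5)^2 + 0^2 + 0^2}{4}} = \sqrt{\frac{50}{4}} = \sqrt{12.5} \approx 3.54.$$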
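The three steps for the SD map line by line onto a short function. This is a sketch of the definition used in these notes (dividing by the number of entries), applied to the list 20, 10, 15, 15.

```python
import math

def sd(values):
    """SD as defined here: the r.m.s. size of the deviations from the average."""
    avg = sum(values) / len(values)                        # step 1: the average
    deviations = [v - avg for v in values]                 # step 2: deviations from the average
    mean_square = sum(d * d for d in deviations) / len(deviations)
    return math.sqrt(mean_square)                          # step 3: r.m.s. size

print(sd([20, 10, 15, 15]))  # about 3.54, i.e. sqrt(12.5)
```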
Using a calculator
Most scientific calculators have a function to calculate the average and the SD of a sample. The steps needed to obtain those values vary from model to model. The important fact is that most calculators do not produce the SD as we have defined it here: before taking the square root, they divide the sum of the squares of the deviations by the number of entries minus one, instead of by the number of entries. So, if you obtain the SD from your calculator (or spreadsheet), say SD*, then

$$\text{SD} = \sqrt{\frac{\text{number of entries} - 1}{\text{number of entries}}} \times \text{SD}^*.$$

Some calculators have both, SD and SD*. Please read the manual of your calculator regarding this fact.
Notice that the units of SD are the same as those of the original data. So if the data were measured in years, SD is also in years.
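Python's standard `statistics` module happens to offer both versions: `statistics.stdev` divides by the number of entries minus one (the SD* of many calculators), while `statistics.pstdev` divides by the number of entries (the SD of these notes). The sketch below checks the conversion formula on the list 20, 10, 15, 15. The helper name `sd_from_sd_star` is hypothetical.

```python
import math
import statistics

def sd_from_sd_star(sd_star, n):
    """Convert an SD* (divides by n - 1) into the SD defined in these notes (divides by n)."""
    return math.sqrt((n - 1) / n) * sd_star

data = [20, 10, 15, 15]
sd_star = statistics.stdev(data)     # sum of squared deviations divided by n - 1
sd_notes = statistics.pstdev(data)   # sum of squared deviations divided by n
print(sd_star)                                  # about 4.08
print(sd_from_sd_star(sd_star, len(data)))      # about 3.54
print(sd_notes)                                 # about 3.54
```

The difference between SD and SD* shrinks as the number of entries grows, since (n − 1)/n approaches 1.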
Problems
Problem 1:
Both of the following lists have the same average of 50.
…
Which one has the smaller SD, and why? (Do not do any computations.)
The second list has more entries at the average, so its SD is smaller.
Repeat for the following two lists:
…
The second list has two wild observations, 99 and 1, which are far away from the average, so its SD is larger.
Problem 2:
Consider the list of numbers …
Is the average around 1, 5 or 10?
Only three of the numbers are smaller than 1 and none is bigger than 10, so the average is around 5.
Is the SD around 1, 3 or 6?
If the SD were 1, the entries 0.6 and 9.9 would be too far away from the average. The entries are too concentrated around 5 for the SD to be 6. So 3 is the most likely value.