Understanding Average and Standard Deviation: Measuring Center and Spread in Statistics, Study notes of Statistics

An introduction to the concepts of average and standard deviation in statistics. It explains how these measures are used to summarize the properties of a distribution, with a focus on histograms. The document also discusses the differences between longitudinal and cross-sectional studies, and the relationship between average and median. Additionally, it covers the calculation of standard deviation and its significance.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-tfn
Average and standard deviation
In this class we will give the definition of average as a measure of
the center of the distribution of the data. We will consider the
relationship between the average and the histogram. We will also
give the definition of the standard deviation as a measure of the
spread of the distribution of the data. We will compare the notion
of average and median. We will see how the relationship between
average and median determines the shape of the histogram.
The concepts of longitudinal studies versus cross-sectional studies
will also be discussed.
AMS-5: Statistics
Average and spread in a histogram
A histogram provides a graphical description of the distribution of
a sample of data. If we want to summarize the properties of such a
distribution we can measure the center and the spread of the
histogram.
These two histograms correspond to samples with the same center.
The spread of the sample on top is smaller than that of the sample
on the bottom.
[Figure: two density histograms, "Histogram of n1" (top) and "Histogram of n2" (bottom); x-axis from −6 to 6, density axis from 0.00 to 0.30.]
To obtain an estimate of the center of the distribution we can
calculate an average.
The average of a list of numbers equals their sum, divided by
how many there are.
Thus, if 18; 18; 21; 20; 19; 20; 20; 20; 19; 20 are the ages of 10
students in this class, the average is given by
$$\frac{18 + 18 + 21 + 20 + 19 + 20 + 20 + 20 + 19 + 20}{10} = 19.5$$
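As a quick check of the arithmetic, the definition can be sketched in Python. The function name `average` is ours, chosen only for illustration; the ages are the ten values from the example.

```python
def average(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Ages of the 10 students in the example.
ages = [18, 18, 21, 20, 19, 20, 20, 20, 19, 20]

print(sum(ages))      # 195
print(average(ages))  # 19.5
```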
In the hospital data that we considered in the previous class the
data corresponded to the average length of stay of patients in each
hospital in the survey. This means that the lengths of stay of all
patients in a given hospital were added and the sum divided by the
number of patients in that hospital.
Longitudinal versus cross-sectional studies
Suppose the university conducts a survey of all students, faculty
and staff on campus and records the height, age and weight of each
person.
Such a study is a cross section of the campus population. It gives a
picture of the characteristics of the population at a given time.
Suppose you group your sample in three age groups: below 30,
between 30 and 50, and above 50. Suppose you observe that the
average weight in the first group is smaller than in the second and
this in turn is smaller than the average weight in the third group.

You cannot conclude that age is responsible for an increase in weight. This is because the effect may be confounded with the fact that eating habits may have changed during the last decades and this may have an effect on the average weight of the population.

Warning: You cannot draw conclusions about the effects of age from such a study since you are comparing different people of (possibly) different ages.

To draw conclusions about the effect of age you need to conduct a longitudinal study. That is, you follow the evolution of a person's weight in time, for each person in the sample.

Average and median

[Figure: histogram of rainfall in Guarico, Venezuela; x-axis in mm from 0 to 250, y-axis density; the mean and the median are marked.]

This histogram corresponds to the rainfall over periods of 10 days in an area of the central plains of Venezuela. The average or mean rainfall is [value missing] mm. We observe that only about [value missing] of the observations are above the average.

Notice that this histogram is not symmetric with respect to the average.

A symmetric histogram will look like this. In this case 50% of the data are above the average.

[Figure: "Histogram of dat", a symmetric density histogram; x-axis from 0 to 10.]

The median of a histogram is the value with half the area to the left and half to the right. In a symmetric histogram the median and the average coincide.

The relationship between the average and the median determines the shape of the tails of a histogram.

Average bigger than median: long right tail

Average about the same as median: symmetry

Average is smaller than median: long left tail

The average is very sensitive to extreme observations, so when dealing with variables like income or rainfall, that exhibit very long tails, it is preferable to use the median as a measure of centrality.
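The sensitivity of the average to extreme observations can be seen in a small Python sketch. The helper `tail_shape` and its `tolerance` parameter are assumptions made for illustration (the notes do not say how close "about the same" is), and the value 115 is a hypothetical extreme observation added to a short list.

```python
from statistics import mean, median

def tail_shape(values, tolerance=0.05):
    """Compare the average and the median, following the three cases above.

    `tolerance` is an assumption of this sketch: the two are treated as
    "about the same" when they differ by less than this fraction of the
    range of the data.
    """
    avg, med = mean(values), median(values)
    data_range = max(values) - min(values)
    if data_range == 0 or abs(avg - med) < tolerance * data_range:
        return "symmetry"
    return "long right tail" if avg > med else "long left tail"

base = [20, 10, 15, 15]      # average 15, median 15
with_extreme = base + [115]  # hypothetical extreme observation

# The extreme value pulls the average up to 35; the median stays at 15.
print(mean(with_extreme), median(with_extreme))
print(tail_shape(base))          # symmetry
print(tail_shape(with_extreme))  # long right tail
```

One extra observation moves the average from 15 to 35 while the median does not move, which is why the median is preferred for long-tailed variables.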

Calculating the SD

To calculate the standard deviation of a sample follow the steps:

  1. Calculate the average.

  2. Calculate the list of deviations from the average by taking the difference between each datum and the average.

  3. Calculate the r.m.s. size of the resulting list.

SD = r.m.s. deviation from average.

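Consider the list 20, 10, 15, 15. Then

$$\text{average} = \frac{20 + 10 + 15 + 15}{4} = 15$$

The list of deviations is 5, -5, 0, 0. Then

$$\text{SD} = \sqrt{\frac{5^2 + (-5)^2 + 0^2 + 0^2}{4}} = \sqrt{12.5} \approx 3.5$$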
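The three steps can be sketched in Python as follows. This is a minimal illustration, with the function name `sd` chosen by us; it takes the r.m.s. size of a list to be the square root of the mean of the squared entries.

```python
from math import sqrt

def sd(values):
    """SD as the r.m.s. deviation from the average (divides by n)."""
    # Step 1: calculate the average.
    avg = sum(values) / len(values)
    # Step 2: deviations from the average.
    deviations = [x - avg for x in values]
    # Step 3: r.m.s. size = square root of the mean of the squares.
    return sqrt(sum(d * d for d in deviations) / len(deviations))

data = [20, 10, 15, 15]
print(sd(data))  # sqrt(12.5), roughly 3.5
```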

Using a calculator

Most scientific calculators will have a function to calculate the average and the SD of a sample. The steps needed to obtain those values vary from model to model.

The important fact is that most calculators do not produce the SD as we have defined it here. They consider the sum of the squares of the deviations over the total number of data minus one. So, if you obtain the SD from your calculator (or spreadsheet), say SD*, then

$$\text{SD} = \sqrt{\frac{\text{number of entries} - \text{one}}{\text{number of entries}}} \times \text{SD}^*$$

Some calculators have both, SD and SD*. Please read the manual of your calculator regarding this fact.

Notice that the units of SD are the same as the original data. So if the data were measured in years, SD is also in years.
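The same distinction appears in Python's standard `statistics` module, which makes it a convenient way to check the conversion: `stdev` divides by the number of entries minus one (playing the role of SD*), while `pstdev` divides by the number of entries (the SD as defined in these notes). The sketch below uses the list 20, 10, 15, 15 as an example; the variable names are ours.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

data = [20, 10, 15, 15]
n = len(data)

sd_star = stdev(data)             # divides by n - 1, like most calculators
sd = sqrt((n - 1) / n) * sd_star  # conversion to the SD defined in the notes

print(sd_star, sd)
# The converted value agrees with the divide-by-n SD.
print(isclose(sd, pstdev(data)))  # True
```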

Problems

Problem 1:

Both the following lists have the same average of 50.

Which one has the smaller SD and why? (Do no computations)

  1. 50,40,60,30,70,25,75
  2. 50,40,60,30,70,25,75,50,50,50

The second list has more entries at the average, so the SD is smaller.

Repeat for the following two lists

  1. 50,40,60,30,70,25,75
  2. 50,40,60,30,70,25,75,99,1

The second list has two wild observations, 99 and 1, which are away from the average, so the SD is larger.
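Although the problem asks for no computations, the reasoning can be checked afterwards with a short Python sketch. It assumes the first list in each pair is 50, 40, 60, 30, 70, 25, 75, and it uses `statistics.pstdev`, which divides by the number of entries as the SD in these notes does. The variable names are ours.

```python
from math import isclose
from statistics import mean, pstdev

base = [50, 40, 60, 30, 70, 25, 75]   # assumed first list of each pair
more_at_average = base + [50, 50, 50]  # extra entries at the average
with_wild = base + [99, 1]             # two wild observations

for name, values in [("base", base),
                     ("more at average", more_at_average),
                     ("with wild values", with_wild)]:
    # All three averages are 50; only the SD changes.
    print(name, mean(values), round(pstdev(values), 2))
```

Entries equal to the average add nothing to the sum of squared deviations but increase the count, so the SD shrinks; entries far from the average add large squared deviations, so the SD grows.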

Problem 2:

Consider the list of numbers

  1. Without doing any arithmetic, guess whether the average is around 1, 5 or 10.

Only three of the numbers are smaller than 1, none are bigger than 10, so the average is around 5.

  2. Without doing any arithmetic, guess whether the SD is around 1, 3 or 6.

If the SD is 1, then the entries 0.6 and 9.9 are too far away from the average. The entries are too concentrated around 5 for the SD to be 6. So 3 is the most likely value.