Understanding Means, Medians, Variance, and Standard Deviation in Applied Biostatistics, Study notes of Mathematical Methods

University of York Mathematical Methods

This document, authored by Professor Martin Bland of the University of York, provides an introduction to the concepts of mean, median, variance, and standard deviation in the context of applied biostatistics. It covers the calculation and interpretation of these measures of central tendency and variability, as well as their relationship to skewness and the Normal distribution.

Typology: Study notes

2010/2011

Uploaded on 09/10/2011


Applied Biostatistics

Mean and Standard Deviation

Martin Bland

Professor of Health Statistics University of York

http://www-users.york.ac.uk/~mb55/

The mean

The arithmetic mean or average, usually referred to simply as the mean, is found by taking the sum of the observations and dividing by their number.

The mean is often denoted by a little bar over the symbol for the variable, e.g. x̄ ("x bar").

The sample mean has much nicer mathematical properties than the median and is thus more useful for the comparison methods described later.

The median is a very useful descriptive statistic, but not much used for other purposes.
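A minimal Python sketch of the definition, using hypothetical data rather than the FEV1 or triglyceride measurements: the mean is the sum of the observations divided by their number, and the standard library `statistics` module gives the same mean along with the median.

```python
import statistics

# Hypothetical observations, for illustration only
data = [3.2, 3.8, 4.0, 4.1, 4.3, 4.6, 5.0]

# Arithmetic mean: sum of the observations divided by their number
mean_by_hand = sum(data) / len(data)

# The statistics module computes the same mean, and the median
mean_lib = statistics.mean(data)
median = statistics.median(data)

print(f"mean by hand    = {mean_by_hand:.3f}")
print(f"statistics.mean = {mean_lib:.3f}")
print(f"median          = {median}")
```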

Median, mean and skewness:

Mean FEV1 = 4.06. Median FEV1 = 4.1, so the median is within 1% of the mean.

Mean triglyceride = 0.51. Median triglyceride = 0.46. The median is 10% away from the mean.

If the distribution is symmetrical, the sample mean and median will be about the same, but in a skew distribution they will usually be different.

If the distribution is skew to the right, as for serum triglyceride, the mean will usually be greater; if it is skew to the left, the median will usually be greater.

This is because the values in the tails affect the mean but not the median.

Increasing the largest observation will pull the mean higher.

It will not affect the median.
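To see why a long tail moves the mean but not the median, this sketch uses a small hypothetical right-skewed sample (invented for illustration) and then increases only its largest value.

```python
import statistics

# Hypothetical right-skewed sample: a cluster of small values and one long tail
data = [0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 1.2]
mean_before = statistics.mean(data)
median_before = statistics.median(data)

# Increase only the largest observation
data_bigger_max = data[:-1] + [2.4]
mean_after = statistics.mean(data_bigger_max)
median_after = statistics.median(data_bigger_max)

print(f"before: mean = {mean_before:.3f}, median = {median_before}")
print(f"after:  mean = {mean_after:.3f}, median = {median_after}")
```

The median stays at 0.45 while the mean rises from about 0.54 to about 0.71; in both cases the mean exceeds the median, as expected for a right skew.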

[Figure: histogram of serum triglyceride (x-axis: triglyceride, 0 to 2; y-axis: frequency, 0 to 80), with the median and mean marked.]

Variability

The mean and median are measures of the central tendency or position of the middle of the distribution. We shall also need a measure of the spread, dispersion or variability of the distribution.

Variability

For use in the analysis of data, range and IQR are not satisfactory. Instead we use two other measures of variability: variance and standard deviation. These both measure how far observations are from the mean of the distribution. Variance is the average squared difference from the mean; for a sample, the sum of squared differences is divided by n − 1 rather than by n. Standard deviation is the square root of the variance.
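A sketch of both definitions in Python, on hypothetical data. It assumes the usual sample convention of dividing the sum of squared differences by n − 1 (the convention `statistics.variance` follows), and it also shows the square-root step that turns the FEV1 variance of 0.449 quoted in these notes into an SD of about 0.67 litres.

```python
import math
import statistics

# Hypothetical observations, for illustration only
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

# Sum of squared differences from the mean
ss = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1; the standard deviation is its square root
variance = ss / (n - 1)
sd = math.sqrt(variance)

print(f"variance = {variance:.4f}, SD = {sd:.4f}")
print(f"statistics.variance = {statistics.variance(data):.4f}, "
      f"statistics.stdev = {statistics.stdev(data):.4f}")

# From the FEV1 variance (litres squared) to the SD (litres)
print(f"sqrt(0.449) = {math.sqrt(0.449):.2f}")
```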

Standard deviation

FEV1: s = √0.449 = 0.67 litres.

[Figure: histogram of FEV1 (x-axis: FEV1 in litres, 2 to 6; y-axis: frequency, 0 to 20), with x̄, x̄ ± s and x̄ ± 2s marked.]

Majority of observations within one SD of mean (usually about 2/3). Almost all within about two SD of mean (usually about 95%).
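The 2/3 and 95% figures can be checked on a simulated sample. This is a sketch, not the FEV1 data: it draws 10,000 values from a Normal distribution whose mean and SD (4.06 and 0.67) are borrowed from the FEV1 summary, and the helper `proportion_within` is our own.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Simulated data, NOT the real FEV1 sample: Normal with mean 4.06, SD 0.67
sample = [random.gauss(4.06, 0.67) for _ in range(10_000)]

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)


def proportion_within(values, centre, spread, k):
    """Hypothetical helper: proportion of values within k SDs of the mean."""
    inside = sum(1 for v in values if abs(v - centre) <= k * spread)
    return inside / len(values)


within_1sd = proportion_within(sample, mean, sd, 1)
within_2sd = proportion_within(sample, mean, sd, 2)
print(f"within 1 SD: {within_1sd:.3f}")  # expected to be near 2/3
print(f"within 2 SD: {within_2sd:.3f}")  # expected to be near 0.95
```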

Standard deviation

Triglyceride: s = √0.04802 = 0.22 mmol/litre.

Majority of observations within one SD of mean (usually about 2/3). Almost all within about two SD of mean (usually about 95%), but those outside may be all at one end.

[Figure: histogram of triglyceride (x-axis: triglyceride, 0 to 2; y-axis: frequency, 0 to 80), with x̄, x̄ ± s and x̄ ± 2s marked.]
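The warning that observations beyond two SDs "may be all at one end" can be illustrated with a simulated right-skewed sample. The log-Normal distribution and its parameters here are arbitrary choices for illustration, not a model fitted to the triglyceride data.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

# Simulated right-skewed sample; the log-Normal parameters are arbitrary
sample = [random.lognormvariate(-0.75, 0.4) for _ in range(10_000)]

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)

# Count observations beyond two SDs on each side of the mean
below = sum(1 for v in sample if v < mean - 2 * sd)
above = sum(1 for v in sample if v > mean + 2 * sd)

print(f"mean = {mean:.3f}, SD = {sd:.3f}")
print(f"below mean - 2 SD: {below}, above mean + 2 SD: {above}")
```

With a long right tail, values more than two SDs above the mean turn up regularly, while two SDs below the mean lies close to zero and is rarely if ever reached.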

Standard deviation

Gestational age: s = √5.242 = 2.29 weeks.

Majority of observations within one SD of mean (usually about 2/3). Almost all within about two SD of mean (usually about 95%), but those outside may be all at one end.

[Figure: histogram of gestational age (x-axis: gestational age in weeks, 20 to 45; y-axis: frequency, 0 to 500), with x̄, x̄ ± s and x̄ ± 2s marked.]

Spotting skewness

If the mean is less than two standard deviations, two standard deviations below the mean will be negative.

For any variable which cannot be negative, this tells us that the distribution must be positively skew.

If the mean or the median is near to one end of the range or interquartile range, this tells us that the distribution must be skew. If the mean or median is near the lower limit it will be positively skew, if near the upper limit it will be negatively skew.

Spotting skewness

Triglyceride: median = 0.46, mean = 0.51, SD = 0.22, range = 0.15 to 1.66, IQR = 0.35 to 0. mmol/l.

These rules of thumb only work one way, e.g. mean may exceed two SD and distribution may still be skew.

Gestational age: median = 39, mean = 38.95, SD = 2.29, range = 21 to 44, IQR = 38 to 40 weeks.
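The two rules of thumb can be written as a small function and applied to the summary figures quoted above. The function `skew_hints` and its `edge` parameter, which decides what counts as "near" an end of the range (here the outer quarter), are hypothetical choices for this sketch; only the range is checked, not the IQR.

```python
def skew_hints(mean, median, sd, low, high, can_be_negative=False, edge=0.25):
    """Hypothetical helper applying the two rules of thumb for spotting skewness.

    `edge` is an assumed threshold: a value in the bottom or top quarter of the
    range counts as "near" that end. An empty result does NOT mean symmetry.
    """
    hints = []
    # Rule 1: a variable that cannot be negative, with mean < 2 SD, is positively skew
    if not can_be_negative and mean < 2 * sd:
        hints.append("positively skew (mean less than two SDs)")
    # Rule 2: mean or median near one end of the range
    width = high - low
    for name, value in (("mean", mean), ("median", median)):
        position = (value - low) / width  # 0 at the lower limit, 1 at the upper
        if position < edge:
            hints.append(f"positively skew ({name} near lower limit)")
        elif position > 1 - edge:
            hints.append(f"negatively skew ({name} near upper limit)")
    return hints


# Summary figures from these notes
tri = skew_hints(mean=0.51, median=0.46, sd=0.22, low=0.15, high=1.66)
gest = skew_hints(mean=38.95, median=39, sd=2.29, low=21, high=44)
print("triglyceride:", tri)
print("gestational age:", gest)
```

Both sets of figures are flagged by the range rule, while the "mean less than two SDs" rule does not fire for triglyceride. Because the rules only work one way, an empty list would not show that a distribution is symmetrical.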

The Normal distribution

Many statistical methods are only valid if we can assume that our data follow a distribution of a particular type, the Normal distribution. This is a continuous, symmetrical, unimodal distribution described by a mathematical equation, which we shall omit.

[Figure: histogram of FEV1 (x-axis: FEV1 in litres, 2 to 6; y-axis: frequency, 0 to 20).]

A Normal distribution has two parameters, which happen to be equal to its mean and variance. These two numbers tell us which member of the Normal family we have.

The Normal distribution with mean = 0 and variance = 1 is called the Standard Normal distribution.

[Figure: Normal curves with mean 0, variance 1, and two curves with mean 3 and different variances; x-axis: Normal variable, -5 to 10; y-axis: relative frequency density.]

The distributions are the same in terms of standard deviations from the mean.

[Figure: the same Normal curves labelled by standard deviation: mean 0, SD 1, and two curves with mean 3 and different SDs; x-axis: Normal variable, -5 to 10; y-axis: relative frequency density.]
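A sketch with Python's `statistics.NormalDist` showing that Normal distributions agree once distances are measured in SDs from the mean. The SD of 2 for the second curve is an arbitrary value chosen for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)
other = NormalDist(mu=3, sigma=2)  # SD of 2 is an arbitrary choice for illustration

# Proportion within one SD of the mean, for each distribution
p_standard = standard.cdf(1) - standard.cdf(-1)
p_other = other.cdf(3 + 2) - other.cdf(3 - 2)

# Standardising, z = (x - mean) / SD, maps a point on one curve to the other
x = 6.0
z = (x - other.mean) / other.stdev

print(f"within 1 SD: {p_standard:.4f} vs {p_other:.4f}")
print(f"x = {x} is z = {z} SDs above the mean")
print(f"P(X <= x) = {other.cdf(x):.4f}, P(Z <= z) = {standard.cdf(z):.4f}")
```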

The Normal distribution is important for two reasons.

Many natural variables follow it quite closely, certainly sufficiently closely for us to use statistical methods which require this.
Even when we have a variable which does not follow a Normal distribution, if we take the mean of a sample of observations, such means will tend to follow a Normal distribution, more closely the larger the sample.

An illustration of the Central Limit Theorem

[Figure: four histograms, titled Single Uniform variable, Two Uniform variables (mean of two), Four Uniform variables (mean of four) and Ten Uniform variables (mean of ten); x-axis: value of the variable or of the mean; y-axis: frequency.]
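The Central Limit Theorem figure can be reproduced roughly by simulation. In this sketch the helper `means_of_uniforms`, the seed and the 10,000 repetitions are our own choices; it takes means of 1, 2, 4 and 10 Uniform(0, 1) variables and reports the SD of those means and the proportion within one SD of their average.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible


def means_of_uniforms(k, reps=10_000):
    """Hypothetical helper: reps means, each of k independent Uniform(0, 1) draws."""
    return [statistics.fmean(random.random() for _ in range(k)) for _ in range(reps)]


results = {}
for k in (1, 2, 4, 10):
    means = means_of_uniforms(k)
    centre = statistics.fmean(means)
    sd = statistics.stdev(means)
    within_1sd = sum(1 for v in means if abs(v - centre) <= sd) / len(means)
    results[k] = (sd, within_1sd)
    print(f"mean of {k:2d}: SD = {sd:.3f}, within 1 SD = {within_1sd:.3f}")
```

For a single Uniform variable only about 58% of values lie within one SD of the mean; for means of ten the proportion is close to the Normal figure of 68%, and the SD has fallen from about 0.29 to about 0.09.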

There is no simple formula linking the variable and the area under the curve. Hence we cannot find a formula to calculate the frequency between two chosen values of the variable, nor the value which would be exceeded for a given proportion of observations. Numerical methods for calculating these things with acceptable accuracy were used to produce extensive tables of the Normal distribution. These numerical methods for calculating Normal frequencies have been built into statistical computer programs and computers can estimate them whenever they are needed.
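Python's standard library includes such a numerical Normal routine in `statistics.NormalDist`. As a sketch, assuming FEV1 is Normal with the mean (4.06) and SD (0.67 litres) given in these notes, the code finds the frequency between two chosen values and the value exceeded by 5% of observations.

```python
from statistics import NormalDist

# Assumption: FEV1 is Normal with mean 4.06 and SD 0.67 litres
fev1 = NormalDist(mu=4.06, sigma=0.67)

# Frequency between two chosen values = area under the curve between them
between_3_and_5 = fev1.cdf(5.0) - fev1.cdf(3.0)

# Value exceeded by a given proportion (5%) of observations
upper_5_percent = fev1.inv_cdf(0.95)

print(f"proportion between 3 and 5 litres: {between_3_and_5:.3f}")
print(f"value exceeded by 5% of observations: {upper_5_percent:.2f} litres")
```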

Two numbers from tables of the Normal distribution:

we expect 68% of observations to lie within one standard deviation of the mean,
we expect 95% of observations to lie within 1.96 standard deviations of the mean. This is true for all Normal distributions, whatever the mean, variance, and standard deviation.
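The two tabulated numbers can be reproduced with `statistics.NormalDist` for the Standard Normal distribution; a minimal sketch:

```python
from statistics import NormalDist

standard = NormalDist()  # Standard Normal: mean 0, SD 1

within_1 = standard.cdf(1) - standard.cdf(-1)
within_1_96 = standard.cdf(1.96) - standard.cdf(-1.96)
# Two-sided 95%: leave 2.5% in each tail
cutoff_95 = standard.inv_cdf(0.975)

print(f"within 1 SD: {within_1:.4f}")
print(f"within 1.96 SD: {within_1_96:.4f}")
print(f"SDs enclosing the central 95%: {cutoff_95:.4f}")
```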
