Applied Biostatistics, Lecture Notes - Mathematics - 30, Study notes of Mathematical Methods

Correlations and regressions, correlation coefficient, simple linear regression

Typology: Study notes

2010/2011

Uploaded on 09/10/2011

Applied Biostatistics

Correlation and regression

Martin Bland

Professor of Health Statistics, University of York

http://www-users.york.ac.uk/~mb55/msc/

Correlation

Example: Muscle strength and height in 42 alcoholics

A scatter diagram:

How close is the relationship? Correlation: measures closeness to a linear relationship.

[Figure: scatter diagram of quadriceps strength (N) against height (cm)]

Correlation coefficient

Subtract means from observations and multiply.

Sum of products about the means. Like the sum of squares about the means used for measuring variability.

[Figure: scatter diagram of muscle strength (newtons) against height (cm), with lines drawn at mean strength and mean height]


Products in top right and bottom left quadrants positive. Products in top left and bottom right quadrants negative.

[Figure: scatter diagram of muscle strength (newtons) against height (cm), divided into quadrants by the mean strength and mean height lines; top right and bottom left marked +ve, top left and bottom right marked -ve]


Sum of products positive. Correlation positive.

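The sum of products about the means can be sketched in a few lines. The six heights and strengths below are hypothetical values made up for illustration, not the data from the 42 alcoholics.

```python
# Hypothetical heights (cm) and strengths (newtons); NOT the lecture's data.
heights = [158, 162, 165, 170, 174, 179]
strengths = [180, 300, 210, 330, 260, 400]

mean_height = sum(heights) / len(heights)
mean_strength = sum(strengths) / len(strengths)

# Subtract the means from the observations and multiply.
products = [
    (h - mean_height) * (s - mean_strength)
    for h, s in zip(heights, strengths)
]

# Points in the top right and bottom left quadrants give positive products,
# points in the top left and bottom right quadrants give negative products.
n_positive = sum(1 for p in products if p > 0)
n_negative = sum(1 for p in products if p < 0)

sum_of_products = sum(products)
print(n_positive, n_negative, sum_of_products)
```

With these made-up values most points fall in the positive quadrants, so the sum of products is positive.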

Correlation coefficient

Divide sum of products by square roots of sums of squares. Correlation coefficient, denoted by r. Maximum value = 1.00. Minimum value = -1.00.

r = 0.42. Positive correlation of fairly low strength.

[Figure: scatter diagram of muscle strength (newtons) against height (cm)]
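A minimal sketch of this calculation, dividing the sum of products by the square roots of the two sums of squares. The function name `correlation` and the data are hypothetical, chosen only for illustration.

```python
import math

def correlation(x, y):
    """Sum of products about the means divided by the square roots
    of the two sums of squares about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_squares_x = sum((a - mean_x) ** 2 for a in x)
    sum_squares_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / (math.sqrt(sum_squares_x) * math.sqrt(sum_squares_y))

# Hypothetical heights (cm) and strengths (newtons); NOT the lecture's data.
heights = [158, 162, 165, 170, 174, 179]
strengths = [180, 300, 210, 330, 260, 400]

r = correlation(heights, strengths)
print(round(r, 2))

# A variable is perfectly correlated with itself (maximum value),
# and with its own negative the coefficient reaches the minimum value.
r_max = correlation(heights, heights)
r_min = correlation(heights, [-h for h in heights])
```

Because the divisor has the same units as the sum of products, r has no units and always lies between -1.00 and 1.00.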

Correlation coefficient

r = -0.42. Negative correlation of fairly low strength.

[Figure: scatter diagram of muscle strength (newtons) against age (years)]

Correlation coefficient

Positive when large values of one variable are associated with large values of the other.

[Figures: two scatter diagrams of Variable Y against Variable X, each with a positive r (digits after "r = 0." lost)]

Correlation coefficient

Negative when large values of one variable are associated with small values of the other.

[Figures: two scatter diagrams of Variable Y against Variable X, each with a negative r (digits after "r = -0." lost)]

Correlation coefficient

r = 0.00 when there is no linear relationship.

[Figure: scatter diagram of Variable Y against Variable X with no linear relationship]

Correlation coefficient

It is possible for r to be equal to 0.00 when there is a relationship which is not linear.

[Figure: scatter diagram of Variable Y against Variable X with a relationship which is not linear]
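A small made-up example of a relationship which is not linear but has r = 0.00: y is exactly the square of x, with x values placed symmetrically about zero. The data and the function name are hypothetical.

```python
import math

def correlation(x, y):
    """Sum of products about the means divided by the square roots
    of the sums of squares about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_squares_x = sum((a - mean_x) ** 2 for a in x)
    sum_squares_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / math.sqrt(sum_squares_x * sum_squares_y)

# y is completely determined by x, but the relationship is a curve.
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]

r_curve = correlation(x, y)
print(r_curve)
```

The positive and negative products cancel exactly, so the sum of products is zero even though y depends entirely on x.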

Correlation coefficient

We can test the null hypothesis that the correlation coefficient in the population is zero. Simple t test, tabulated. Assume: independent observations, one of the variables is from a Normal distribution. Large deviations from the assumptions make P very unreliable.

r = 0.42, P = 0.006. Easy to do, simple tables. Computer programmes almost always print this.

[Figure: scatter diagram of quadriceps strength (N) against height (cm)]
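The notes do not give the formula for the test. A sketch using the usual textbook form, t = r × sqrt((n - 2) / (1 - r²)) on n - 2 degrees of freedom, is shown here as an assumption about the method. The P value is found by numerically integrating the t density with Simpson's rule, so that only the standard library is needed; the helper names are hypothetical.

```python
import math

r = 0.42   # correlation coefficient quoted in the notes
n = 42     # number of subjects quoted in the notes
df = n - 2

# Assumed (standard) test statistic for H0: population correlation = 0.
t_stat = r * math.sqrt(df / (1 - r ** 2))

def t_density(t, df):
    """Probability density of Student's t distribution."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=2000):
    """Two-sided P value; area from 0 to |t| by Simpson's rule (steps even)."""
    upper = abs(t_value)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)

p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f} on {df} degrees of freedom, P = {p_value:.4f}")
```

The result can be compared with the P = 0.006 quoted above. Small differences are possible because r is rounded to two decimal places here.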

Correlation coefficient

We can find a confidence interval for the correlation coefficient in the population. Fisher's z transformation. Assume: independent observations, both of the variables are from a Normal distribution. Large deviations from the assumptions make the CI very unreliable.

r = 0.42, approximate 95% confidence interval: 0.13 to 0. Tricky, approximate. Computer programmes rarely print this.

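A sketch of the Fisher's z calculation in its usual textbook form: transform r with the inverse hyperbolic tangent, use a standard error of 1 / sqrt(n - 3), and transform the limits back. That the lecture used exactly this form, and the multiplier 1.96, are assumptions.

```python
import math

r = 0.42   # correlation coefficient quoted in the notes
n = 42     # number of subjects quoted in the notes

# Fisher's z transformation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r).
z = math.atanh(r)

# Approximate standard error of z (assumed standard form).
se_z = 1 / math.sqrt(n - 3)

# 95% limits on the z scale, then back-transformed to the r scale.
z_crit = 1.96
lower = math.tanh(z - z_crit * se_z)
upper = math.tanh(z + z_crit * se_z)

print(f"approximate 95% CI for r: {lower:.2f} to {upper:.2f}")
```

The interval is not symmetric about r, because the back-transformation keeps both limits between -1.00 and 1.00.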

Regression

Example: Muscle strength and height in 42 alcoholics

What is the relationship? Regression: predict strength from observed height.

[Figure: scatter diagram of quadriceps strength (N) against height (cm)]

Simple Linear Regression

What is the mean strength for people with any given observed height?

Strength is the outcome, dependent, y, or left hand side variable.

Height is the predictor, explanatory, independent, x, or right hand side variable.

Linear relationship: strength = intercept + slope × height

Equation of a straight line.

Simple Linear Regression

Which straight line should we choose?

Minimise the sum of the squares of the differences between the observed strength and the strength predicted by the line. Principle of least squares, least squares line or equation.

[Figure: scatter diagram of quadriceps strength (N) against height (cm)]
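A minimal sketch of the least squares line, assuming the usual closed form: slope = sum of products about the means divided by the sum of squares of the predictor, intercept = mean outcome minus slope × mean predictor. The data are hypothetical, so the coefficients will not match the lecture's.

```python
# Hypothetical heights (cm) and strengths (newtons); NOT the lecture's data.
heights = [158, 162, 165, 170, 174, 179]
strengths = [180, 300, 210, 330, 260, 400]

mean_height = sum(heights) / len(heights)
mean_strength = sum(strengths) / len(strengths)

sum_products = sum(
    (h - mean_height) * (s - mean_strength) for h, s in zip(heights, strengths)
)
sum_squares_height = sum((h - mean_height) ** 2 for h in heights)

slope = sum_products / sum_squares_height
intercept = mean_strength - slope * mean_height

def sum_squared_differences(intercept, slope):
    """Sum of squares of (observed strength - strength predicted by the line)."""
    return sum(
        (s - (intercept + slope * h)) ** 2 for h, s in zip(heights, strengths)
    )

best = sum_squared_differences(intercept, slope)
# Any other line, for example one with a slightly different slope, does worse.
other = sum_squared_differences(intercept, slope + 0.1)
print(slope, intercept, best, other)
```

Comparing `best` with `other` shows what "least squares" means: no other choice of intercept and slope gives a smaller sum of squared differences.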

Simple Linear Regression

Strength = -908 + 7.20 × height

[Figure: scatter diagram of quadriceps strength (N) against height (cm)]

The intercept and slope are called coefficients. The slope of the line is sometimes called the regression coefficient.

These coefficients have units, unlike the correlation coefficient. They can take any value.

Strength in newtons = -908 newtons + 7.20 newtons per cm × height in cm
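As an illustration of how the equation is used, the sketch below predicts mean strength from height with the coefficients quoted above. The function name `predicted_strength` and the example heights are hypothetical.

```python
INTERCEPT_NEWTONS = -908.0       # intercept quoted in the notes
SLOPE_NEWTONS_PER_CM = 7.20      # slope quoted in the notes

def predicted_strength(height_cm):
    """Predicted mean quadriceps strength (newtons) for a given height (cm)."""
    return INTERCEPT_NEWTONS + SLOPE_NEWTONS_PER_CM * height_cm

strength_170 = predicted_strength(170)   # about 316 newtons
strength_180 = predicted_strength(180)   # about 388 newtons

# A 10 cm difference in height corresponds to 10 x 7.20 = 72 newtons.
difference = strength_180 - strength_170
print(strength_170, strength_180, difference)
```

The intercept is the predicted strength at a height of 0 cm, far outside the observed heights of roughly 155 to 180 cm, so it is best read as a property of the line rather than as a prediction.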


We can find confidence intervals and P values for the coefficients subject to assumptions. Deviations from line should have a Normal distribution with uniform variance.


Slope = 7.20, 95% CI = 2.15 to 12.25 newtons/cm, P = 0.006 against zero. Intercept = -908, 95% CI = -1771 to -45 newtons.


Simple Linear Regression

Assumptions: independent observations, deviations from line should have a Normal distribution with uniform variance.

Calculate the deviations or residuals, observed minus predicted.

Check Normal distribution:

[Figure: histogram of the strength residuals (frequency against strength residual), and a Normal plot of residual strength against inverse Normal]

Check uniform variance:

[Figure: scatter diagram of strength residual against height]
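A short sketch of the residual calculation, observed minus predicted, using the fitted line quoted earlier. The three observations are hypothetical values chosen for easy arithmetic, not subjects from the study.

```python
INTERCEPT_NEWTONS = -908.0       # intercept quoted in the notes
SLOPE_NEWTONS_PER_CM = 7.20      # slope quoted in the notes

# Hypothetical (height in cm, observed strength in newtons) pairs.
observations = [(160, 250), (170, 300), (178, 420)]

residuals = []
for height, observed in observations:
    predicted = INTERCEPT_NEWTONS + SLOPE_NEWTONS_PER_CM * height
    residuals.append(observed - predicted)   # observed minus predicted

# Residuals paired with height are what the uniform-variance plot displays;
# the residuals on their own are what the histogram and Normal plot display.
residuals_by_height = list(zip([h for h, _ in observations], residuals))
print(residuals_by_height)
```

With the real data, a histogram or Normal plot of `residuals` checks the Normal distribution assumption, and a plot of residual against height checks whether the spread is similar across the range of heights.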