






Correlations and regressions, correlation coefficient, simple linear regression







Applied Biostatistics
Professor of Health Statistics University of York http://www-users.york.ac.uk/~mb55/msc/
Example: Muscle strength and height in 42 alcoholics
A scatter diagram:
How close is the relationship? Correlation: measures closeness to a linear relationship.
[Figure: scatter diagram of quadriceps strength (N) against height (cm)]
Subtract means from observations and multiply.
Sum of products about the means. Like the sum of squares about the means used for measuring variability.
[Figure: scatter diagram of muscle strength (newtons) against height (cm), with lines drawn at mean strength and mean height]
Products in top right and bottom left quadrants positive.
[Figure: the same scatter diagram, with the top right and bottom left quadrants marked +ve]
Products in top left and bottom right quadrants negative.
[Figure: the same scatter diagram, with the top left and bottom right quadrants also marked -ve]
Sum of products positive. Correlation positive.
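The sum of products about the means can be sketched in a few lines of Python. The heights and strengths below are hypothetical values chosen for easy arithmetic; they are not the data from the 42 alcoholics.

```python
# Hypothetical data for illustration only; not the study data.
heights = [160, 165, 170, 175, 180]      # cm
strengths = [200, 150, 300, 250, 400]    # newtons

mean_h = sum(heights) / len(heights)
mean_s = sum(strengths) / len(strengths)

# Subtract the means from the observations and multiply.
products = [(h - mean_h) * (s - mean_s) for h, s in zip(heights, strengths)]
sum_of_products = sum(products)

for h, s, p in zip(heights, strengths, products):
    print(f"height {h}, strength {s}, product {p:+.0f}")
print("Sum of products about the means:", sum_of_products)
```

In this made-up sample one product is negative (a height above the mean paired with a strength below the mean), but the positive products dominate, so the sum is positive.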
Divide sum of products by square roots of sums of squares. Correlation coefficient, denoted by r. Maximum value = 1.00. Minimum value = −1.00.
r = 0.42. Positive correlation of fairly low strength.
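As a rough sketch of this calculation, the function below divides the sum of products by the square roots of the two sums of squares. The function name `correlation` and the data are assumptions made for illustration; the study data are not reproduced here, so this does not recover r = 0.42.

```python
import math

def correlation(x, y):
    """Sum of products about the means divided by the
    square roots of the sums of squares about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data for illustration only; not the study data.
heights = [160, 165, 170, 175, 180]      # cm
strengths = [200, 150, 300, 250, 400]    # newtons

r = correlation(heights, strengths)
print(f"r = {r:.2f}")

# Points lying exactly on a straight line give the extreme values.
r_max = correlation([1, 2, 3], [2, 4, 6])
r_min = correlation([1, 2, 3], [6, 4, 2])
print(r_max, r_min)
```

Because both variables are divided by their own variability, r has no units and cannot fall outside the range −1.00 to 1.00.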
[Figure: scatter diagram of muscle strength (newtons) against height (cm)]
r = −0.42. Negative correlation of fairly low strength.
[Figure: scatter diagram of muscle strength (newtons) against age (years)]
Positive when large values of one variable are associated with large values of the other.
[Figure: scatter diagram of Variable Y against Variable X]
r = 0.
Negative when large values of one variable are associated with small values of the other.
[Figure: scatter diagram of Variable Y against Variable X]
r = 0.00 when there is no linear relationship.
[Figure: scatter diagram of Variable Y against Variable X]
It is possible for r to be equal to 0.00 when there is a relationship which is not linear.
[Figure: scatter diagram of Variable Y against Variable X]
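A small sketch of a relationship that is exact but not linear, and so gives r = 0.00. The data are constructed for illustration (y is the square of x, with x symmetric about zero), and the helper name `correlation` is an assumption.

```python
import math

def correlation(x, y):
    """Correlation coefficient from sums of products and squares about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Constructed example: y is completely determined by x, but by a curve.
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]

r_curve = correlation(x, y)
print("r =", r_curve)
```

The positive products on one side of the mean of x cancel the negative products on the other side, so the sum of products is zero even though y depends entirely on x.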
We can test the null hypothesis that the correlation coefficient in the population is zero. Simple t test, tabulated.
Assume: independent observations, one of the variables is from a Normal distribution. With large deviations from this assumption, P is very unreliable.
r = 0.42, P = 0.006. Easy to do, simple tables. Computer programmes almost always print this.
[Figure: scatter diagram of quadriceps strength (N) against height (cm)]
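A sketch of the test for r = 0.42 with 42 observations, assuming the usual form of this test, t = r × √((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The slides do not give the formula, so this form is an assumption. The critical values for 40 degrees of freedom are taken from a standard t table, not from the slides.

```python
import math

r = 0.42   # correlation coefficient from the example
n = 42     # number of subjects

# Assumed form of the test statistic: t on n - 2 degrees of freedom.
t_statistic = r * math.sqrt((n - 2) / (1 - r ** 2))
degrees_of_freedom = n - 2

# Two-sided critical values of t for 40 degrees of freedom (standard tables).
T_CRITICAL_5_PERCENT = 2.021
T_CRITICAL_1_PERCENT = 2.704

print(f"t = {t_statistic:.2f} on {degrees_of_freedom} degrees of freedom")
print("Significant at 5%:", t_statistic > T_CRITICAL_5_PERCENT)
print("Significant at 1%:", t_statistic > T_CRITICAL_1_PERCENT)
```

The statistic exceeds the 1% point of the table, which is in line with the P = 0.006 quoted for the example.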
We can find a confidence interval for the correlation coefficient in the population. Fisher's z transformation.
Assume: independent observations, both of the variables are from a Normal distribution. With large deviations from this assumption, the CI is very unreliable.
r = 0.42, approximate 95% confidence interval: 0.13 to 0. Tricky, approximate. Computer programmes rarely print this.
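A sketch of the Fisher's z calculation for r = 0.42 and n = 42. It assumes the standard form of the method: z is the inverse hyperbolic tangent of r, with standard error 1/√(n − 3), and 1.96 is used as the Normal multiplier for a 95% interval. The slides name the method but do not show these steps.

```python
import math

r = 0.42   # correlation coefficient from the example
n = 42     # number of subjects

# Fisher's z transformation: z = 0.5 * log((1 + r) / (1 - r)) = atanh(r).
z = math.atanh(r)
standard_error = 1 / math.sqrt(n - 3)

# Approximate 95% limits on the z scale, then transformed back to the r scale.
z_lower = z - 1.96 * standard_error
z_upper = z + 1.96 * standard_error
r_lower = math.tanh(z_lower)
r_upper = math.tanh(z_upper)

print(f"Approximate 95% CI for r: {r_lower:.2f} to {r_upper:.2f}")
```

The interval is not symmetric about r, because the back-transformation compresses values as they approach 1.00.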
Example: Muscle strength and height in 42 alcoholics
What is the relationship? Regression: predict strength from observed height.
[Figure: scatter diagram of quadriceps strength (N) against height (cm)]
What is the mean strength for people with any given observed height?
Strength is the outcome, dependent, y, or left hand side variable.
Height is the predictor, explanatory, independent, x, or right hand side variable.
Linear relationship: strength = intercept + slope × height
Equation of a straight line.
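Which straight line should we choose?
For each observation, take the difference between the observed strength and the strength predicted by the line. Minimise the sum of the squares of these differences. Principle of least squares, least squares line or equation.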
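A minimal sketch of the least squares calculation, using the standard result that the slope is the sum of products about the means divided by the sum of squares of the predictor, and that the line passes through the two means. The data are hypothetical, not the study data, and the helper `sum_sq_diff` is a name introduced for illustration.

```python
# Hypothetical data for illustration only; not the study data.
heights = [160, 165, 170, 175, 180]      # cm (predictor)
strengths = [200, 150, 300, 250, 400]    # newtons (outcome)

n = len(heights)
mean_h = sum(heights) / n
mean_s = sum(strengths) / n

sum_products = sum((h - mean_h) * (s - mean_s) for h, s in zip(heights, strengths))
sum_squares_h = sum((h - mean_h) ** 2 for h in heights)

# Least squares estimates of the two coefficients.
slope = sum_products / sum_squares_h
intercept = mean_s - slope * mean_h

def sum_sq_diff(a, b):
    """Sum of squared differences between observed strength and the line a + b * height."""
    return sum((s - (a + b * h)) ** 2 for h, s in zip(heights, strengths))

best = sum_sq_diff(intercept, slope)
print(f"strength = {intercept:.0f} + {slope:.2f} x height")
print("Sum of squares for the least squares line:", best)
print("With the slope increased by 1:", sum_sq_diff(intercept, slope + 1))
print("With the intercept increased by 5:", sum_sq_diff(intercept + 5, slope))
```

Any other choice of intercept or slope gives a larger sum of squares than the least squares line, which is what the last two printed values illustrate.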
[Figure: scatter diagram of quadriceps strength (N) against height (cm)]
Strength = −908 + 7.20 × height
The intercept and slope are called coefficients. The slope of the line is sometimes called the regression coefficient.
These coefficients have units, unlike the correlation coefficient. They can take any value.
Strength in newtons = −908 newtons + 7.20 newtons per cm × height in cm
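A short sketch of how the fitted equation, strength = −908 + 7.20 × height, is used to predict mean strength. The function name `predicted_strength` and the example heights are assumptions for illustration; the heights are chosen to lie inside the observed range of roughly 155 to 180 cm.

```python
INTERCEPT = -908.0   # newtons
SLOPE = 7.20         # newtons per cm

def predicted_strength(height_cm):
    """Predicted mean quadriceps strength in newtons for a given height in cm."""
    return INTERCEPT + SLOPE * height_cm

for height in (160, 170, 180):
    print(f"height {height} cm: predicted strength {predicted_strength(height):.0f} N")

# The slope is the difference in predicted strength per cm of height,
# so a difference of 10 cm corresponds to 10 times the slope.
difference_per_10_cm = predicted_strength(180) - predicted_strength(170)
print(f"Difference for 10 cm of height: {difference_per_10_cm:.0f} N")
```

The intercept is the predicted strength at a height of 0 cm, far outside the observed heights, which is why it can take a value such as −908 newtons that is impossible as a strength.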
We can find confidence intervals and P values for the coefficients subject to assumptions. Deviations from line should have a Normal distribution with uniform variance.
Slope = 7.20, 95% CI = 2.15 to 12.25 newtons/cm, P = 0.006 against zero.
Intercept = −908, 95% CI = −1771 to −45 newtons.
Assumptions: independent observations, deviations from line should have a Normal distribution with uniform variance.
Calculate the deviations or residuals, observed minus predicted.
Check Normal distribution:
[Figure: histogram of strength residuals (frequency), and Normal plot of residual strength against inverse Normal]
Check uniform variance:
[Figure: scatter diagram of strength residual against height]
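The residuals can be calculated as in the sketch below, which fits a least squares line to hypothetical data (not the study data) and subtracts the predicted strength from the observed strength. In practice the residuals would then be plotted with a graphics package, as a histogram or Normal plot and against height; the plotting is left out here to keep the example to the standard library.

```python
# Hypothetical data for illustration only; not the study data.
heights = [160, 165, 170, 175, 180]      # cm
strengths = [200, 150, 300, 250, 400]    # newtons

n = len(heights)
mean_h = sum(heights) / n
mean_s = sum(strengths) / n

# Least squares line through the hypothetical data.
slope = (sum((h - mean_h) * (s - mean_s) for h, s in zip(heights, strengths))
         / sum((h - mean_h) ** 2 for h in heights))
intercept = mean_s - slope * mean_h

# Residual = observed minus predicted.
predicted = [intercept + slope * h for h in heights]
residuals = [s - p for s, p in zip(strengths, predicted)]

# Listing residuals against height is the tabular form of the uniform variance plot.
for h, res in zip(heights, residuals):
    print(f"height {h} cm: residual {res:+.0f} N")

# Sorted residuals are the starting point for a histogram or Normal plot.
print("Sorted residuals:", sorted(residuals))
print("Sum of residuals:", sum(residuals))
```

Residuals from a least squares line with an intercept always sum to zero, so the checks concern their shape (Normal distribution) and whether their spread changes with height (uniform variance), not their mean.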