Correlation and Regression Analysis in Public Health: A Comprehensive Guide, Lecture notes of Biostatistics

correlation and regression lecture notes

Typology: Lecture notes

2018/2019

Uploaded on 05/18/2019

hageexam

Correlation and Linear Regression

Regression and Correlation

  • Many medical investigations are concerned with:
    • Establishment of relationship between two variables.
    • The strength of a relationship.
    • Predicting one variable on the basis of another.
    • Controlling the effect of unwanted variables.
  • Such intentions can be addressed either by using correlation or regression analysis.

Correlation Analysis

  • Does not imply cause and effect relationship.
  • The value of r ranges from -1 to +1.
  • If the correlation coefficient is greater than 0, the variables are said to be positively correlated (i.e. as X increases, Y tends to increase).
  • If the correlation coefficient is less than 0, the variables are said to be negatively correlated (i.e. as X increases, Y tends to decrease).
  • If the correlation coefficient is 0 then the variables are said to be uncorrelated.

Correlation Analysis Cont…

  • The formula for computing the sample correlation coefficient (r) for two variables X and Y is given as:

$$ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\left[\sum (x-\bar{x})^2\right]\left[\sum (y-\bar{y})^2\right]}} $$

  • Or

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

  • Before computing r, a scatter plot of the two variables should be drawn. Why?
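The two forms of the formula for r are algebraically equivalent. A minimal Python sketch can illustrate this; the function names and the five data points are made up for illustration and are not taken from the lecture.

```python
import math

def pearson_r_deviation(xs, ys):
    """Deviation form: sum of cross-products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_computational(xs, ys):
    """Computational form: uses only the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical illustrative data (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_dev = pearson_r_deviation(xs, ys)
r_comp = pearson_r_computational(xs, ys)
print(round(r_dev, 4), round(r_comp, 4))
```

The computational form is convenient when working by hand from column totals of X, Y, XY, X^2 and Y^2.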

Correlation Analysis Cont…

[Figure: scatter plots of y against x showing strong relationships, weak relationships, and (continued) no relationship.]

Correlation Analysis Cont…

Example 8.1:

  • The data of a random sample of 20 countries are shown in the following table. X represents the percentage of children immunized by age one year and Y represents the under-five mortality rate.
  • Determine the strength of association between the two variables.

Correlation Analysis Cont…

Country    % Immunized (X)    CMR per 1000 LB (Y)    XY    Y^2    X^2
  • Bolivia
  • Brazil
  • Cambodia
  • Canada
  • China
  • Czech
  • Egypt
  • Ethiopia
  • Finland
  • France
  • Greece
  • India
  • Italy
  • Japan
  • Mexico
  • Poland
  • Russia
  • Senegal
  • Turkey
  • UK
  • Total

Correlation Analysis Cont…

  • Interpretation option:
    • Rule of thumb (applied to the absolute value of r):

Size of coefficient    General interpretation
0.8-1.0                Very strong relationship
0.6-0.8                Strong relationship
0.4-0.6                Moderate relationship
0.2-0.4                Weak relationship
0.0-0.2                Very weak or no relationship
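The rule of thumb can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and because the listed ranges share their endpoints, assigning a boundary value such as 0.8 to the stronger category is an assumption made here, not something the table specifies.

```python
def interpret_r(r):
    """Return the rule-of-thumb label for a correlation coefficient r."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    # Assumption: boundary values fall in the stronger category.
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Very weak or no relationship"

print(interpret_r(-0.79))
print(interpret_r(0.15))
```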

Correlation Analysis Cont…

  • Hypothesis Testing for a Correlation Coefficient
  • As with means and proportions, it is also possible to test the significance of a population correlation coefficient (ρ).
  • For a two-tailed test:
    • H0: ρ is 0
    • H1: ρ is different from 0
  • The t test statistic is given as (with n-2 df):

$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$

Correlation Analysis Cont..

  • The critical t value for a one-tailed test at the 0.05 level of significance with 18 (n-2) degrees of freedom is -1.734 (for a two-tailed test the critical values would be ±2.101). Then we calculate the test statistic.

$$ t = r\sqrt{\frac{n-2}{1-r^2}} = -0.79\sqrt{\frac{20-2}{1-(-0.79)^2}} = -0.79\sqrt{\frac{18}{0.3759}} \approx -5.47 $$

  • Since -5.47 is less than -1.734, we reject H0 and accept H1: r indicates a significant negative relationship between immunization coverage and child mortality.
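The arithmetic of this test can be checked with a few lines of Python. The sketch below uses the values from the example (r = -0.79, n = 20) and the tabulated critical value -1.734; the function name is hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing a correlation coefficient, with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = -0.79          # sample correlation from the example
n = 20             # number of countries
t_critical = -1.734  # one-tailed critical value at 0.05 with 18 df (from a t table)

t = t_statistic(r, n)
reject_h0 = t < t_critical  # lower-tail test for a negative relationship
print(round(t, 2), reject_h0)
```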

Correlation Analysis Cont..

Limitations:

  • Applied only to a linear relationship.
  • One must not extrapolate an observed correlation beyond the observed ranges of the x and y values.
  • Does not differentiate dependent and independent variables.
  • Confounding by a third variable.

Correlation Analysis Cont..

  • The formula for the Spearman rank correlation coefficient is (given that there are no tied ranks):

$$ r_s = 1 - \frac{6\sum D^2}{n(n^2-1)} $$

  • Where:
    • 6 is a constant,
    • D is the difference between a subject's ranks on the two variables,
    • n is the number of subjects.
  • Consider the following example.

Correlation Analysis Cont..

The following table presents the MMR level and delivery service coverage in 10 developing countries.

Country   MMR (per 100,000 LB)   MMR rank   Delivery service coverage (%)   Coverage rank   D    D^2
1         315                    4          55                              6               -2   4
2         450                    6          40                              5               1    1
3         200                    1          70                              8               -7   49
4         250                    3          79                              10              -7   49
5         243                    2          75                              9               -7   49
6         830                    9          25                              3               6    36
7         850                    10         21                              2               8    64
8         656                    7          20                              1               6    36
9         701                    8          30                              4               4    16
10        410                    5          60                              7               -2   4
Total                                                                                            308

$$ r_s = 1 - \frac{6\sum D^2}{n(n^2-1)} = 1 - \frac{6 \times 308}{10(100-1)} = 1 - \frac{1848}{990} \approx 1 - 1.867 = -0.867 $$
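The Spearman calculation for the ten countries can be reproduced in Python as a check. This is a minimal sketch using the MMR and delivery coverage values from the table; the helper name `ranks` is hypothetical and the ranking shortcut is valid only because there are no tied values.

```python
# MMR per 100,000 live births and delivery service coverage (%) for countries 1-10.
mmr = [315, 450, 200, 250, 243, 830, 850, 656, 701, 410]
coverage = [55, 40, 70, 79, 75, 25, 21, 20, 30, 60]

def ranks(values):
    """Rank values from smallest (1) to largest; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

mmr_rank = ranks(mmr)
coverage_rank = ranks(coverage)

# D is the difference between each country's two ranks.
d = [a - b for a, b in zip(mmr_rank, coverage_rank)]
sum_d2 = sum(diff ** 2 for diff in d)

n = len(mmr)
r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum_d2, round(r_s, 3))
```

The negative coefficient indicates that countries with higher delivery service coverage tend to have lower MMR ranks.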