Regression and Correlation Methods in Biostatistics: Linear Regression and Scatterplots, Study notes of Biostatistics

An overview of regression and correlation methods in biostatistics, focusing on linear regression and scatterplots. Linear regression examines the relationship between one dependent variable and one or more independent variables, and the fitted relationship can be used for prediction. A scatterplot displays paired values of two continuous variables so that the relationship between them can be seen. Topics include the regression line, the correlation coefficient, scatterplots, perfect and imperfect relationships, and fitting regression lines.


Uploaded on 08/31/2009


Biostatistics

Lecture 20

BIL 311

Lecturer: Dr. Patricia Buendia


Lecture 20 Outline

Chapter 11 – Regression and Correlation Methods

Parameters of interest:

  • Linear Regression: The Regression Line

  • Correlation: The Correlation Coefficient

Linear Regression

  • Linear regression is the fitting of a function to a set of observations.

  • Usually there are two continuous variables (more are possible) divided into two types, experimental variables and a response variable.

  • The Y variable is the observed response to the X values, so they are naturally paired data points.

  • The X variable can be of two sorts:

    – It can be only values chosen by the experimenter.

    – It can be observed, just like the Y values, so that the experimenter has no control over what the X values are.

Scatterplot

  • Example: The scatterplot shows the data points from a study in which the estriol levels of pregnant women who are near term are compared to the infant birthweight.

  • If x = estriol level and y = birthweight, then we can postulate a relationship between y and x that is described by:

    $E(y \mid x) = \alpha + \beta x$

Linear Regression

Definitions:

  – Regression Line: The line $y = \alpha + \beta x$, where $\alpha$ is the intercept and $\beta$ is the slope of the line.

  – A linear relationship is one in which the relationship between X and Y can best be represented by a straight line.

  – A curvilinear relationship is one in which the relationship between X and Y can best be represented by a curved line (nonlinear regression).

Linear Regression

More Definitions:

  – A positive relationship exists when Y increases as X increases (i.e., when the slope $\beta$ is positive).

  – A negative relationship exists when Y decreases as X increases (i.e., when the slope $\beta$ is negative).

  – In the estriol (x) and birthweight (y) example we have a positive relationship; y is the dependent variable and x the independent variable.

Fitting Regression Lines

  • The best fitting regression line is obtained by applying the least squares criterion.

  • The least squares line, or estimated regression line, is the line $y = a + bx$ that minimizes the sum of squared distances (S) of the sample points from the line. S is given by:

    $S = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$
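The least squares criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: it assumes the standard closed-form solution $b = L_{xy}/L_{xx}$ and $a = \bar{y} - b\bar{x}$, with $L_{xx} = \sum (x_i - \bar{x})^2$ and $L_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$. The function names and the data are hypothetical and are not the estriol study data.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (a, b) for the line y = a + b*x that minimizes
    S = sum((y_i - a - b*x_i)**2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if l_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = l_xy / l_xx          # estimated slope
    a = y_bar - b * x_bar    # estimated intercept
    return a, b


def sum_squared_distances(x, y, a, b):
    """S for an arbitrary candidate line y = a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustrative data (NOT the estriol/birthweight data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(x, y)
s_min = sum_squared_distances(x, y, a, b)
print(f"a = {a:.3f}, b = {b:.3f}, S = {s_min:.4f}")
```

Evaluating `sum_squared_distances` at any other intercept or slope gives a larger S, which is what "best fitting" means under this criterion.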

Linear Regression

  • Example: What is the estimated average birthweight if a pregnant woman has an estriol level of 15 mg/24 hr?

  • The estimated value of y is denoted by $\hat{y} = a + bx$.

  • In our example we have $\hat{y} = 21.52 + 0.608(15) \approx 30.6$.

  • One possible use of the regression line for pregnant women is to identify women who are carrying a low birthweight fetus. If such women can be identified prior to delivery, then drugs might be used to prolong the pregnancy until the fetus grows larger.
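The prediction step is a single evaluation of the fitted line. The sketch below uses the intercept and slope quoted in the example (21.52 and 0.608); the function name is hypothetical, and the birthweight units are left unspecified because the notes do not state them.

```python
# Coefficients quoted in the lecture's worked example.
A_INTERCEPT = 21.52
B_SLOPE = 0.608


def predict_birthweight(estriol_mg_per_24hr):
    """Estimated average birthweight y_hat = a + b*x (hypothetical helper)."""
    return A_INTERCEPT + B_SLOPE * estriol_mg_per_24hr


y_hat = predict_birthweight(15)
print(f"Estimated average birthweight at estriol = 15: {y_hat:.2f}")
```

With these rounded coefficients the arithmetic gives 30.64, in the units of the original birthweight scale.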

Goodness of fit of a regression line

More Definitions

  • The total sum of squares, or Total SS, is $\sum_{i=1}^{n} (y_i - \bar{y})^2$.

  • The regression sum of squares, or Reg SS, is $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = b L_{xy} = L_{xy}^2 / L_{xx}$.

  • The residual sum of squares, or Res SS, is $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = L_{yy} - L_{xy}^2 / L_{xx}$.

  • It can be shown that Total SS = Reg SS + Res SS:

    $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = L_{yy}$

  • Reg MS = Reg SS/k, with k the number of predictor variables in the model and k = 1 for simple linear regression.

  • Res MS = Res SS/(n-k-1)
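The decomposition Total SS = Reg SS + Res SS can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $L_{xx}$, $L_{xy}$ and $L_{yy}$ to be the usual corrected sums of squares and cross-products, fits the line by least squares, and uses hypothetical data and a hypothetical function name.

```python
from statistics import mean


def sums_of_squares(x, y, k=1):
    """Return the sums of squares and mean squares for a simple
    linear regression of y on x (k = 1 predictor)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    l_xx = sum((xi - x_bar) ** 2 for xi in x)
    l_yy = sum((yi - y_bar) ** 2 for yi in y)
    l_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = l_xy / l_xx                     # least squares slope
    a = y_bar - b * x_bar               # least squares intercept
    y_hat = [a + b * xi for xi in x]    # fitted values

    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    reg_ss = sum((yh - y_bar) ** 2 for yh in y_hat)
    res_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    return {
        "L_xx": l_xx, "L_xy": l_xy, "L_yy": l_yy,
        "Total SS": total_ss, "Reg SS": reg_ss, "Res SS": res_ss,
        "Reg MS": reg_ss / k,
        "Res MS": res_ss / (n - k - 1),
    }


# Hypothetical illustrative data (NOT the estriol/birthweight data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ss = sums_of_squares(x, y)
for name, value in ss.items():
    print(f"{name:>8} = {value:.4f}")
```

Reg SS computed from the fitted values agrees with the shortcut $L_{xy}^2 / L_{xx}$, and Total SS equals $L_{yy}$, up to floating-point rounding.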

F Test for Simple Linear Regression

  • To test for a good fit of a linear regression line we have the hypothesis $H_0: \beta = 0$ versus $H_1: \beta \neq 0$, where $\beta$ is the underlying slope of the regression line.

  • $F = \text{Reg SS} / [\text{Res SS}/(n-2)]$, with $b = L_{xy}/L_{xx}$, which follows an $F_{1,n-2}$ distribution under $H_0$.

  • If $F > F_{1,n-2,1-\alpha}$ then reject $H_0$.

  • If $F \le F_{1,n-2,1-\alpha}$ then accept $H_0$.

  • The exact p-value is given by $\Pr(F_{1,n-2} > F)$.

  • Results displayed in ANOVA table:

                 SS             df     MS           F                     P-value
    Regression   a' = Reg SS    1      a'/1         [a'/1]/[b'/(n-2)]     Pr(F_{1,n-2} > F)
    Residual     b' = Res SS    n-2    b'/(n-2)
    Total        a' + b'
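The F test can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than a reference implementation: it relies on the fact that an $F_{1,\nu}$ variable is the square of a $t_\nu$ variable, so $\Pr(F_{1,n-2} > F) = 2\Pr(t_{n-2} > \sqrt{F})$, and it integrates the t density numerically with Simpson's rule. The function names and the input sums of squares are hypothetical; a statistics library would normally be used for the p-value.

```python
import math


def f_statistic(reg_ss, res_ss, n):
    """F = Reg SS / [Res SS / (n - 2)] for simple linear regression."""
    return reg_ss / (res_ss / (n - 2))


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def f_test_p_value(f_stat, n, steps=2000):
    """Pr(F_{1,n-2} > f_stat), using F_{1,v} = t_v**2 and Simpson's rule.

    steps must be even (Simpson's rule requirement).
    """
    df = n - 2
    if df < 1 or f_stat < 0 or steps % 2:
        raise ValueError("need n >= 3, f_stat >= 0 and an even step count")
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # Pr(0 < t < sqrt(F))
    return max(0.0, 1 - 2 * area)   # two-sided tail of t = upper tail of F


# Hypothetical inputs (NOT the estriol/birthweight data).
reg_ss, res_ss, n = 39.601, 0.107, 5
alpha = 0.05

f_stat = f_statistic(reg_ss, res_ss, n)
p_value = f_test_p_value(f_stat, n)
decision = "reject H0" if p_value < alpha else "accept H0"
print(f"F = {f_stat:.2f}, p-value = {p_value:.6f}: {decision}")
```

Comparing the p-value with $\alpha$ is equivalent to comparing F with the critical value $F_{1,n-2,1-\alpha}$, so the decision matches the rule stated above.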