










An overview of regression and correlation methods in biostatistics, focusing on linear regression and scatterplots. Linear regression is used to examine the relationship between one dependent variable and one or more independent variables, and it allows the dependent variable to be predicted from the independent variables. Scatterplots display the relationship between two continuous variables and help visualize it. Topics include the regression line, the correlation coefficient, scatterplots, perfect and imperfect relationships, and fitting regression lines.
Typology: Study notes











Lecture 20
BIL 311
Lecturer: Dr Patricia Buendia
Chapter 11 – Regression and Correlation Methods
Regression and Correlation Methods
Parameters of interest:
Linear Regression: The Regression Line
Correlation: The Correlation Coefficient
The X values can be fixed values chosen by the experimenter, or they can be random, in which case the experimenter has no control over what the X values are.
Example: The scatterplot below shows the data points from a study in which the estriol levels of pregnant women who are near term are compared to the infant birthweight.
If $x$ = estriol level and $y$ = birthweight, then we can postulate a relationship between $y$ and $x$ that is described by:
$E(y \mid x) = \alpha + \beta x$
Regression line: The line $y = \alpha + \beta x$, where $\alpha$ is the intercept and $\beta$ is the slope of the line.
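As a minimal sketch of what the two parameters mean, the line can be evaluated directly: the intercept $\alpha$ is the value of $y$ at $x = 0$, and each one-unit increase in $x$ changes $y$ by $\beta$. The parameter values below are hypothetical and are not estimates from the estriol study.

```python
def regression_line(alpha: float, beta: float, x: float) -> float:
    """Point on the line y = alpha + beta * x at the given x."""
    return alpha + beta * x


# Hypothetical values chosen for illustration only; they are NOT estimates
# from the estriol/birthweight study.
alpha, beta = 2.0, 0.5

# At x = 0 the line passes through the intercept alpha.
print(regression_line(alpha, beta, 0.0))
# Moving one unit in x changes y by exactly beta (the slope).
print(regression_line(alpha, beta, 11.0) - regression_line(alpha, beta, 10.0))
```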
A linear relationship is one in which the relationship between X and Y can best be represented by a straight line.
A curvilinear relationship is one in which the relationship between X and Y can best be represented by a curved line (nonlinear regression).
A positive relationship exists when Y increases as X increases (i.e., when the slope $\beta$ is positive). A negative relationship exists when Y decreases as X increases (i.e., when the slope $\beta$ is negative).
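For a sample, the direction can be read off the sign of the estimated slope. The sketch below assumes the standard corrected sums $L_{xx} = \sum (x_i - \bar{x})^2$ and $L_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, with the least-squares slope $b = L_{xy}/L_{xx}$. Because $L_{xx} > 0$ whenever the $x$ values are not all equal, the sign of $b$ is simply the sign of $L_{xy}$. The data are hypothetical toy values.

```python
def l_xy(xs, ys):
    """Corrected sum of cross-products L_xy = sum (x_i - xbar)(y_i - ybar)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))


def relationship_direction(xs, ys):
    """Classify the sample relationship by the sign of the least-squares slope.

    Since b = L_xy / L_xx and L_xx > 0 (x values not all equal),
    sign(b) == sign(L_xy)."""
    s = l_xy(xs, ys)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none (slope is zero)"


# Hypothetical toy data.
print(relationship_direction([1, 2, 3, 4], [2, 4, 5, 8]))  # Y rises with X
print(relationship_direction([1, 2, 3, 4], [9, 7, 4, 1]))  # Y falls as X rises
```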
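$y$ is the dependent variable and $x$ the independent variable.
The least-squares line $y = a + bx$ is the line that minimizes the sum of the squared vertical distances $d_i$ of the data points from the line:
$S = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$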
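A minimal sketch of fitting the line by minimizing $S$. It uses the standard closed-form least-squares solution $b = L_{xy}/L_{xx}$ and $a = \bar{y} - b\bar{x}$ (with $L_{xx}$ and $L_{xy}$ the corrected sums of squares and cross-products), then checks numerically that moving either coefficient away from the fit increases $S$. The data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) minimizing S = sum (y_i - a - b*x_i)^2.

    Closed-form solution: b = L_xy / L_xx, a = ybar - b * xbar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    lxx = sum((x - xbar) ** 2 for x in xs)
    if lxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    lxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = lxy / lxx
    a = ybar - b * xbar
    return a, b


def sum_sq_dev(xs, ys, a, b):
    """S = sum of squared vertical distances d_i = y_i - a - b*x_i."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical toy data
ys = [2.0, 4.0, 5.0, 8.0]
a, b = least_squares_fit(xs, ys)
print(a, b)

# Perturbing either coefficient away from the least-squares fit raises S.
best = sum_sq_dev(xs, ys, a, b)
print(best, sum_sq_dev(xs, ys, a + 0.1, b), sum_sq_dev(xs, ys, a, b - 0.1))
```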
Goodness of fit of a regression line
More Definitions
The total sum of squares, or Total SS, is $\sum_{i=1}^{n} (y_i - \bar{y})^2$.
The regression sum of squares, or Reg SS, is $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = b L_{xy} = L_{xy}^2 / L_{xx}$, where $\hat{y}_i = a + b x_i$ is the fitted value.
The residual sum of squares, or Res SS, is $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = L_{yy} - L_{xy}^2 / L_{xx}$.
It can be shown that Total SS = Reg SS + Res SS, i.e.,
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
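A minimal sketch that computes the three sums of squares directly from their definitions for a hypothetical toy data set, alongside the shortcut forms $L_{xy}^2/L_{xx}$ and $L_{yy} - L_{xy}^2/L_{xx}$, so the decomposition Total SS = Reg SS + Res SS can be checked numerically (up to floating-point rounding).

```python
def sums_of_squares(xs, ys):
    """Total, regression and residual SS for the least-squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    lxx = sum((x - xbar) ** 2 for x in xs)
    lxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    lyy = sum((y - ybar) ** 2 for y in ys)
    b = lxy / lxx
    a = ybar - b * xbar
    fitted = [a + b * x for x in xs]  # y-hat_i = a + b * x_i
    return {
        "total": sum((y - ybar) ** 2 for y in ys),             # sum (y_i - ybar)^2
        "reg": sum((f - ybar) ** 2 for f in fitted),           # sum (yhat_i - ybar)^2
        "res": sum((y - f) ** 2 for y, f in zip(ys, fitted)),  # sum (y_i - yhat_i)^2
        "reg_shortcut": lxy ** 2 / lxx,                        # L_xy^2 / L_xx
        "res_shortcut": lyy - lxy ** 2 / lxx,                  # L_yy - L_xy^2 / L_xx
    }


ss = sums_of_squares([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])  # hypothetical data
print(ss)
# Total SS = Reg SS + Res SS, up to floating-point rounding.
print(abs(ss["total"] - (ss["reg"] + ss["res"])) < 1e-9)
```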
Reg MS = Reg SS/k, where k is the number of predictor variables in the model (k = 1 for simple linear regression).
Res MS = Res SS/(n - k - 1)
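A minimal sketch of the two mean squares as functions of the sums of squares, the sample size $n$ and the number of predictors $k$. The input values are hypothetical sums of squares for $n = 4$ observations and one predictor.

```python
def mean_squares(reg_ss, res_ss, n, k=1):
    """Reg MS = Reg SS / k and Res MS = Res SS / (n - k - 1).

    k is the number of predictor variables; k = 1 for simple linear regression."""
    if k < 1 or n - k - 1 < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    return reg_ss / k, res_ss / (n - k - 1)


# Hypothetical sums of squares for n = 4 observations and one predictor.
reg_ms, res_ms = mean_squares(reg_ss=18.05, res_ss=0.70, n=4)
print(reg_ms, res_ms)
```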
F Test for Simple Linear Regression
To test for a good fit of a linear regression line, we test the hypothesis $H_0: \beta = 0$ versus $H_1: \beta \neq 0$, where $\beta$ is the underlying slope of the regression line. The test statistic is
$F = \dfrac{\text{Reg SS}}{\text{Res SS}/(n-2)}$, with $b = L_{xy}/L_{xx}$,
which follows an $F_{1,n-2}$ distribution under $H_0$.
If $F > F_{1,n-2,1-\alpha}$, then reject $H_0$.
If $F \le F_{1,n-2,1-\alpha}$, then accept $H_0$.
The exact p-value is given by $\Pr(F_{1,n-2} > F)$.
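A minimal, dependency-free sketch of the test. To obtain the exact p-value without an F table it relies on two standard facts: with 1 numerator degree of freedom, $F_{1,n-2}$ is distributed as the square of a $t_{n-2}$ variable, so $\Pr(F_{1,n-2} > F) = \Pr(|t_{n-2}| > \sqrt{F})$; and for integer degrees of freedom the two-sided $t$ tail has a classical finite-series closed form in $\theta = \arctan(t/\sqrt{\nu})$. The helper `t_two_sided_p` and the sums of squares in the usage line are hypothetical, not from the lecture.

```python
import math


def t_two_sided_p(t, df):
    """Pr(|T| > t) for T ~ Student t with integer df >= 1 (finite-series formula)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # A = sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (2 * j - 1) / (2 * j) * c * c
            total += term
        a = s * total
    else:
        # A = (2/pi) * [theta + sin(theta) * (cos + (2/3)cos^3 + ...)]
        term = series = c if df > 1 else 0.0
        for j in range(1, (df - 1) // 2):
            term *= (2 * j) / (2 * j + 1) * c * c
            series += term
        a = 2 / math.pi * (theta + s * series)
    return min(1.0, max(0.0, 1.0 - a))


def f_test_simple_regression(reg_ss, res_ss, n, alpha=0.05):
    """F test of H0: beta = 0 vs H1: beta != 0; returns (F, p_value, reject_h0)."""
    df_res = n - 2
    if df_res < 1:
        raise ValueError("need at least 3 observations")
    if res_ss <= 0:
        raise ValueError("Res SS must be positive")
    f_stat = reg_ss / (res_ss / df_res)
    # F_{1,n-2} = t_{n-2}^2, so Pr(F_{1,n-2} > F) = Pr(|t_{n-2}| > sqrt(F)).
    p_value = t_two_sided_p(math.sqrt(f_stat), df_res)
    # p < alpha is equivalent to F > F_{1,n-2,1-alpha}.
    return f_stat, p_value, p_value < alpha


# Hypothetical sums of squares from a toy data set with n = 4 points.
f_stat, p_value, reject = f_test_simple_regression(reg_ss=18.05, res_ss=0.70, n=4)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Rejecting when $p < \alpha$ gives the same decision as comparing $F$ with the critical value $F_{1,n-2,1-\alpha}$. For very small p-values the subtraction `1.0 - a` loses relative precision, which is acceptable for an illustration.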
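The results are displayed in an ANOVA table:
| Source | SS | df | MS | F | P-value |
|---|---|---|---|---|---|
| Regression | a' = Reg SS | 1 | a'/1 | [a'/1]/[b'/(n-2)] | $\Pr(F_{1,n-2} > F)$ |
| Residual | b' = Res SS | n-2 | b'/(n-2) | | |
| Total | a' + b' | | | | |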
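A minimal sketch that builds the rows of this ANOVA table from raw $(x, y)$ data and prints them. It computes Reg SS as $L_{xy}^2/L_{xx}$ and Res SS as $L_{yy} - L_{xy}^2/L_{xx}$, and obtains the p-value through the identity $F_{1,n-2} = t_{n-2}^2$ combined with the finite-series formula for the $t$ tail at integer degrees of freedom. The function names and the toy data are hypothetical.

```python
import math


def t_two_sided_p(t, df):
    """Pr(|T| > t) for T ~ Student t with integer df >= 1 (finite-series formula)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (2 * j - 1) / (2 * j) * c * c
            total += term
        a = s * total
    else:
        term = series = c if df > 1 else 0.0
        for j in range(1, (df - 1) // 2):
            term *= (2 * j) / (2 * j + 1) * c * c
            series += term
        a = 2 / math.pi * (theta + s * series)
    return min(1.0, max(0.0, 1.0 - a))


def regression_anova(xs, ys):
    """Rows (source, SS, df, MS, F, p) of the simple linear regression ANOVA table."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    lxx = sum((x - xbar) ** 2 for x in xs)
    lxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    lyy = sum((y - ybar) ** 2 for y in ys)
    reg_ss = lxy ** 2 / lxx  # a' = Reg SS
    res_ss = lyy - reg_ss    # b' = Res SS
    reg_ms, res_ms = reg_ss / 1, res_ss / (n - 2)
    if res_ms <= 0:
        raise ValueError("Res SS is zero: the points lie exactly on a line")
    f_stat = reg_ms / res_ms
    p = t_two_sided_p(math.sqrt(f_stat), n - 2)  # Pr(F_{1,n-2} > F)
    return [
        ("Regression", reg_ss, 1, reg_ms, f_stat, p),
        ("Residual", res_ss, n - 2, res_ms, None, None),
        ("Total", reg_ss + res_ss, None, None, None, None),
    ]


def fmt(v):
    """Blank for empty cells, integers as-is, floats to 4 significant digits."""
    return "" if v is None else (str(v) if isinstance(v, int) else f"{v:.4g}")


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical toy data
ys = [2.0, 4.0, 5.0, 8.0]
print(f"{'Source':<11}{'SS':>10}{'df':>5}{'MS':>10}{'F':>10}{'P-value':>10}")
for source, *values in regression_anova(xs, ys):
    ss, df, ms, f, p = (fmt(v) for v in values)
    print(f"{source:<11}{ss:>10}{df:>5}{ms:>10}{f:>10}{p:>10}")
```

If SciPy is available, `scipy.stats.f.sf(F, 1, n - 2)` gives the same upper-tail probability; the hand-written helper only keeps the sketch free of third-party dependencies.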