Pearson's Correlation Coefficient and Simple Linear Regression
Prof. Salvador Gezan, University of Florida (UF), Data Analysis & Statistical Methods (study notes, uploaded 09/17/2009)

Chapter11

Si l Li Ri

Si

mp

l

e

Li

near

R

egress

i

on


Correlation

- Correlation is a measure of the strength and direction of a linear relationship between two variables, here labeled $x$ and $y$.
- In general statistical usage, correlation refers to the amount of departure of the two variables from independence.
- The parameter $\rho_{yx}$ is the population correlation coefficient and is sometimes called Pearson's product-moment correlation coefficient.
- Sample correlation coefficient:

$$r_{yx} = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)/n}{\sqrt{\left[\sum x_i^2 - \left(\sum x_i\right)^2/n\right]\left[\sum y_i^2 - \left(\sum y_i\right)^2/n\right]}}$$

- Note that high correlation does not necessarily imply causation.
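As a minimal Python sketch (the function names `sums_of_squares` and `pearson_r` and the data are illustrative, not from the notes), the sample correlation coefficient can be computed with the same computational formulas: form $SS_{xx}$, $SS_{yy}$ and $SS_{xy}$ from raw sums, then divide.

```python
import math


def sums_of_squares(x, y):
    """Return (SS_xx, SS_yy, SS_xy) from the computational (raw-sum) formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be the same length, with at least 2 points")
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


def pearson_r(x, y):
    """Sample correlation r = SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(x, y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Small made-up data set, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

The raw-sum formulas are convenient by hand but can lose precision when the values are large relative to their spread; in Python 3.10+ the standard library's `statistics.correlation` is a ready-made alternative.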

Testing Correlation

Simple Linear Regression

- Simplest linear regression model:

$$y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

- $y$ is the response (or dependent) variable.
- $x$ is the explanatory (or independent) variable.
- $\beta_0$ and $\beta_1$ are the parameters to be estimated.
- $\varepsilon$ is the random deviation ('error' or 'residual') caused by:
  - uncontrolled factors,
  - measurement errors,
  - missing variables in the model,
  - rounding of numbers, etc.
- The sign of $\beta_1$ gives the direction of the association:
  - $\beta_1 > 0 \Rightarrow$ positive association
  - $\beta_1 < 0 \Rightarrow$ negative association
  - $\beta_1 = 0 \Rightarrow$ no association

- Regression allows us to estimate the most probable value of $y$ for a given value of $x$.
- Expressed statistically as $E(y \mid x)$ (read as "the expectation of $y$ given $x$").
- For the simple linear model we have:

$$E(y \mid x) = \beta_0 + \beta_1 x$$

- $y$ is assumed to follow a Normal distribution and, if we have no information on $x$, our best estimate of $y$ corresponds to the mean of $y$: $\hat{y} = \bar{y}$, the sample estimate of $E(y)$.
- However, if we have additional information from a correlated variable $x$, then we can improve our estimate by using $E(y \mid x) = \beta_0 + \beta_1 x$, for which the distribution of $y$ at each given $x$ is narrower than the overall distribution of $y$ (the stronger the correlation, the narrower it is).
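To see why conditioning on a correlated $x$ helps, a small Python sketch can compare the squared prediction errors of the two strategies above: predicting every $y_i$ by $\bar{y}$ (giving $SS_{yy}$) versus predicting it from the least-squares line (giving the error sum of squares, $SSE$). The function name and data are hypothetical; the relationship it illustrates, $SSE = SS_{yy}(1 - r^2)$, holds for any least-squares line.

```python
def compare_predictions(x, y):
    """Return (SS_yy, SSE): squared errors when predicting y by its mean
    versus by the least-squares line b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Ignoring x: every prediction is the sample mean of y.
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    # Using x: predictions come from the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return ss_yy, sse


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_yy, sse = compare_predictions(x, y)
print(ss_yy, round(sse, 10))  # 6.0 2.4
```

Here $SSE/SS_{yy} = 0.4 = 1 - r^2$ with $r^2 = 0.6$: knowing $x$ removes 60% of the squared error made by the mean alone.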

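Fitting the Model: Least Squares (LS)

Summary Calculations

$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

$$SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

Parameter Estimates

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$s^2 = MSE = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}, \qquad SSE = SS_{yy} - \frac{SS_{xy}^2}{SS_{xx}}$$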
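The least-squares formulas above translate directly into code. The following Python sketch (function name hypothetical, data made up) returns $\hat{\beta}_0$, $\hat{\beta}_1$, $SSE$ and $s^2 = MSE$, computing $SSE$ both from the residuals and from the shortcut $SS_{yy} - SS_{xy}^2/SS_{xx}$ as a consistency check.

```python
import math


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns b0, b1, SSE and MSE."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx            # slope: SS_xy / SS_xx
    b0 = y_bar - b1 * x_bar       # intercept: y_bar - b1 * x_bar

    # SSE from the residuals y_i - y_hat_i ...
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # ... and from the shortcut formula; the two should agree.
    sse_shortcut = ss_yy - ss_xy ** 2 / ss_xx
    if not math.isclose(sse, sse_shortcut, rel_tol=1e-9, abs_tol=1e-9):
        raise ArithmeticError("SSE formulas disagree; check the input data")

    mse = sse / (n - 2)           # s^2, with n - 2 degrees of freedom
    return b0, b1, sse, mse


b0, b1, sse, mse = fit_simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b0={b0:.3f} b1={b1:.3f} SSE={sse:.3f} MSE={mse:.3f}")
# b0=2.200 b1=0.600 SSE=2.400 MSE=0.800
```

The divisor $n - 2$ reflects the two estimated parameters, $\beta_0$ and $\beta_1$.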

2

Example

Several morphological traits from 190 seeds obtained from a line of diploid wheat (Triticum monococcum) were measured automatically with a Single-Kernel Characterization System. The variables recorded were diameter, length, weight, moisture content and hardness of each seed.

[Figure: scatter plot of seed Weight against Length, asking which variable is the response (y) and which is the predictor (x).]

Model ($x$ = length, $y$ = weight):

$$\text{weight}_i = \beta_0 + \beta_1\,\text{length}_i + \varepsilon_i, \qquad \text{i.e.} \quad y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Summary statistics ($n = 190$ seeds, $p = 2$ parameters):

$$\sum x_i = 626.08, \quad \sum y_i = 5445, \quad \sum x_i^2 = 2082.30, \quad \sum y_i^2 = 163336, \quad \sum x_i y_i = 18273.03$$

$$SS_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 2082.30 - \frac{626.08^2}{190} = 19.268$$

$$SS_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = 163336 - \frac{5445^2}{190} = 7293.763$$

$$SS_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 18273.03 - \frac{626.08 \times 5445}{190} = 330.895$$

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{330.895}{19.268} = 17.173$$

$$\hat{\beta}_0 = \frac{1}{n}\left(\sum y_i - \hat{\beta}_1 \sum x_i\right) = \frac{1}{190}\left(5445 - 17.173 \times 626.08\right) = -27.931$$
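The hand calculation can be checked with a short Python sketch that starts from the same summary sums reported above (the function name is hypothetical). It carries full precision throughout, so the last digit can differ slightly from the rounded values in the notes.

```python
def estimates_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Least-squares quantities from raw summary sums (computational formulas)."""
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    b1 = ss_xy / ss_xx
    b0 = (sum_y - b1 * sum_x) / n   # same as y_bar - b1 * x_bar
    return ss_xx, ss_yy, ss_xy, b1, b0


# Seed example: x = length, y = weight, n = 190 (sums as reported in the notes).
ss_xx, ss_yy, ss_xy, b1, b0 = estimates_from_sums(
    n=190, sum_x=626.08, sum_y=5445, sum_x2=2082.30, sum_y2=163336, sum_xy=18273.03
)
print(f"SS_xx={ss_xx:.3f} SS_yy={ss_yy:.3f} SS_xy={ss_xy:.3f}")
print(f"b1={b1:.3f} b0={b0:.3f}")
```

Without intermediate rounding the slope comes out at about 17.174 and the intercept at about -27.932, matching the hand values of 17.173 and -27.931 up to rounding.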

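SAS code used to fit the model and check the residuals:

```sas
/* Scatter plot of the raw data */
proc gplot data=Seeds;
  plot weight*length;
run;

/* Fit weight = b0 + b1*length; save predicted values and studentized residuals */
proc reg data=Seeds;
  model weight = length;
  output out=resdata p=pred student=studres;
run;

/* Studentized residuals vs predicted values, with a reference line at 0 */
proc gplot data=resdata;
  plot studres*pred / vref=0;
run;

/* Normality checks on the studentized residuals */
proc univariate data=resdata noprint;
  var studres;
  probplot studres / normal(mu=est sigma=est);
  histogram studres / normal;
run;
```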
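The `student=studres` option in the `OUTPUT` statement asks PROC REG for studentized residuals, i.e. each residual divided by its own standard error. In simple linear regression that standard error is $s\sqrt{1 - h_{ii}}$, with leverage $h_{ii} = 1/n + (x_i - \bar{x})^2/SS_{xx}$. The Python sketch below computes the same quantity for illustration only; it is not the SAS implementation, and the function name and data are hypothetical.

```python
import math


def studentized_residuals(x, y):
    """Residuals divided by their estimated standard errors, s * sqrt(1 - h_ii)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))   # root MSE

    result = []
    for xi, e in zip(x, residuals):
        h_ii = 1 / n + (xi - x_bar) ** 2 / ss_xx   # leverage of point i
        result.append(e / (s * math.sqrt(1 - h_ii)))
    return result


studres = studentized_residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print([round(r, 3) for r in studres])  # [-1.414, 0.802, 1.25, -0.802, -0.354]
```

Points far from $\bar{x}$ have larger leverage, so the same raw residual is scaled up more for them; this is why the residual plot uses studentized rather than raw residuals.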

[Figure: fitted and observed relationship between weight and length with 95% confidence limits; fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.]

Model Assumptions

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (i = 1, \ldots, n)$$

Assumption 1: $E(\varepsilon_i) = 0$. The expected mean of the residuals, $\varepsilon_i$, is assumed to be zero.

Assumption 2: $\mathrm{Var}(\varepsilon_i) = \sigma^2$. The variance of any residual is equal to a constant value common to all residuals (homoscedasticity/homogeneity of variances).

Assumption 3: $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$. The residuals are independent.

Assumption 4: the $x_i$ are nonstochastic. The explanatory variable $x$ is measured without error.

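Assumption 5: $\mathrm{Cov}(x_i, \varepsilon_i) = 0$. Each value of the explanatory variable and its corresponding residual are independent of each other. (A response cannot be independent of its own residual, since $y_i$ contains $\varepsilon_i$.)

Assumption 6: $\varepsilon_i \sim N(0, \sigma^2)$. The residuals follow a Normal distribution with mean 0 and variance $\sigma^2$.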
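One way to internalize these assumptions is to simulate data that satisfy them by construction: fixed (nonstochastic) $x_i$, and independent errors $\varepsilon_i \sim N(0, \sigma^2)$ with a common variance. The Python sketch below does this with the standard `random` module; the parameter values, design points and function name are arbitrary choices for illustration.

```python
import random


def simulate_and_fit(beta0, beta1, sigma, x, seed=1):
    """Simulate y_i = beta0 + beta1*x_i + eps_i with iid N(0, sigma^2) errors
    (fixed x, constant variance), then refit by least squares."""
    rng = random.Random(seed)
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Fixed design: 1000 equally spaced x values between 0 and 10.
x = [10 * i / 999 for i in range(1000)]
b0, b1 = simulate_and_fit(beta0=1.0, beta1=2.0, sigma=1.0, x=x)
print(round(b0, 2), round(b1, 2))  # estimates should land close to 1.0 and 2.0
```

Changing the simulation to break one assumption (for example, letting `sigma` grow with `x`) is a simple way to see what the residual diagnostics are meant to detect.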

Making Inferences about the Slope

• The assumptions described earlier produce a normal sampling distribution for the slope estimate:

$$\hat{\beta}_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{SS_{xx}}\right), \qquad s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \quad \text{where } s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}$$

• A $(1 - \alpha)$ confidence interval on $\beta_1$ is:

$$\hat{\beta}_1 \pm t_{\alpha/2}\, s_{\hat{\beta}_1} = \hat{\beta}_1 \pm t_{\alpha/2} \frac{s}{\sqrt{SS_{xx}}}$$

and $t_{\alpha/2}$ is based on $(n - 2)$ degrees of freedom.
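A minimal Python sketch of the confidence interval above. To stay within the standard library, the critical value $t_{\alpha/2}$ with $n - 2$ degrees of freedom is passed in by the caller (from a t table, or, if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)`); the function name and data are hypothetical.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (b1, se_b1, lower, upper) for b1 +/- t_crit * s / sqrt(SS_xx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx
    sse = ss_yy - ss_xy ** 2 / ss_xx
    s = math.sqrt(sse / (n - 2))          # s = sqrt(MSE)
    se_b1 = s / math.sqrt(ss_xx)          # estimated standard error of b1
    half_width = t_crit * se_b1
    return b1, se_b1, b1 - half_width, b1 + half_width


# n = 5, so df = 3; t_{0.025, 3} is about 3.182 (95% interval).
b1, se_b1, lower, upper = slope_confidence_interval(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182
)
print(f"b1={b1:.3f} se={se_b1:.4f} 95% CI=({lower:.3f}, {upper:.3f})")
```

With only five points the interval, roughly (-0.300, 1.500), contains 0, so these made-up data are consistent with no linear association at the 5% level.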

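- 2-Sided Test
  H0: $\beta_1 = \beta_{1,0}$
  Ha: $\beta_1 \neq \beta_{1,0}$
  T.S.: $t_o = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{s/\sqrt{SS_{xx}}}$
  R.R.: $|t_o| \geq t_{\alpha/2,\,n-2}$
  p-value: $P(|t| \geq |t_o|)$

- 1-Sided Test
  H0: $\beta_1 = \beta_{1,0}$
  Ha: $\beta_1 > \beta_{1,0}$ (or $\beta_1 < \beta_{1,0}$)
  T.S.: $t_o = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{s/\sqrt{SS_{xx}}}$
  R.R.: $t_o \geq t_{\alpha,\,n-2}$ (or $t_o \leq -t_{\alpha,\,n-2}$)
  p-value: $P(t \geq t_o)$ (or $P(t \leq t_o)$)

In both cases $t$ follows a t distribution with $n - 2$ degrees of freedom.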
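As a sketch of the test statistic, the following Python function works from the same summary sums used in the seed example (function name hypothetical). It returns $t_o$ for $H_0: \beta_1 = \beta_{1,0}$ and applies the rejection region for a caller-supplied critical value; exact p-values need the CDF of the t distribution, which the standard library does not provide (SciPy's `scipy.stats.t.sf` is one option).

```python
import math


def slope_t_test(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy,
                 beta1_null, t_crit, alternative="two-sided"):
    """t statistic for H0: beta1 = beta1_null and the rejection decision.

    t_crit must be t_{alpha/2, n-2} for "two-sided" and t_{alpha, n-2}
    for "greater" or "less".
    """
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    b1 = ss_xy / ss_xx
    sse = ss_yy - ss_xy ** 2 / ss_xx
    s = math.sqrt(sse / (n - 2))                  # s = sqrt(MSE)
    t_obs = (b1 - beta1_null) / (s / math.sqrt(ss_xx))

    if alternative == "two-sided":
        reject = abs(t_obs) >= t_crit
    elif alternative == "greater":
        reject = t_obs >= t_crit
    elif alternative == "less":
        reject = t_obs <= -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t_obs, reject


# Seed example, H0: beta1 = 0; t_{0.025, 188} is roughly 1.97 (alpha = 0.05).
t_obs, reject = slope_t_test(190, 626.08, 5445, 2082.30, 163336, 18273.03,
                             beta1_null=0.0, t_crit=1.97)
print(f"t = {t_obs:.2f}, reject H0: {reject}")
```

For these seeds $t_o$ lies far beyond the critical value, so the slope of weight on length is clearly different from zero.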
