Correlation & Regression Chapter 5, Lecture notes of Statistics

Relationships, cont. There IS a relationship, but it's not Linear. r=0.0, but that DOESN'T mean that the two variables are.

Typology: Lecture notes

2022/2023

Uploaded on 02/28/2023

stefan18

Correlation & Regression
Chapter 5
Correlation: Do you have a relationship?
Between two Quantitative Variables (measured on the Same Person):
(1) Is there a relationship (p < 0.05)?
(2) What is the Direction (+ vs. -)?
(3) What is the Strength (r: from -1 to +1)?
Regression: If you have a Significant Correlation:
How well can you Predict a subject's Y-score if you know their X-score (and vice versa)?
Are predictions for members of the Population as good as predictions for Sample members?

Correlations measure LINEAR Relationships

No (Linear) Relationship: r = 0.0
Y-scores do not have a tendency to go up or down as X-scores go up.
You cannot predict a person's Y-value any better if you know his X-value than if you didn't know his X-score.
Positive Linear Relationship: Y-scores tend to go up as X-scores go up.
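To make the "LINEAR" point concrete, here is a minimal sketch in plain Python. The data and the helper name `pearson_r` are made up for illustration and are not from the notes. It shows a perfectly linear pattern giving r = 1 and a perfectly curved (U-shaped) pattern giving r = 0, even though Y is completely determined by X in both cases.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

xs = [-3, -2, -1, 0, 1, 2, 3]              # hypothetical X-scores
linear = [2 * x + 1 for x in xs]           # Y goes up steadily with X
curved = [x * x for x in xs]               # Y depends on X, but in a U shape

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(round(r_linear, 3), round(r_curved, 3))
```

In the curved case the positive and negative cross-products cancel, so r cannot detect the relationship.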

Interpreting r-values

Coefficient of Determination: r²

The square of the r-value.
r² × 100 = percent of shared variance; the rest of the variance is independent of the other variable.

Interpreting r-values

If the coefficient of determination between height and weight is r² = 0.30 (r ≈ 0.55):
• 30% of the variability in people's weight can be related to their height.
• 70% of the variability in people's weight is independent of their height.
• Remember: this does not mean that weight is partially caused by height. Arm and leg length have a high coefficient of determination, but a growing leg does not cause your arm to grow.
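The arithmetic behind the height and weight example can be checked in a few lines. This is only a sketch that assumes the 30% / 70% split from the example. Note that r² = 0.30 corresponds to an r of roughly 0.55 in magnitude; the sign of r cannot be recovered from r² alone.

```python
import math

r_squared = 0.30                        # coefficient of determination from the example
r_magnitude = math.sqrt(r_squared)      # size of r; the sign must come from the data
shared_pct = r_squared * 100            # percent of shared variance
independent_pct = 100 - shared_pct      # percent of variance independent of the other variable

print(round(r_magnitude, 2), round(shared_pct), round(independent_pct))
```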

Negative Correlation

Quasi-Independent Variable: Number of cigarettes/day
Dependent Variable: Physical Endurance
The fatter the field, the weaker the correlation.
(Figure: a series of scatterplots with negative r-values; only r = -0.50 is legible.)

Correlations

(Figure axes: Number of Malformed Cells in Lung Biopsy vs. Number of Cigarettes Smoked per Day x 10)

Methodology: Restriction of Range

Restriction of Range causes an artificially low (underestimated) value of r.
E.g., using just the high GRE scores, represented by the open circles.
It is common when the scores themselves are used to determine who is included in the correlational analysis.
E.g., only applicants with high GRE scores get into Grad School.
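Restriction of range can be illustrated with a small simulation. The sketch below uses invented data, not real GRE scores: it generates standardized "GRE" scores and a related outcome, then compares r for everyone with r for only the top 20% of GRE scorers. The variable names, the 0.6 / 0.8 weights, and the 20% cutoff are all assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

rng = random.Random(0)                      # fixed seed so the run is repeatable
n = 5000
gre = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical standardized GRE scores
# Outcome = part shared with GRE + independent noise (population r is about 0.6)
grad = [0.6 * g + 0.8 * rng.gauss(0, 1) for g in gre]

r_full = pearson_r(gre, grad)

# Keep only the top 20% of GRE scores, as if only they were admitted
cutoff = sorted(gre)[int(0.8 * n)]
admitted = [(g, p) for g, p in zip(gre, grad) if g >= cutoff]
r_restricted = pearson_r([g for g, _ in admitted], [p for _, p in admitted])

print(round(r_full, 2), round(r_restricted, 2))
```

The restricted sample has much less spread in X, so the same underlying relationship produces a noticeably smaller r.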

Computing r

Raw Scores Deviation Scores
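For reference, the two headings above usually correspond to the two standard textbook forms of Pearson's r. The formulas below are the conventional ones and are assumed here rather than copied from the notes; the deviation-score symbols follow the table columns used in these notes, with d_ix = X_i minus X-bar and d_iy = Y_i minus Y-bar.

```latex
% Raw-score form
r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}
         {\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]
                \left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}

% Deviation-score form, with d_{ix} = X_i - \bar{X} and d_{iy} = Y_i - \bar{Y}
r = \frac{\sum d_{ix}\, d_{iy}}
         {\sqrt{\sum d_{ix}^2 \, \sum d_{iy}^2}}
```

The two forms are algebraically equivalent; the deviation-score form is the one that matches the SUM columns in the worked tables.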

Can You Predict Y_i If You Know X_i?

X     d_ix   d_ix*d_ix   Y     d_iy   d_iy*d_iy   d_ix*d_iy
10    10     100         10    10     100         100
10    10     100         10    10     100         100
-10   -10    100         -10   -10    100         100
-10   -10    100         -10   -10    100         100

X-bar = 0, Y-bar = 0
SUM(d_ix*d_ix) = 400, SUM(d_iy*d_iy) = 400, SUM(d_ix*d_iy) = 400

Can You Predict Y_i If You Know X_i?

X     d_ix   d_ix*d_ix   Y     d_iy   d_iy*d_iy   d_ix*d_iy
10    10     100         10    10     100         100
10    10     100         -10   -10    100         -100
-10   -10    100         10    10     100         -100
-10   -10    100         -10   -10    100         100

X-bar = 0, Y-bar = 0
SUM(d_ix*d_ix) = 400, SUM(d_iy*d_iy) = 400, SUM(d_ix*d_iy) = 0
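The two worked tables can be checked with a short sketch of the deviation-score computation. The deviation values below are a reconstruction consistent with the column sums in the notes (400, 400, 400 in the first table and 400, 400, 0 in the second), so treat them as an assumption. The function name `r_from_deviations` is hypothetical.

```python
import math

def r_from_deviations(dx, dy):
    """r = SUM(dx*dy) / sqrt(SUM(dx*dx) * SUM(dy*dy))."""
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

dx = [10, 10, -10, -10]              # deviation scores on X (same in both tables)
dy_first = [10, 10, -10, -10]        # Y deviations always match X deviations
dy_second = [10, -10, 10, -10]       # Y deviations match X only half the time

r_first = r_from_deviations(dx, dy_first)     # cross-products sum to 400
r_second = r_from_deviations(dx, dy_second)   # cross-products sum to 0
print(r_first, r_second)
```

In the first case knowing X_i tells you Y_i exactly; in the second, the cross-products cancel and knowing X_i does not help at all.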

Regression

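Creates a line of “Best Fit” running through the data.
Uses the Method of Least Squares: the line with the smallest sum of squared distances between the points and the line.
Y-hat = a + bX (predicting Y from X) and X-hat = a' + b'Y (predicting X from Y), where a = intercept and b = slope. The two lines generally have different intercepts and slopes.
The Regression Line (line of best fit) gives you a and b.
Plug in X to predict Y, or Y to predict X.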

Regression, cont.

Method of Least Squares:
• Minimizes the squared deviations from the regression line
• Therefore, minimizes Errors of Prediction
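A minimal sketch of the least-squares computation follows, using made-up scores. The slope and intercept formulas are the standard ones (b = SUM(dx*dy) / SUM(dx*dx), a = Y-bar minus b times X-bar); the helper names `fit_line` and `sum_squared_errors` are hypothetical. The last lines show that nudging the slope away from the fitted value increases the sum of squared prediction errors.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    b = sum_dxdy / sum_dx2            # slope
    a = mean_y - b * mean_x           # intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]                  # hypothetical X-scores
ys = [2, 4, 5, 4, 5]                  # hypothetical Y-scores

a, b = fit_line(xs, ys)
y_hat_at_6 = a + b * 6                # plug in X to predict Y

sse_best = sum_squared_errors(xs, ys, a, b)
sse_other = sum_squared_errors(xs, ys, a, b + 0.1)   # a slightly different line
print(round(a, 2), round(b, 2), round(y_hat_at_6, 2))
print(round(sse_best, 2), round(sse_other, 2))
```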

Multiple Correlation & Regression

Using several measures to predict a measure or future measure:
Y-hat = a + b₁X₁ + b₂X₂ + b₃X₃ + b₄X₄
• Y-hat is the predicted value of the Dependent Variable
• X₁, X₂, X₃, & X₄ are the Predictor (Independent) Variables
College GPA-hat = a + b₁(H.S. GPA) + b₂(SAT) + b₃(ACT) + b₄(HoursWork)
R = Multiple Correlation (Range: 0 to +1)
R² = Coefficient of Determination (R × R × 100; 0 to 100%)
Uses Partial Correlations for all but the first Predictor Variable
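To see why redundant predictors add less than their separate correlations suggest, here is a sketch for the two-predictor case only. It uses the standard textbook formula for R² from the three pairwise correlations. The correlation values and the function name `r_squared_two_predictors` are invented for illustration.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of Y with predictors X1 and X2.

    r_y1, r_y2: correlation of each predictor with Y
    r_12:       correlation between the two predictors
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical case: each predictor correlates 0.5 with the dependent variable
independent = r_squared_two_predictors(0.5, 0.5, 0.0)   # predictors unrelated to each other
overlapping = r_squared_two_predictors(0.5, 0.5, 0.5)   # predictors share variance

print(round(independent, 3), round(overlapping, 3))
```

When the predictors are uncorrelated, their r² values simply add; when they overlap, the shared part is counted only once, so R² is smaller.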

Partial Correlations

The relationship (shared variance) between two variables when the variance which they BOTH share with a third variable is removed.
Used in multiple regression to subtract redundant variance when assessing the combined relationship between the Predictor Variables and the Dependent Variable.
E.g., H.S. GPA and SAT scores.
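As a sketch, the first-order partial correlation can be computed from the three pairwise correlations with the standard formula below. The numbers are invented, and the variable roles (X, Y, and a third variable Z) are generic placeholders rather than values from the notes.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the variance both share with Z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations, all equal to 0.5
r_xy_given_z = partial_r(r_xy=0.5, r_xz=0.5, r_yz=0.5)
print(round(r_xy_given_z, 3))
```

Here the X and Y correlation drops from 0.5 to about 0.33 once the variance both share with Z is taken out.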