Pearson's Correlation: Analyzing Weight, Height, and Age
Bivariate Analysis
Test to use, by type of Variable 1 (columns) and Variable 2 (rows):
              2 LEVELS           >2 LEVELS          CONTINUOUS
2 LEVELS      χ² (chi-square)    χ² (chi-square)    t-test
>2 LEVELS     χ² (chi-square)    χ² (chi-square)    ANOVA (F-test)
CONTINUOUS    t-test             ANOVA (F-test)     Correlation; simple linear regression
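The table can also be read as a lookup rule. The following Python sketch encodes it; the function name `choose_bivariate_test` and the type labels are illustrative assumptions, not part of the slides.

```python
def choose_bivariate_test(type_1: str, type_2: str) -> str:
    """Return the test suggested by the table for two variable types.

    Each type must be "2 levels", ">2 levels", or "continuous".
    """
    allowed = {"2 levels", ">2 levels", "continuous"}
    if type_1 not in allowed or type_2 not in allowed:
        raise ValueError("type must be one of: " + ", ".join(sorted(allowed)))

    n_continuous = [type_1, type_2].count("continuous")
    if n_continuous == 2:
        # Both variables continuous: bottom-right cell of the table.
        return "correlation / simple linear regression"
    if n_continuous == 0:
        # Both variables categorical: chi-square test in every such cell.
        return "chi-square test"

    # Exactly one continuous variable: the test depends on how many
    # levels the categorical variable has.
    categorical = type_2 if type_1 == "continuous" else type_1
    return "t-test" if categorical == "2 levels" else "ANOVA (F-test)"


print(choose_bivariate_test("2 levels", "continuous"))
print(choose_bivariate_test("continuous", "continuous"))
```

The table is symmetric, so the order of the two arguments does not matter.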
Correlation
Used when you measure two continuous variables.
Examples: Association between weight & height.
Association between age & blood pressure
[Scatter plot: Height (cm) on the vertical axis against Weight (Kg) on the horizontal axis]
Correlation
Correlation is measured by Pearson's Correlation Coefficient.
It is a measure of the linear association between two variables that have been measured on a continuous scale.
Pearson's correlation coefficient is denoted by r.
A correlation coefficient is a number that ranges between -1 and +1.
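As a minimal sketch of how r can be computed, the snippet below uses the standard product-moment formula r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The weight and height values are made-up illustration data, and `pearson_r` is a hypothetical helper name, not something taken from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only.
weights_kg = [55, 62, 70, 78, 85]
heights_cm = [158, 165, 171, 176, 183]
r = pearson_r(weights_kg, heights_cm)
print(round(r, 3))
```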
Pearson's Correlation Coefficient
If r = 1 → perfect positive linear relationship between the two variables.
If r = -1 → perfect negative linear relationship between the two variables.
If r = 0 → no linear relationship between the two variables.
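The three cases can be reproduced with small constructed data sets. In the sketch below (all values are invented for illustration, and `pearson_r` is a hypothetical helper), the third data set has r = 0 even though y is completely determined by x, which shows that r = 0 rules out only a linear relationship.

```python
import math


def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y_positive = [2 * v + 1 for v in x]   # exact increasing straight line
y_negative = [-3 * v + 4 for v in x]  # exact decreasing straight line
y_parabola = [v * v for v in x]       # exact, but non-linear, relationship

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)
r_parabola = pearson_r(x, y_parabola)
print(r_positive, r_negative, r_parabola)
```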
Pearson's Correlation Coefficient
[Figure: scatter plots illustrating r = +1, r = -1, and r = 0]
Pearson's Correlation Coefficient
http://noppa5.pc.helsinki.fi/koe/corr/cor7.html
Pearson's Correlation Coefficient
[Figure: strength of correlation along the r scale, labelled Strong, Moderate, Weak, Moderate, Strong]
Pearson's Correlation Coefficient
Example 1:
Research question: Is there a linear relationship between the weight and height of students?
Ho: There is no linear relationship between weight & height of students in the population (ρ = 0)
Ha: There is a linear relationship between weight & height of students in the population (ρ ≠ 0)
Statistical test: Pearson correlation coefficient (r)
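The slides do not show how the p-value for this test is obtained. A standard approach, sketched below, converts r into a t statistic, t = r√(n - 2) / √(1 - r²), with n - 2 degrees of freedom. The function names, the use of Simpson's rule for the t distribution, and the example values r = 0.5 and n = 30 are all assumptions made for illustration; they are not the numbers from Example 1.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for very large df; adequate for small samples.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density on [0, |t|] by Simpson's rule."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


def correlation_t_test(r, n):
    """Return (t, two-tailed p) for Ho: rho = 0, given sample r and size n."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_tailed_p(t, df)


# Hypothetical values, not the SPSS output from the slides.
t_stat, p_value = correlation_t_test(r=0.5, n=30)
print(round(t_stat, 3), p_value < 0.05)
```

If the p-value is below the chosen significance level, Ho (ρ = 0) is rejected.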
Pearson's Correlation Coefficient
Example 1: SPSS Output
[SPSS "Correlations" table for weight and height. Rows for each variable: Pearson Correlation, Sig. (2-tailed), N. Footnote: Correlation is significant at the 0.01 level (2-tailed).]
In this table, the "Pearson Correlation" row gives the r coefficient and the "Sig. (2-tailed)" row gives the P-value.
Pearson's Correlation Coefficient
Example 1: SPSS Output
Value of statistical test:
P-value:
Pearson's Correlation Coefficient
Example 3: SPSS Output
[SPSS "Correlations" table for age and height. Rows for each variable: Pearson Correlation, Sig. (2-tailed), N. Footnote: Correlation is significant at the 0.01 level (2-tailed).]
Ho: ρ = 0; there is no linear relationship between height & age in the population
Ha: ρ ≠ 0; there is a linear relationship between height & age in the population
Pearson's Correlation Coefficient
Example 3: SPSS Output
Value of statistical test:
P-value:
Pearson's Correlation Coefficient
Example 3: SPSS Output
Conclusion: At a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the height and age of students.
SPSS command for r
Example 1
Analyze → Correlate → Bivariate
Select height and weight and put them in the “Variables” box.
In-class questions
T (True) or F (False):
In studying whether there is an association between gender and weight, the investigator found that r = 0.90 and p-value < 0.001 and concluded that there is a strong significant correlation between gender and weight.
In-class questions
T (True) or F (False):
The correlation between obesity and number of cigarettes smoked was r = 0.012 and the p-value = 0.856. Based on these results we conclude that there isn't any association between obesity and number of cigarettes smoked.
Simple Linear Regression
Used to explain observed variation in the data
For example, we measure blood pressure in a sample of
patients and observe:
Patient (i):   1    2    3    4    5    6    7
BP (Y):       85  105   90   85  110   70  115
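One way to quantify the "observed variation" is the total sum of squared deviations from the mean, which is the quantity a regression model then tries to explain. A small sketch using the seven BP values above (the variable names are illustrative):

```python
# Blood pressure of the seven patients listed on the slide.
bp = [85, 105, 90, 85, 110, 70, 115]

mean_bp = sum(bp) / len(bp)

# Total sum of squares: how far the observations lie from their mean.
total_ss = sum((y - mean_bp) ** 2 for y in bp)

# Sample variance uses n - 1 in the denominator.
sample_variance = total_ss / (len(bp) - 1)

print(round(mean_bp, 2), round(total_ss, 2), round(sample_variance, 2))
```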
Simple Linear Regression
In order to explain why the BP values of individual patients differ, we try to associate the differences in BP with differences in other relevant patient characteristics (variables).
Example: Can variation in blood pressure be explained by age?
Questions:
1) What is the most appropriate mathematical model to use? A straight line, a parabola, etc.
2) Given a specific model, how do we determine the best-fitting model?
Simple Linear Regression
$Y = B_0 + B_1 X$
Y = dependent variable
X = independent variable
$B_0$ = Y intercept
$B_1$ = slope
The intercept $B_0$ is the value of Y when X = 0.
The slope $B_1$ is the amount of change in Y for each 1-unit change in X.
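The roles of the intercept and the slope can be checked numerically. The coefficients below are arbitrary values chosen for illustration, and `predict` is a hypothetical helper name.

```python
def predict(x, b0, b1):
    """Value of Y on the straight line Y = B0 + B1 * X."""
    return b0 + b1 * x


# Arbitrary illustrative coefficients, not estimates from any data set.
b0, b1 = 2.0, 0.5

# The intercept is the value of Y when X = 0.
y_at_zero = predict(0, b0, b1)

# The slope is the change in Y for a 1-unit change in X, at any X.
change_per_unit = predict(11, b0, b1) - predict(10, b0, b1)

print(y_at_zero, change_per_unit)
```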
Simple Linear Regression
Mathematical properties of a straight line
[Figure: optimal regression line, $Y = B_0 + B_1 X$]
Simple Linear Regression
Estimation of a Simple Linear Regression Model
Research Question: Does height help to predict weight using a straight line model? Is there a linear relationship between weight and height? Does height explain a significant portion of the variation in the values of weight observed?
$\text{Weight} = B_0 + B_1 \cdot \text{Height}$
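The slides do not name the estimation method. Assuming ordinary least squares, which is the usual way a simple linear regression is fitted, the estimates are B1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and B0 = ȳ - B1·x̄. The sketch below applies these formulas to made-up height and weight values; `fit_simple_linear_regression` is a hypothetical helper name.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1, r_squared) for y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Share of the variation in y explained by the fitted line.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return b0, b1, r_squared


# Made-up data for illustration only.
heights_cm = [158, 165, 171, 176, 183]
weights_kg = [55, 62, 70, 78, 85]
b0, b1, r_squared = fit_simple_linear_regression(heights_cm, weights_kg)
residuals = [w - (b0 + b1 * h) for h, w in zip(heights_cm, weights_kg)]
print(round(b0, 2), round(b1, 3), round(r_squared, 3))
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on the fit.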
Simple Linear Regression
Example 1:
In-class questions
Question 1:
In a simple linear regression model the predicted straight line was as follows:
Weight (Kg) = 3.5 - 1.32 (weekly hours of PA)
$R^2$ = 0.22; p-value for the slope = 0.
Interpret the value of $R^2$.
The number of weekly hours of PA explains 22% of the variation observed in weight.
What is the null hypothesis? Alternative?
$H_0$: $B_{\text{weekly hours of PA}} = 0$
$H_a$: $B_{\text{weekly hours of PA}} \neq 0$
Is the association between weight & weekly hours of PA positive or negative?
Negative
What is the magnitude of this association?
1.32 => A one-hour increase in weekly PA is associated with a 1.32 Kg decrease in predicted weight.
Is the association significant at a level of 0.05?
Because the p-value of $B_1$ is < 0.05, we reject $H_0$ and conclude that weekly hours of PA provide significant information for predicting weight.
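The answers for Question 1 can be checked directly from the reported equation. The sketch below uses only the coefficients given in the question; the function and variable names are illustrative.

```python
# Coefficients and R-squared as reported in Question 1.
INTERCEPT = 3.5
SLOPE_PA = -1.32
R_SQUARED = 0.22


def predicted_weight(weekly_pa_hours):
    """Predicted weight (Kg) from the reported regression line."""
    return INTERCEPT + SLOPE_PA * weekly_pa_hours


# Magnitude: effect of one additional weekly hour of PA.
change_per_hour = predicted_weight(1) - predicted_weight(0)

# Direction: the sign of the slope gives the sign of the association.
direction = "negative" if SLOPE_PA < 0 else "positive"

# Share of the variation in weight not explained by PA.
unexplained_share = 1 - R_SQUARED

print(round(change_per_hour, 2), direction, round(unexplained_share, 2))
```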
In-class questions
Question 2:
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .407a   .166       .164                10.
a. Predictors: (Constant), ISS - injury severity measure
Coefficients (a)
Model 1                          B      Std. Error   Beta    t       Sig.
(Constant)                       .443   .747                 .593    .
ISS - injury severity measure    .661   .066         .407    9.945   .
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
a. Dependent Variable: Length of hospital stay
What is the dependent/independent variable?
Dependent variable: Length of hospital stay
Independent variable: ISS - injury severity score
Interpret the value of $R^2$.
ISS explains 16.6% of the variation observed in length of hospital stay ($R^2$ = .166; the value .407 is R, not $R^2$).
What is the null hypothesis? Alternative?
$H_0$: $B_{\text{ISS}} = 0$
$H_a$: $B_{\text{ISS}} \neq 0$
Is there a significant association between the dependent & the independent variable?
Because the p-value of $B_{\text{ISS}}$ is < 0.05, we reject $H_0$ and conclude that ISS provides significant information for predicting length of hospital stay.
What is the magnitude of this association?
0.661 => Increasing ISS by 1 unit increases the predicted length of hospital stay by 0.661 days.
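The figures in the SPSS output are linked to one another, which gives a quick consistency check. The sketch below uses only the values printed in the output: with a single predictor, R Square is the square of R, and t is B divided by its standard error (the small gap from the reported 9.945 comes from using the rounded B and Std. Error). The variable and function names are illustrative.

```python
# Values as printed in the SPSS output for Question 2.
R = 0.407
R_SQUARE_REPORTED = 0.166
B_CONSTANT = 0.443
B_ISS = 0.661
SE_ISS = 0.066
T_ISS_REPORTED = 9.945

# With a single predictor, R Square is simply R squared.
r_square = R ** 2

# The t statistic for the slope is the coefficient over its standard error.
t_from_rounded_values = B_ISS / SE_ISS


def predicted_stay(iss):
    """Predicted length of hospital stay (days) for a given ISS."""
    return B_CONSTANT + B_ISS * iss


# One extra ISS unit adds B_ISS days to the predicted stay.
extra_days_per_unit = predicted_stay(11) - predicted_stay(10)

print(round(r_square, 3), round(t_from_rounded_values, 2),
      round(extra_days_per_unit, 3))
```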
Biases
Bias is an error in an epidemiologic study that results in an incorrect estimation of the association between exposure and outcome.
Selection bias
Information bias
Confounding bias
Multivariate Analysis
WHY?
To investigate the effect of more than one independent variable.
To predict the outcome using various independent variables.
To adjust for confounding variables.
Multivariate analyses
Multiple Linear Regression (if the outcome is continuous)
Logistic Regression (if the outcome has 2 levels)
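The rule on this slide depends only on the type of the outcome variable. A minimal sketch follows; the function name and the type labels are assumptions made for illustration.

```python
def choose_multivariate_model(outcome_type: str) -> str:
    """Return the multivariate model suggested for the outcome type."""
    if outcome_type == "continuous":
        return "multiple linear regression"
    if outcome_type == "2 levels":
        return "logistic regression"
    # The slides only cover these two outcome types.
    raise ValueError("outcome_type must be 'continuous' or '2 levels'")


print(choose_multivariate_model("continuous"))
print(choose_multivariate_model("2 levels"))
```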