Multiple Regression: Predicting Outcomes with Multiple Variables, Slides of Statistics

Multiple regression is a statistical method used to predict a score on a dependent variable (y) based on several predictor variables (x1, x2, ..., xp). This approach allows for a more accurate prediction as behavior is often influenced by multiple variables. The concept of multiple regression, the equation for the model, and how to calculate the estimates using least squares.

Typology: Slides

2012/2013

Uploaded on 08/31/2013

dhaval
dhaval 🇮🇳

4.6

(7)

65 documents

1 / 15

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Multiple Regression
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download Multiple Regression: Predicting Outcomes with Multiple Variables and more Slides Statistics in PDF only on Docsity!

Multiple Regression

What is multiple regression?

  • What is multiple regression?
    • Predicting a score on Y based upon several predictors.
  • Why is this important?
    • Behavior is rarely a function of just one variable, but is instead influenced by many variables. So the idea is that we should be able to obtain a more accurate predicted score if using multiple variables to predict our outcome.

The Model

  • Multiple Linear Regression refers to regression applications in which there are several independent variables, x 1 , x 2 , … , xp. A multiple linear regression model with p independent variables has the equation - The ε is a random variable with mean 0 and variancey^ o^1 1 p^ p σ^2.

     x    x  

The prediction equation

  • A prediction equation for this model fitted to data is
    • Where denotes the “predicted” value computed from the equation, and b i denotes an estimate of β i.
    • These estimates are usually obtained by the method of least squares. - This means finding among the set of all possible values for the parameter estimates the ones which minimize the sum of squared residuals.

y^ ˆ  b o  b x 1 1  b xp p

  • The correlation between the criterion variable (Y) and the set of predictor variables (Xs) is Multiple R. - When we speak of multiple regression we use R instead of r. - So Multiple R squared is the amount of variation in Y that can be accounted for by the combination of variables.
  • Minitab, and other statistical software, will show a value of adjusted R squared. This is really a correction factor. Values of R squared tend to be larger in samples than they are in populations. The adjusted R squared is an attempt to correct for this. - It is also a factor that is important when there are multiple variables and few participants, since this scenario will give an inflated value of R squared just by chance.
  • The equation substitutes a common slope for the multiple variables and is not meaningfully interpreted but is used in the overall calculation.

ANOVA

  • An analysis of variance for a multiple linear regression model with p independent variables fitted to a data set with n observations is:

Source of

Variation DF SS MS

Model p SSR MSR

Error n-p-1 SSE MSE

Total n-1 SST

Sums of squares

  • The sums of squares SSR, SSE, and SST have

the same definitions in relation to the model as in simple linear regression:

2

2

2

SSR y^ ˆ y

SSE y y^ ˆ

SST y y

  • This increase in regression sum of squares is sometimes denoted

SSR(added variables | original variables),

  • Where original variables represents the list of independent variables that were in the model prior to adding new variables, and added variables represents the list of variables that were added to obtain the new model.
  • The overall SSR for the new model can be partitioned into the variation attributable to the original variables plus the variation due to the added variables that is not due to the original variables ,

SSR( all variables ) = SSR( original variables ) + SSR( added variables | original variables ).