Sample Data - Applied Regression Analysis - Lecture Slides, Slides of Data Analysis & Statistical Methods

These lecture slides for Applied Regression Analysis cover Perfect Multicollinearity, Overall Significance of Equation, Recap, Population of Students, etc. Key points: Sample Data, Jackie, Philip, Bryan, Rita, Shane, Keith, Kelsie, Formulas, Email Attachment

Typology: Slides

2012/2013

Uploaded on 02/07/2013

bala
bala 🇮🇳

Here is our sample data on height and weight.

Observation     Height (H or X)   Weight (W or Y)
1. Jackie       64                130
2. Philip D.    75                210
3. Bryan        76                230
4. Rita         67                190
5. Shane        68                175
6. Keith        75                190
7. Kelsie       65                145
8. Di           72                185

Assignment 1 (carries 30 points and is due before noon on Thursday, September 6)

  1. Use the data set on the previous slide and the formulas on Page 8 (1-5 and 1-6) to estimate the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ in the equation $W = \hat{\beta}_0 + \hat{\beta}_1 H$
  • Make sure to show your work.
  • Do the estimated coefficients make sense to you?
  • What is the meaning of the estimated coefficients?
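
The calculation asked for above can be sketched in a few lines of Python. This is only a sketch: it assumes that formulas 1-5 and 1-6 are the usual OLS estimators (slope = sum of cross-deviations divided by the sum of squared X deviations, intercept = mean of Y minus slope times mean of X), which the slides do not reproduce. The names `ols_simple`, `b0_hat` and `b1_hat` are made up for this illustration.

```python
# Sample data from the slide: height (H or X) and weight (W or Y), 8 students.
heights = [64, 75, 76, 67, 68, 75, 65, 72]
weights = [130, 210, 230, 190, 175, 190, 145, 185]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared X deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: makes the fitted line pass through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0_hat, b1_hat = ols_simple(heights, weights)
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```

With this data the slope comes out at roughly 5.9 and the intercept at roughly -231. Showing the intermediate sums by hand, as the assignment asks, is still the point of the exercise.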

Note:

  • The following notes are not going to take the place of the discussions covered in your textbooks

  • First read the book
  • Then look at the notes

Total, Explained and Residual Sum of Squares (PP11-13)

  • Remember our height/weight example
  • What is the average weight of the class?
  • Duplicate the graph on Page 12, where Y is the weight and X is the height

  • The Fitted Line will be upward sloping
  • The Average Line (average weight) will be horizontal

Remember we have 8 observations in our sample

  • Some of our weights are below average and some are above average.

  • Look at Equation 1-8, Page 12
    • We square $(Y - \bar{Y})$, $(\hat{Y} - \bar{Y})$ and $(Y - \hat{Y})$ because we do not want the positive differences to cancel the negative differences
  • Note: the best fitted line will be the one with the lowest sum of squared residuals, $\sum (Y - \hat{Y})^2$
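
To make the three sums of squares concrete, here is a small sketch using the height/weight sample. It assumes that Equation 1-8 is the usual decomposition TSS = ESS + RSS and that the fitted line is the OLS line; the variable names are chosen for this illustration only.

```python
# Sample data from the slides: height (X) and weight (Y).
heights = [64, 75, 76, 67, 68, 75, 65, 72]
weights = [130, 210, 230, 190, 175, 190, 145, 185]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n  # the horizontal "average line"

# OLS slope and intercept (assumed to be the fitted line on the graph).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
      / sum((x - x_bar) ** 2 for x in heights))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in heights]

# Total, explained and residual sums of squares.
tss = sum((y - y_bar) ** 2 for y in weights)
ess = sum((f - y_bar) ** 2 for f in fitted)
rss = sum((y - f) ** 2 for y, f in zip(weights, fitted))

print(f"average weight = {y_bar}")
print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}")
print(f"ESS + RSS = {ess + rss:.3f}")
```

For the OLS line with an intercept, the last two printed lines agree up to rounding. Any other line through the same data would give a larger RSS.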

Multiple Regression Model (Chapter 2, PP20-29)

  • Is height the only factor affecting weight?
    • Of course not.
    • What are some other factors affecting an individual’s weight?
      - Age
      - Calorie intake per day
      - ……

The meaning of the estimated coefficients

  • Our estimated equation will be
  • $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$
    • Bonus: Can someone tell me why I didn’t put an “e” at the end of the above equation?
  • $\hat{\beta}_1$ measures the effect of one more inch of height on weight, holding age and calorie intake constant and ignoring the effect of all other variables on weight.
  • Similarly, $\hat{\beta}_2$ measures the effect of one more year of age on weight, holding height and calorie intake constant and ignoring the effect of all other variables on weight.
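
The "holding constant" interpretation can be written out as a short worked equation. This sketch assumes, as the bullets suggest, that $X_1$ is height, $X_2$ is age and $X_3$ is calorie intake.

```latex
% Raise height by one inch while age and calorie intake stay fixed:
\hat{Y}(X_1 + 1,\, X_2,\, X_3) - \hat{Y}(X_1,\, X_2,\, X_3)
  = \hat{\beta}_1 (X_1 + 1) - \hat{\beta}_1 X_1
  = \hat{\beta}_1

% Equivalently, as a partial derivative of the fitted equation:
\frac{\partial \hat{Y}}{\partial X_1} = \hat{\beta}_1
```

The terms in $X_2$ and $X_3$ cancel in the difference, which is exactly what "holding age and calorie intake constant" means.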

How big should the sample be?

  • The bigger the sample, the closer $\hat{\beta}$ will tend to be to $\beta$.

  • Rule of thumb: Degrees of Freedom >
  • Degrees of Freedom = n – k – 1
    • Where n is the sample size and k is the number of independent variables.
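
A small simulation can illustrate the sample-size point. Everything here is hypothetical: the population coefficients (2 and 3), the uniform X values and the standard normal errors are made up for the sketch, and degrees of freedom are taken as n - k - 1, the usual definition when the equation includes an intercept.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (one independent variable)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def degrees_of_freedom(n, k):
    """n observations minus k slope coefficients minus one intercept."""
    return n - k - 1


random.seed(0)
TRUE_B0, TRUE_B1 = 2.0, 3.0  # hypothetical population coefficients

errors = {}
for n in (10, 100, 10_000):
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [TRUE_B0 + TRUE_B1 * xi + random.gauss(0, 1) for xi in x]
    errors[n] = abs(ols_slope(x, y) - TRUE_B1)
    print(f"n = {n:>6}, df = {degrees_of_freedom(n, k=1):>5}, "
          f"|slope estimate - true slope| = {errors[n]:.4f}")
```

The estimation error typically shrinks as n grows, although a single small sample can land close to the true value by luck. For the height/weight sample, `degrees_of_freedom(8, 1)` gives 6.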

Assumption 1

The regression equation

  • Is linear in the coefficients (not necessarily linear in the variables)
  • Is correctly specified (right functional form, no omitted variables, no irrelevant variables)
  • Has an additive error term
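
A few illustrative equations may help separate "linear in the coefficients" from "linear in the variables". These examples are not from the slides; $\epsilon$ stands for the additive error term.

```latex
% Linear in the coefficients, although X enters nonlinearly: OLS applies.
Y = \beta_0 + \beta_1 X^2 + \epsilon
\qquad
Y = \beta_0 + \beta_1 \ln X + \epsilon

% Not linear in the coefficients: OLS cannot be applied directly.
Y = \beta_0 + X^{\beta_1} + \epsilon
```

In the first two cases, defining a new variable such as $Z = X^2$ or $Z = \ln X$ turns the equation into an ordinary linear regression of Y on Z.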

Assumption 2

No two or more independent variables are perfectly linearly correlated with each other.

  • If violated → Perfect Multicollinearity
  • Example
  • Consumption = f (inflation, real interest rate, nominal interest rate, ….)
    • Since real interest rate = nominal interest rate – inflation,
    • the 3 independent variables are perfectly and linearly correlated with each other. When one independent variable changes, at least one of the others must change too, so OLS cannot capture the effect of one variable in isolation.
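
The interest-rate example can be checked numerically. In this sketch the percentages are hypothetical integers invented for illustration, and the cross-product matrix X'X is built without a constant column to keep it short. A zero determinant means the OLS normal equations have no unique solution.

```python
def dot(u, v):
    """Sum of element-wise products of two equal-length columns."""
    return sum(a * b for a, b in zip(u, v))


def gram(columns):
    """Cross-product matrix X'X for a list of data columns."""
    return [[dot(u, v) for v in columns] for u in columns]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


# Hypothetical integer percentages, made up for illustration only.
inflation = [2, 3, 1, 4, 5]
nominal = [5, 4, 6, 7, 6]
# Exact linear dependence: real = nominal - inflation.
real = [n - i for n, i in zip(nominal, inflation)]

det_all_three = det(gram([inflation, real, nominal]))
det_two_only = det(gram([inflation, nominal]))
print("all three variables:", det_all_three)   # singular X'X
print("after dropping one: ", det_two_only)    # invertible X'X
```

Integer data keeps the arithmetic exact, so the first determinant is exactly zero rather than merely tiny. Dropping any one of the three variables removes the dependence.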

Assumption 4

  • The error terms are uncorrelated with each other
  • What if it is violated?
  • Then we have an autocorrelation (serial correlation) problem
  • Example: Consumption = f (…., income)
    • Suppose we use time series data on the US economy to estimate the above model.
    • Suppose that during 5 years of our study there was a war and consumption dropped significantly even though income didn’t. We will then get negative errors during those years, and they will be correlated with each other.
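
One simple way to see the pattern in the war example is to compute the correlation between each residual and the one before it. The residual series below is hypothetical (15 made-up yearly values with a run of five large negative ones), and the lag-1 autocorrelation coefficient is a diagnostic chosen for this sketch, not one named in the slides.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and the previous one."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    numerator = sum(dev[t] * dev[t - 1] for t in range(1, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


# Hypothetical residuals: years 6-10 are the "war years" with large
# negative errors; the other years scatter around zero.
residuals = [1, -1, 2, -2, 1, -8, -9, -7, -8, -9, 1, -1, 2, -1, 1]

r1 = lag1_autocorrelation(residuals)
print(f"lag-1 autocorrelation of residuals = {r1:.2f}")
```

A value well above zero says that a negative error tends to be followed by another negative error, which is what the run of war years produces.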

Assumption 5

  • The error term must have a zero mean
  • What if this assumption is violated?
  • This is not a big deal: the intercept will pick up the mean of the error term
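
The claim that the intercept absorbs a nonzero error mean can be illustrated with the height/weight sample. As a sketch, a hypothetical constant of 10 is added to every weight, which mimics an error term whose mean is 10 instead of 0. The function and variable names are made up for this example.

```python
# Sample data from the slides: height (X) and weight (Y).
heights = [64, 75, 76, 67, 68, 75, 65, 72]
weights = [130, 210, 230, 190, 175, 190, 145, 185]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


shift = 10  # hypothetical nonzero mean of the error term
shifted_weights = [w + shift for w in weights]

b0, b1 = ols_simple(heights, weights)
b0_shifted, b1_shifted = ols_simple(heights, shifted_weights)

print(f"slope:     {b1:.4f} -> {b1_shifted:.4f}")
print(f"intercept: {b0:.4f} -> {b0_shifted:.4f}")
```

The slope is unchanged and the intercept moves by exactly the shift, so the slope estimate is unaffected while the intercept no longer has a clean interpretation.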

Assumption 7 (Not Necessary)

  • The error term is normally distributed
  • What is a normal distribution?
  • Symmetric, continuous, bell shaped
  • Can be characterized by its mean and variance
  • We must know whether it is violated
  • If violated, some statistical tests are not applicable
  • As the sample size goes up → the distribution becomes more normal
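
One common reading of the last bullet is the Central Limit Theorem: averages of many observations look more normal as the sample grows, even when the underlying distribution is skewed. The sketch below assumes that reading. The exponential distribution, the 5000 repetitions and the use of skewness as a rough measure of non-normality are all choices made for this illustration.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: close to 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(n, reps):
    """Means of `reps` independent samples of size n from a skewed distribution."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


random.seed(1)
REPS = 5000  # number of simulated samples per sample size

skew_1 = skewness(sample_means(1, REPS))    # raw exponential draws
skew_50 = skewness(sample_means(50, REPS))  # means of 50 draws each

print(f"skewness with n = 1:  {skew_1:.2f}")
print(f"skewness with n = 50: {skew_50:.2f}")
```

The skewness of the sample means is much closer to zero at n = 50 than at n = 1, which is the sense in which the distribution "becomes more normal".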