2021/2022

Uploaded on 09/12/2022

ECONOMICS 351* -- NOTE 11 M.G. Abbott
ECON 351* -- NOTE 11
The Multiple Classical Linear Regression Model (CLRM):
Specification and Assumptions
1. Introduction
CLRM stands for the Classical Linear Regression Model. The CLRM is also known as the standard linear regression model.

Three sets of assumptions define the multiple CLRM -- essentially the same three sets of assumptions that defined the simple CLRM, with one modification to assumption A8.

1. Assumptions respecting the formulation of the population regression equation, or PRE.

Assumption A1

2. Assumptions respecting the statistical properties of the random error term and the dependent variable.

Assumptions A2-A4

  • Assumption A2: The Assumption of Zero Conditional Mean Error
  • Assumption A3: The Assumption of Constant Error Variances
  • Assumption A4: The Assumption of Zero Error Covariances

3. Assumptions respecting the properties of the sample data.

Assumptions A5-A8

  • Assumption A5: The Assumption of Independent Random Sampling
  • Assumption A6: The Assumption of Sufficient Sample Data (N > K)
  • Assumption A7: The Assumption of Nonconstant Regressors
  • Assumption A8: The Assumption of No Perfect Multicollinearity

2. Formulation of the Population Regression Equation (PRE)

Assumption A1: The population regression equation, or PRE, takes the form

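$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u = \beta_0 + \sum_{j=1}^{k} \beta_j X_j + u \qquad \text{(A1)}$$

or

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + u_i \qquad \text{(A1)}$$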

The second form of (A1) writes the PRE for a particular observation i.

As in the simple CLRM, the PRE (A1) incorporates three distinct assumptions.

A1.1: Assumption of an Additive Random Error Term.

The random error term $u_i$ enters the PRE additively:

$$\frac{\partial Y_i}{\partial u_i} = 1 \quad \text{for all } i \ (\forall\, i).$$

A1.2: Assumption of Linearity-in-Parameters or Linearity-in-Coefficients.

The PRE is linear in the population regression coefficients $\beta_j$ (j = 0, ..., k).

Let $x_i = [1 \; X_{1i} \; X_{2i} \; \cdots \; X_{ki}]$ be the (K×1) vector of regressor values for observation i, where K = k + 1 is the number of regression coefficients.

$$\frac{\partial Y_i}{\partial \beta_j} = f_j(x_i)$$

where $f_j(x_i)$ contains no unknown parameters, j = 0, ..., k.

A1.3: Assumption of Parameter or Coefficient Constancy.

The population regression coefficients $\beta_j$ (j = 0, 1, ..., k) are (unknown) constants that do not vary across observations.

$$\beta_{ji} = \beta_j = \text{a constant} \quad \forall\, i \quad (j = 0, 1, \ldots, k).$$

  • Implication 2 of A2: the Orthogonality Condition. Assumption A2 also implies that the population values $X_{ji}$ of the regressor $X_j$ and $u_i$ of the random error term u have zero covariance -- i.e., the population values of $X_j$ and u are uncorrelated:

$$E(u \mid x) = 0 \;\Rightarrow\; \mathrm{Cov}(X_j, u) = E(X_j u) = 0, \quad j = 1, 2, \ldots, k \qquad \text{(A2-2)}$$

or

$$E(u_i \mid x_i) = 0 \;\Rightarrow\; \mathrm{Cov}(X_{ji}, u_i) = E(X_{ji} u_i) = 0 \quad \forall\, i, \; j = 1, 2, \ldots, k \qquad \text{(A2-2)}$$

1. The equality $\mathrm{Cov}(X_{ji}, u_i) = E(X_{ji} u_i)$ in (A2-2) follows from the definition of the covariance between $X_{ji}$ and $u_i$, and from assumption (A2):

$$
\begin{aligned}
\mathrm{Cov}(X_{ji}, u_i) &\equiv E\{[X_{ji} - E(X_{ji})][u_i - E(u_i \mid x_i)]\} && \text{by definition} \\
&= E\{[X_{ji} - E(X_{ji})]\,u_i\} && \text{since } E(u_i \mid x_i) = 0 \text{ by A2} \\
&= E[X_{ji} u_i - E(X_{ji})\,u_i] \\
&= E(X_{ji} u_i) - E(X_{ji})\,E(u_i) && \text{since } E(X_{ji}) \text{ is a constant} \\
&= E(X_{ji} u_i) && \text{since } E(u_i) = E(u_i \mid x_i) = 0 \text{ by A2.}
\end{aligned}
$$

2. Implication (A2-2) states that the random error term u has zero covariance with, or is uncorrelated with, each of the regressors $X_j$ (j = 1, …, k) in the population. This assumption means that there exists no linear association between u and any of the k regressors $X_j$ (j = 1, …, k).

Note that zero covariance between $X_{ji}$ and $u_i$ implies zero correlation between $X_{ji}$ and $u_i$, since the simple correlation coefficient between $X_{ji}$ and $u_i$, denoted as $\rho(X_{ji}, u_i)$, is defined as

$$\rho(X_{ji}, u_i) \equiv \frac{\mathrm{Cov}(X_{ji}, u_i)}{\sqrt{\mathrm{Var}(X_{ji})\,\mathrm{Var}(u_i)}} = \frac{\mathrm{Cov}(X_{ji}, u_i)}{\mathrm{sd}(X_{ji})\,\mathrm{sd}(u_i)}.$$

From this definition of $\rho(X_{ji}, u_i)$, it follows that if $\mathrm{Cov}(X_{ji}, u_i) = 0$, then $\rho(X_{ji}, u_i) = 0$, i.e.,

$$\mathrm{Cov}(X_{ji}, u_i) = 0 \;\Rightarrow\; \rho(X_{ji}, u_i) = 0.$$
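The orthogonality condition can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: the regressor and the error are both drawn as independent standard normal variables (so E(u | x) = 0 holds by construction), and the seed, sample size and helper names `sample_cov` and `sample_corr` are arbitrary choices, not part of the note.

```python
import random

def sample_cov(a, b):
    """Sample covariance of two equal-length lists (N - 1 divisor)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)

def sample_corr(a, b):
    """Sample correlation: covariance divided by the product of the sds."""
    return sample_cov(a, b) / (sample_cov(a, a) * sample_cov(b, b)) ** 0.5

rng = random.Random(0)  # arbitrary fixed seed
n = 100_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# u is drawn independently of x, so E(u | x) = 0 holds by construction
u = [rng.gauss(0.0, 1.0) for _ in range(n)]

cov_xu = sample_cov(x, u)                         # sample analogue of Cov(X, u)
mean_xu = sum(p * q for p, q in zip(x, u)) / n    # sample analogue of E(Xu)
corr_xu = sample_corr(x, u)                       # sample analogue of rho(X, u)
print(cov_xu, mean_xu, corr_xu)
```

In a sample of this size all three quantities should be close to zero, which mirrors the chain Cov(X, u) = E(Xu) = 0 and hence ρ(X, u) = 0.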

  • Implication 3 of A2. Assumption A2 implies that the conditional mean of the population $Y_i$ values corresponding to given values $X_{ji}$ of the regressors $X_j$ (j = 1, …, k) equals the population regression function (PRF):

$$E(u \mid x) = 0 \;\Rightarrow\; E(Y \mid x) = f(x) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k = \beta_0 + \sum_{j=1}^{k} \beta_j X_j \qquad \text{(A2-3)}$$

or

$$E(u_i \mid x_i) = 0 \;\Rightarrow\; E(Y_i \mid x_i) = f(x_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} \quad \forall\, i. \qquad \text{(A2-3)}$$

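Proof: Take the conditional expectation of the PRE (A1) for some given set of regressor values $x_i = [1 \; X_{1i} \; X_{2i} \; \cdots \; X_{ki}]$:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + u_i \qquad \text{(A1)}$$

$$
\begin{aligned}
E(Y_i \mid x_i) &= E(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} \mid x_i) + E(u_i \mid x_i) \\
&= E(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} \mid x_i) && \text{by A2, } E(u_i \mid x_i) = 0 \\
&= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} \\
&= \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} && \text{since } E\Big(\beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} \,\Big|\, x_i\Big) = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}.
\end{aligned}
$$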
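The result E(Y | x) = f(x) can be illustrated numerically. The following sketch assumes a hypothetical model with k = 2 binary regressors, made-up coefficient values, and standard normal errors drawn independently of x; none of these choices come from the note. For each cell of regressor values, the average of the simulated Y values should be close to the PRF evaluated at that cell.

```python
import random

# Hypothetical population coefficients (beta0, beta1, beta2); k = 2 regressors
BETA = (1.0, 2.0, -0.5)

def prf(x1, x2):
    """Population regression function f(x) = beta0 + beta1*X1 + beta2*X2."""
    return BETA[0] + BETA[1] * x1 + BETA[2] * x2

rng = random.Random(1)  # arbitrary fixed seed
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]  # each cell is one value of x
draws_per_cell = 20_000

cell_means = {}
for x1, x2 in cells:
    # u has mean zero whatever the value of x, so A2 holds by construction
    ys = [prf(x1, x2) + rng.gauss(0.0, 1.0) for _ in range(draws_per_cell)]
    cell_means[(x1, x2)] = sum(ys) / draws_per_cell

# Compare the simulated conditional mean with the PRF in every cell
for cell in cells:
    print(cell, round(cell_means[cell], 3), prf(*cell))
```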

  • Meaning of the Zero Conditional Mean Error Assumption A2:

Each set of regressor values $x_i = [1 \; X_{1i} \; X_{2i} \; \cdots \; X_{ki}]$ identifies a segment or subset of the relevant population, specifically the segment that has those particular values of the regressors. For each of these population segments or subsets, assumption A2 says that the mean of the random error u is zero.

  • Common causes of correlation or dependence between the $X_j$ and u -- i.e., common causes of violations of assumption A2.

1. Incorrect specification of the functional form of the relationship between Y and the $X_j$, j = 1, …, k. Examples: Using Y as the dependent variable when the true model has ln(Y) as the dependent variable. Or using $X_j$ as the independent variable when the true model has ln($X_j$) as the independent variable.

2. Omission of relevant variables that are correlated with one or more of the included regressors $X_j$, j = 1, …, k.

3. Measurement errors in the regressors $X_j$, j = 1, …, k.

4. Joint determination of one or more $X_j$ and Y.
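The second cause, omission of a relevant variable that is correlated with an included regressor, can be sketched by simulation. Everything specific here is an assumption made for illustration: the coefficient values, the rule that generates X2 from X1, the seed, and the helper `ols_slope`, which uses the simple-regression slope formula. When X2 is left out, its effect is absorbed into the error term, which is then correlated with X1.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)  # arbitrary fixed seed
n = 50_000
b0, b1, b2 = 1.0, 2.0, 3.0  # hypothetical true coefficients

x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
# X2 is correlated with X1 by construction
x2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in x1]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [b0 + b1 * a + b2 * b + e for a, b, e in zip(x1, x2, u)]

# Omitting X2 moves b2*X2 into the error term, which is then correlated with X1
slope_short = ols_slope(x1, y)
print(slope_short)  # close to b1 + b2 * 0.5 = 3.5 rather than b1 = 2
```

In this setup the slope from the short regression settles near 3.5 instead of the true value 2, so the zero conditional mean assumption fails for the regression that omits X2.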

Assumption A3: The Assumption of Constant Error Variances
The Assumption of Homoskedastic Errors
The Assumption of Homoskedasticity

The conditional variances of the random error terms $u_i$ are identical for all observations -- i.e., for all sets of regressor values $x = [1 \; X_1 \; X_2 \; \cdots \; X_k]$ -- and equal the same finite positive constant $\sigma^2$ for all i:

$$\mathrm{Var}(u \mid x) = E(u^2 \mid x) = \sigma^2 > 0 \qquad \text{(A3)}$$

or

$$\mathrm{Var}(u_i \mid x_i) = E(u_i^2 \mid x_i) = \sigma^2 > 0 \quad \forall\, i \qquad \text{(A3)}$$

where $\sigma^2$ is a finite positive (unknown) constant and $x_i = [1 \; X_{1i} \; X_{2i} \; \cdots \; X_{ki}]$ is the (K×1) vector of regressor values for observation i.

  • The first equality in A3 follows from the definition of the conditional variance of $u_i$ and assumption A2:

$$
\begin{aligned}
\mathrm{Var}(u_i \mid x_i) &\equiv E\{[u_i - E(u_i \mid x_i)]^2 \mid x_i\} && \text{by definition} \\
&= E\{[u_i - 0]^2 \mid x_i\} && \text{because } E(u_i \mid x_i) = 0 \text{ by assumption A2} \\
&= E(u_i^2 \mid x_i).
\end{aligned}
$$

  • Meaning of the Homoskedasticity Assumption A3
  • For each set of regressor values, there is a conditional distribution of random errors, and a corresponding conditional distribution of population Y values.
  • Assumption A3 says that the variance of the random errors for any particular set of regressor values $x_i = [1 \; X_{1i} \; X_{2i} \; \cdots \; X_{ki}]$ is the same as the variance of the random errors for any other set of regressor values $x_s = [1 \; X_{1s} \; X_{2s} \; \cdots \; X_{ks}]$ (for all $x_s \ne x_i$).

In other words, the variances of the conditional random error distributions corresponding to each set of regressor values in the relevant population are all equal to the same finite positive constant $\sigma^2$.

$$\mathrm{Var}(u_i \mid x_i) = \mathrm{Var}(u_s \mid x_s) = \sigma^2 > 0 \quad \text{for all } x_s \ne x_i.$$

  • Implication A3-2 says that the variance of the population Y values for $x = x_i = [1 \; X_{1i} \; X_{2i} \; \cdots \; X_{ki}]$ is the same as the variance of the population Y values for any other set of regressor values $x = x_s = [1 \; X_{1s} \; X_{2s} \; \cdots \; X_{ks}]$ (for all $x_s \ne x_i$). The conditional distributions of the population Y values around the PRF have the same constant variance $\sigma^2$ for all sets of regressor values.

$$\mathrm{Var}(Y_i \mid x_i) = \mathrm{Var}(Y_s \mid x_s) = \sigma^2 > 0 \quad \text{for all } x_s \ne x_i.$$
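A small simulation can make the contrast between homoskedastic and heteroskedastic errors concrete. This is a sketch under assumptions that are not in the note: two hypothetical regressor values, normal errors, an error standard deviation of 2 in the homoskedastic case, and a standard deviation equal to x in the heteroskedastic case.

```python
import random

def sample_variance(values):
    """Sample variance with the N - 1 divisor."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

rng = random.Random(3)  # arbitrary fixed seed
n = 20_000
x_values = (1.0, 5.0)   # two hypothetical regressor values
sigma = 2.0             # hypothetical error standard deviation

# Homoskedastic errors: the same sigma at every x, as A3 requires
homo = {x: sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for x in x_values}

# Heteroskedastic errors: the standard deviation grows with x, violating A3
hetero = {x: sample_variance([rng.gauss(0.0, x) for _ in range(n)])
          for x in x_values}

print(homo)    # both variances near sigma**2 = 4
print(hetero)  # variances near 1 and 25, so they differ across x
```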

Assumption A4: The Assumption of Zero Error Covariances
The Assumption of Nonautoregressive Errors
The Assumption of Nonautocorrelated Errors

Consider any pair of distinct random error terms $u_i$ and $u_s$ (i ≠ s) corresponding to two different sets (or vectors) of regressor values $x_i \ne x_s$. This assumption states that $u_i$ and $u_s$ have zero covariance:

$$\mathrm{Cov}(u_i, u_s \mid x_i, x_s) = E(u_i u_s \mid x_i, x_s) = 0 \quad \forall\, i \ne s. \qquad \text{(A4)}$$

  • The first equality in (A4) follows from the definition of the conditional covariance of $u_i$ and $u_s$ and assumption (A2):

$$
\begin{aligned}
\mathrm{Cov}(u_i, u_s \mid x_i, x_s) &\equiv E\{[u_i - E(u_i \mid x_i)][u_s - E(u_s \mid x_s)] \mid x_i, x_s\} && \text{by definition} \\
&= E(u_i u_s \mid x_i, x_s) && \text{since } E(u_i \mid x_i) = E(u_s \mid x_s) = 0 \text{ by A2.}
\end{aligned}
$$

  • The second equality in (A4) states the assumption that all pairs of error terms corresponding to different sets of regressor values have zero covariance.
  • Implication of A4: Assumption A4 implies that the conditional covariance of any two distinct values of the regressand, say $Y_i$ and $Y_s$ where i ≠ s, is equal to zero:

$$\mathrm{Cov}(u_i, u_s \mid x_i, x_s) = 0 \;\; \forall\, i \ne s \;\Rightarrow\; \mathrm{Cov}(Y_i, Y_s \mid x_i, x_s) = 0 \;\; \forall\, i \ne s.$$
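The implication is stated without proof. One way to see it, sketched here with the same conditioning convention as the derivation of (A4), is to note that the deviation of $Y_i$ from its conditional mean is exactly the error term $u_i$:

```latex
\begin{aligned}
Y_i - E(Y_i \mid x_i) &= u_i && \text{by (A1) and (A2-3)} \\
\mathrm{Cov}(Y_i, Y_s \mid x_i, x_s)
  &\equiv E\{[Y_i - E(Y_i \mid x_i)][Y_s - E(Y_s \mid x_s)] \mid x_i, x_s\} && \text{by definition} \\
  &= E(u_i u_s \mid x_i, x_s) \\
  &= 0 \quad \forall\, i \ne s && \text{by (A4).}
\end{aligned}
```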

4. Properties of the Sample Data

Assumption A5: Random Sampling or Independent Random Sampling

The sample data consist of N randomly selected observations on the regressand Y and the regressors $X_j$ (j = 1, ..., k), the observable variables in the PRE described by A1. These N randomly selected observations can be written as N row vectors:

$$
\begin{aligned}
\text{Sample data} &\equiv [(Y_1, x_1), (Y_2, x_2), \ldots, (Y_N, x_N)] \\
&\equiv (Y_i, 1, X_{1i}, X_{2i}, \ldots, X_{ki}), \quad i = 1, \ldots, N \\
&\equiv (Y_i, x_i), \quad i = 1, \ldots, N.
\end{aligned}
$$
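As a rough illustration of this layout, the sketch below stores a small made-up sample as N row vectors $(Y_i, x_i)$, prepending the constant 1 to each vector of regressor values. The numbers and the helper name `to_row_vectors` are hypothetical and chosen only for illustration.

```python
# Hypothetical raw observations (Y_i, X_1i, X_2i), so N = 3 and k = 2
raw = [
    (10.0, 2.0, 1.0),
    (12.5, 3.0, 0.0),
    (9.0, 1.0, 4.0),
]

def to_row_vectors(observations):
    """Return a list of (Y_i, x_i) pairs with x_i = (1, X_1i, ..., X_ki)."""
    return [(y, (1.0, *xs)) for (y, *xs) in observations]

sample = to_row_vectors(raw)
N = len(sample)        # number of observations
K = len(sample[0][1])  # number of coefficients, K = k + 1

for y_i, x_i in sample:
    print(y_i, x_i)
```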

  • Implications of the Random Sampling Assumption A5

The assumption of random sampling implies that the sample observations are statistically independent.

1. It thus means that the error terms $u_i$ and $u_s$ are statistically independent, and hence have zero covariance, for any two distinct observations i and s.

$$\text{Random sampling} \;\Rightarrow\; \mathrm{Cov}(u_i, u_s \mid x_i, x_s) = \mathrm{Cov}(u_i, u_s) = 0 \quad \forall\, i \ne s.$$

2. It also means that the dependent variable values $Y_i$ and $Y_s$ are statistically independent, and hence have zero covariance, for any two distinct observations i and s.

$$\text{Random sampling} \;\Rightarrow\; \mathrm{Cov}(Y_i, Y_s \mid x_i, x_s) = \mathrm{Cov}(Y_i, Y_s) = 0 \quad \forall\, i \ne s.$$

The assumption of random sampling is therefore sufficient for assumption A4 of zero covariance between observations, but is stronger than necessary.

  • When is the Random Sampling Assumption A5 Appropriate?

The random sampling assumption is often appropriate for cross-sectional regression models , but is hardly ever appropriate for time-series regression models.

Assumption A6: The number of sample observations N is greater than the number of unknown parameters K:

number of sample observations > number of unknown parameters

N > K. (A6)

  • Meaning of A6: Unless this assumption is satisfied, it is not possible to compute from a given sample of N observations estimates of all the unknown parameters in the model.
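A tiny example shows why too few observations leave the parameters undetermined. The sketch assumes the simplest case that violates A6: a hypothetical model with an intercept and one slope (K = 2) and a single made-up observation (N = 1). Many different coefficient pairs then fit the data exactly, so the sample cannot single out one set of estimates.

```python
def residual(beta0, beta1, x, y):
    """Residual of one observation for the line beta0 + beta1 * x."""
    return y - (beta0 + beta1 * x)

# One hypothetical observation (N = 1) but two unknown coefficients (K = 2)
x_obs, y_obs = 2.0, 5.0

# Three different candidate (beta0, beta1) pairs
candidates = [(5.0, 0.0), (1.0, 2.0), (-1.0, 3.0)]

residuals = [residual(b0, b1, x_obs, y_obs) for b0, b1 in candidates]
print(residuals)  # every candidate fits the single observation exactly
```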

Assumption A7: Nonconstant Regressors

The sample values $X_{ji}$ of each regressor $X_j$ (j = 1, …, k) in a given sample (and hence in the population) are not all equal to a constant:

$$X_{ji} \ne c_j \quad \forall\, i = 1, \ldots, N \quad \text{where the } c_j \text{ are constants } (j = 1, \ldots, k). \qquad \text{(A7)}$$

  • Technical Form of A7: Assumption A7 requires that the sample variances of all k non-constant regressors $X_j$ (j = 1, ..., k) must be finite positive numbers for any sample size N; i.e.,

$$\text{sample variance of } X_{ji} \equiv \mathrm{Var}(X_{ji}) = \frac{\sum_{i} (X_{ji} - \bar{X}_j)^2}{N - 1} = s_{X_j}^2 > 0,$$

where $s_{X_j}^2 > 0$ are finite positive numbers for all j = 1, ..., k.

  • Meaning of A7: Assumption A7 requires that each nonconstant regressor $X_j$ (j = 1, …, k) takes at least two different values in any given sample.

Unless this assumption is satisfied, it is not possible to compute from the sample data an estimate of the effect on the regressand Y of changes in the value of the regressor $X_j$. In other words, to calculate the effect of changes in $X_j$ on Y, the sample values $X_{ji}$ of the regressor $X_j$ must vary across observations in any given sample.
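The role of sample variation can be seen in the simple-regression slope formula, which divides by the sum of squared deviations of the regressor. The sketch below uses that one-regressor formula purely for illustration; the data values and the helper name `slope_or_none` are hypothetical. When the regressor is constant in the sample, the denominator is zero and no slope can be computed.

```python
def slope_or_none(x, y):
    """Simple-regression slope, or None when x has zero sample variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return None  # the regressor is constant in this sample
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

y = [3.0, 5.0, 7.0, 9.0]           # hypothetical regressand values
x_varying = [1.0, 2.0, 3.0, 4.0]   # takes several different values
x_constant = [4.0, 4.0, 4.0, 4.0]  # violates A7 in this sample

print(slope_or_none(x_varying, y))   # a slope can be computed
print(slope_or_none(x_constant, y))  # None: no variation in the regressor
```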

  • Example of Perfect Multicollinearity

Consider the following multiple linear regression model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \quad (i = 1, \ldots, N). \qquad \text{(2)}$$

Suppose that the sample values of the regressors $X_{1i}$ and $X_{2i}$ satisfy the following linear equality for all sample observations:

$$X_{1i} = 3X_{2i} \quad \text{or} \quad X_{1i} - 3X_{2i} = 0 \quad \forall\, i = 1, \ldots, N. \qquad \text{(3)}$$

The exact linear relationship (3) can be written in the general form (1).

  1. For the linear regression model given by PRE (2), equation (1) takes the form

$$\lambda_0 + \lambda_1 X_{1i} + \lambda_2 X_{2i} = 0 \quad \forall\, i = 1, 2, \ldots, N.$$

  2. Set $\lambda_0 = 0$, $\lambda_1 = 1$, and $\lambda_2 = -3$ in the above equation:

$$X_{1i} - 3X_{2i} = 0 \quad \forall\, i = 1, 2, \ldots, N \quad \text{(identical to equation (3) above).}$$

  • Consequences of Perfect Multicollinearity

1. Substitute for $X_{1i}$ in PRE (2) the equivalent expression $X_{1i} = 3X_{2i}$:

$$
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \\
&= \beta_0 + \beta_1 (3X_{2i}) + \beta_2 X_{2i} + u_i \\
&= \beta_0 + 3\beta_1 X_{2i} + \beta_2 X_{2i} + u_i \\
&= \beta_0 + (3\beta_1 + \beta_2) X_{2i} + u_i \\
&= \beta_0 + \alpha_2 X_{2i} + u_i \quad \text{where } \alpha_2 = 3\beta_1 + \beta_2 \qquad \text{(4a)}
\end{aligned}
$$

♦ It is possible to estimate from the sample data the regression coefficients $\beta_0$ and $\alpha_2$.

♦ But from the estimate of $\alpha_2$ it is not possible to compute estimates of the coefficients $\beta_1$ and $\beta_2$. Reason: The equation

$$\alpha_2 = 3\beta_1 + \beta_2$$

is one equation containing two unknowns, namely $\beta_1$ and $\beta_2$.

Result: It is not possible to compute from the sample data estimates of both $\beta_1$ and $\beta_2$, the separate linear effects of $X_{1i}$ and $X_{2i}$ on the regressand $Y_i$.
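The identification problem can be seen directly with a few numbers. In the sketch below the sample values, the intercept and the three coefficient pairs are all made up for illustration; the only feature taken from the example is the exact relationship X1 = 3 X2. Every pair with the same value of α2 = 3β1 + β2 produces identical fitted values, so the data cannot distinguish between them.

```python
def fitted(beta0, beta1, beta2, x1, x2):
    """Fitted values beta0 + beta1*X1 + beta2*X2 for each observation."""
    return [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

x2 = [1, 2, 3, 4]         # hypothetical sample values of X2
x1 = [3 * v for v in x2]  # exact linear dependence X1 = 3*X2, as in (3)

# Three (beta1, beta2) pairs that share alpha2 = 3*beta1 + beta2 = 5
pairs = [(1, 2), (0, 5), (2, -1)]
beta0 = 1

all_fitted = [fitted(beta0, b1, b2, x1, x2) for b1, b2 in pairs]
for (b1, b2), values in zip(pairs, all_fitted):
    print(b1, b2, values)  # the fitted values are the same for every pair
```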

5. Interpreting Slope Coefficients in Multiple Linear Regression Models

  • Consider the multiple linear regression model given by the following population regression equation (PRE) :

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad \text{(5)}$$

$X_1$, $X_2$ and $X_3$ are three distinct independent or explanatory variables that determine the population values of Y.

Because regression equation (5) contains more than one regressor, it is called a multiple linear regression model.

  • The population regression function (PRF) corresponding to PRE (5) is:

$$E(Y_i \mid x_i) = E(Y_i \mid X_{1i}, X_{2i}, X_{3i}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} \qquad \text{(6)}$$

where $x_i$ is the 1×4 row vector of regressors: $x_i = (1 \; X_{1i} \; X_{2i} \; X_{3i})$.

Interpreting the Slope Coefficients in Multiple Regression Model (5)

  • Each slope coefficient $\beta_j$ is the marginal effect of the corresponding explanatory variable $X_j$ on the conditional mean of Y. Formally, the slope coefficients {$\beta_j$ : j = 1, 2, 3} are the partial derivatives of the population regression function (PRF) with respect to the explanatory variables {$X_j$ : j = 1, 2, 3}:

$$\frac{\partial E(Y_i \mid x_i)}{\partial X_{ji}} = \frac{\partial E(Y_i \mid X_{1i}, X_{2i}, X_{3i})}{\partial X_{ji}} = \beta_j, \qquad j = 1, 2, 3 \qquad \text{(7)}$$

For example, for j = 1 in multiple regression model (5):

$$\frac{\partial E(Y_i \mid X_{1i}, X_{2i}, X_{3i})}{\partial X_{1i}} = \frac{\partial (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i})}{\partial X_{1i}} = \beta_1.$$

  • Interpretation: A partial derivative isolates the marginal effect on the conditional mean of Y of small variations in one of the explanatory variables, while holding constant the values of the other explanatory variables in the PRF.

Example: In multiple regression model (5)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \qquad \text{(5)}$$

with population regression function

$$E(Y_i \mid X_{1i}, X_{2i}, X_{3i}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} \qquad \text{(6)}$$

the slope coefficients $\beta_1$, $\beta_2$ and $\beta_3$ are interpreted as follows:

$\beta_1$ = the partial marginal effect of $X_1$ on the conditional mean of Y holding constant the values of the other regressors $X_2$ and $X_3$.

$\beta_2$ = the partial marginal effect of $X_2$ on the conditional mean of Y holding constant the values of the other regressors $X_1$ and $X_3$.

$\beta_3$ = the partial marginal effect of $X_3$ on the conditional mean of Y holding constant the values of the other regressors $X_1$ and $X_2$.

  • Including $X_2$ and $X_3$ in the regression function allows us to estimate the partial marginal effect of $X_1$ on $E(Y \mid X_1, X_2, X_3)$ while
    • holding constant the values of $X_2$ and $X_3$
    • controlling for the effects on Y of $X_2$ and $X_3$
    • conditioning on $X_2$ and $X_3$.
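The "holding constant" interpretation can be checked with a short calculation. The coefficient values and regressor settings below are hypothetical and serve only as an illustration of PRF (6): raising $X_1$ by one unit changes the conditional mean by $\beta_1$, whatever fixed values $X_2$ and $X_3$ take.

```python
# Hypothetical values of (beta0, beta1, beta2, beta3) in PRF (6)
BETA = (1.0, 2.0, -0.5, 4.0)

def prf(x1, x2, x3):
    """Conditional mean E(Y | X1, X2, X3) under regression model (5)."""
    b0, b1, b2, b3 = BETA
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

# Raise X1 by one unit while holding X2 and X3 fixed at two different settings
effect_a = prf(4.0, 10.0, 7.0) - prf(3.0, 10.0, 7.0)
effect_b = prf(4.0, 0.0, 0.0) - prf(3.0, 0.0, 0.0)

print(effect_a, effect_b)  # both equal beta1
```

Because the PRF is linear in $X_1$, the one-unit difference coincides with the partial derivative in (7); in a model that is nonlinear in the regressors the two would generally differ.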