









Material Type: Exam; Professor: Mohommad; Class: ECON STAT; Subject: Economics; University: University of Maryland; Term: Unknown 1989;
Typology: Exams










Example 1: Predicting presidential elections.
Is the state of the economy an important factor to predict presidential elections? How
responsive are voters to economic conditions?
Unemployment rate, inflation, GDP growth
Incumbent's_share_of_vote_i = β_0 + β_1 unemp_i + β_2 inf_i + β_3 gdp_i + ε_i
Are incumbents more likely to be reelected with low inflation and low unemployment?
Example 2: Housing values and air pollution
P_i = β_0 + β_1 H_i + β_2 N_i + β_3 Env_i + ε_i
where P_i is the price of the ith house, H_i are characteristics of the house (number of
bedrooms), N_i are neighborhood characteristics (quality of the local schools, crime
rate), and Env_i contains information on the level of air pollution.
Example 3: Returns to education
How much will wages increase with another year of education?
log(wage_i) = β_0 + β_1 educ_i + … + ε_i (four regressors with coefficients β_1 to β_4, including educ and exper)
Test the following hypothesis...
H_0: β_1 …
H_1: β_1 …
Definition: Bivariate Regression
Bivariate Regression analyzes the relationship between two variables. It specifies one dependent (also called response, predicted, explained, regressand) variable and one independent (also called control, predictor, explanatory, regressor) variable.
This hypothesized relationship may be linear, quadratic, or whatever.
Definition: The Linear Model
y_i = α + β x_i + ε_i
y_i represents the outcome of interest. It is a random variable.
x_i is the measured input(s) that impact or cause the value of y_i.
ε_i is the error term: the part of y_i that is not explained by the model. It is a random variable.
You can also think of β as how much y_i will change for a fixed change in x_i.
We can predict a level of y_i given parameter values a and b. The predicted
value will not always be accurate: sometimes we will over or under predict the true
value.
y_i^p = a + b x_i
e_i = y_i − y_i^p
Example 4: Returns to education
Suppose we estimate the following model:
log( wage )= α + β educ + ε
Ordinary least squares chooses the estimates a and b (of α and β) that minimize the sum of squared errors.
Objective: Minimize Sum of Squared Errors (SSE)
Min SSE = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i − a − b x_i)²
This form treats positive and negative errors equally. The optimal values for a
and b are those that make the 1st derivative equal to zero, because functions reach a min or max
where their derivatives are zero. First, calculate the optimal value of a.
dSSE/da = −2 Σ_i (y_i − a − b x_i) = 0
(1/n) Σ_i (y_i − a − b x_i) = 0
(1/n) Σ_i y_i = ȳ
(1/n) Σ_i a = (1/n) n a = a
(1/n) Σ_i b x_i = b (1/n) Σ_i x_i = b x̄
ȳ − a − b x̄ = 0
a = ȳ − b x̄
Then, get the optimal value of b.
dSSE/db = −2 Σ_i x_i (y_i − a − b x_i) = 0
Substitute a = ȳ − b x̄:
Σ_i x_i (y_i − ȳ + b x̄ − b x_i) = 0
Σ_i x_i (y_i − ȳ) − b Σ_i x_i (x_i − x̄) = 0
Σ_i x_i (y_i − ȳ) = b Σ_i x_i (x_i − x̄)
b = Σ_i x_i (y_i − ȳ) / Σ_i x_i (x_i − x̄)
but
Σ_i (x_i − x̄)² = Σ_i (x_i − x̄)(x_i − x̄) = Σ_i (x_i − x̄) x_i − Σ_i (x_i − x̄) x̄
x̄ does not vary across i, so
Σ_i (x_i − x̄) x̄ = x̄ Σ_i (x_i − x̄)
Recall that Σ_i (x_i − x̄) = Σ_i x_i − n x̄ = 0, so
Σ_i (x_i − x̄)² = Σ_i (x_i − x̄) x_i
Using the same type of algebra, we can show that
Σ_i (x_i − x̄)(y_i − ȳ) = Σ_i (x_i − x̄) y_i = Σ_i x_i (y_i − ȳ)
Therefore,
b = Σ_i x_i (y_i − ȳ) / Σ_i x_i (x_i − x̄)
  = [ (1/(n − 1)) Σ_i (x_i − x̄)(y_i − ȳ) ] / [ (1/(n − 1)) Σ_i (x_i − x̄)² ]
Look at numerator
(1/(n − 1)) Σ_i (x_i − x̄)(y_i − ȳ)
By definition, sample covariance between x and y:
σ_xy = (1/(n − 1)) Σ_i (x_i − x̄)(y_i − ȳ)
Look at denominator
(1/(n − 1)) Σ_i (x_i − x̄)²
By definition, sample variance of x:
s_x² = (1/(n − 1)) Σ_i (x_i − x̄)²
Therefore, b = σ_xy / s_x² = [σ_xy / (s_x s_y)] [s_y / s_x]
Notice that σ_xy / (s_x s_y) = ρ_xy, the sample correlation coefficient, so
b = σ_xy / s_x² = [σ_xy / (s_x s_y)] [s_y / s_x] = ρ_xy s_y / s_x
Thus, knowing s_x, s_y, and ρ_xy, we can estimate b.
In Summary
By minimizing the sum of squared errors, we pick the values of a and b
that “best fit” the data. The optimal values of a and b are:
a = ȳ − b x̄
b = σ_xy / s_x² = ρ_xy s_y / s_x
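The closed-form results in the summary can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `ols_fit` and the five data points are made up for illustration. It computes a and b from the sample covariance and variance, then confirms that the residuals satisfy the two first-order conditions (they sum to zero and are uncorrelated with x).

```python
def ols_fit(x, y):
    """Return (a, b) that minimize the sum of squared errors for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample variance; the common divisor cancels in b.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b = cov_xy / var_x          # b = sigma_xy / s_x^2
    a = y_bar - b * x_bar       # a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data (an assumption, not taken from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = ols_fit(x, y)

# Residuals e_i = y_i - (a + b x_i) and the two first-order conditions.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sum_e = sum(residuals)                                   # should be ~0
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))    # should be ~0

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"sum of residuals = {sum_e:.2e}, sum of x*residuals = {sum_xe:.2e}")
```

For these data the slope is 1.97 and the intercept is 0.09, and both first-order conditions hold up to floating-point error.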
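Σ_i (y_i − ȳ)² = Σ_i [(y_i^p − ȳ^p) + e_i]²
= Σ_i (y_i^p − ȳ^p)² + 2 Σ_i (y_i^p − ȳ^p) e_i + Σ_i e_i²
where e_i = y_i − y_i^p is the prediction error and ȳ^p = ȳ because the errors sum to zero.
The middle term equals 0,
Σ_i (y_i^p − ȳ^p) e_i = Σ_i y_i^p e_i − ȳ^p Σ_i e_i = a Σ_i e_i + b Σ_i x_i e_i − ȳ^p Σ_i e_i = 0
because the first-order conditions for a and b give Σ_i e_i = 0 and Σ_i x_i e_i = 0.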
Therefore,
Σ_i (y_i − ȳ)² = Σ_i (y_i^p − ȳ^p)² + Σ_i e_i²
Σ_i (y_i − ȳ)² = sum of squared total = SST
Σ_i (y_i^p − ȳ^p)² = sum of squared model = SSM
Σ_i e_i² = sum of squared error = SSE
SST = SSM + SSE
SST is how much variation there is in total in the endogenous variable of interest.
With our choice of a and b, we can predict a certain amount of this variation
(SSM). The fraction of SST that we can predict with our model is defined as R²:
R² = SSM / SST = 1 − SSE / SST
R² lies between 0 and 1.
Perfect fit: R² = 1
No fit: R² = 0
Don’t forget!!
A high R² only shows that X and Y vary together. Both could be affected by another
variable or by the way the data are defined.
Example 6: R² for Example 5
R² = SSM / SST = 27.5606288 / 148.329751 ≈ 0.186
Source | SS df MS
-------------+------------------------------
Model | 27.5606288 1 27.
Residual | 120.769123 524.
-------------+------------------------------
Total | 148.329751 525.
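As a rough check of the decomposition, the sketch below recomputes R² from the sums of squares in the table above and then verifies SST = SSM + SSE on a small made-up data set. The helper name `decompose` and the five data points are illustrative assumptions, not part of the original example.

```python
# Sums of squares copied from the regression output in Example 6.
ssm_table, sse_table, sst_table = 27.5606288, 120.769123, 148.329751
r2_table = ssm_table / sst_table          # fraction of variation predicted
r2_alt = 1 - sse_table / sst_table        # equivalent form, 1 - SSE/SST


def decompose(x, y):
    """Fit y = a + b*x by least squares and return (SST, SSM, SSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_pred = [a + b * xi for xi in x]
    y_pred_bar = sum(y_pred) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssm = sum((yp - y_pred_bar) ** 2 for yp in y_pred)
    sse = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
    return sst, ssm, sse


# Made-up illustrative data (an assumption, not the wage data from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
sst, ssm, sse = decompose(x, y)

print(f"R^2 from the table: {r2_table:.4f}")
print(f"toy data: SST = {sst:.3f}, SSM + SSE = {ssm + sse:.3f}, R^2 = {ssm / sst:.4f}")
```

The table values give R² of about 0.186, so education alone predicts a bit under a fifth of the variation in log wages in that sample.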
Assumptions Concerning ε_i
1. E[ε_i] = 0, Zero Mean
The errors average out to zero
2. Var[ε_i] = σ_ε², Homoskedasticity
The variance is the same for all observations
3. cov[ε_j, ε_k] = 0 for j ≠ k, No Autocorrelation
Errors are not correlated across observations
4. cov[x_i, ε_i] = 0
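1. Unbiasedness
E(a) = α
E(b) = β
Proof:
Model: y_i = α + β x_i + ε_i
b = [ (1/(n − 1)) Σ_i (x_i − x̄)(y_i − ȳ) ] / [ (1/(n − 1)) Σ_i (x_i − x̄)² ] = Σ_i (x_i − x̄) y_i / Σ_i (x_i − x̄)²
since
Σ_i (x_i − x̄)(y_i − ȳ) = Σ_i (x_i − x̄) y_i − ȳ Σ_i (x_i − x̄) = Σ_i (x_i − x̄) y_i,
since Σ_i (x_i − x̄) = Σ_i x_i − Σ_i x̄ = n x̄ − n x̄ = 0
Substituting the model for y_i:
b = Σ_i (x_i − x̄)(α + β x_i + ε_i) / Σ_i (x_i − x̄)²
  = α Σ_i (x_i − x̄) / Σ_i (x_i − x̄)² + β Σ_i (x_i − x̄) x_i / Σ_i (x_i − x̄)² + Σ_i (x_i − x̄) ε_i / Σ_i (x_i − x̄)²
  = β + Σ_i (x_i − x̄) ε_i / Σ_i (x_i − x̄)²
since
Σ_i (x_i − x̄)(x_i − x̄) = Σ_i (x_i − x̄) x_i − x̄ Σ_i (x_i − x̄) = Σ_i (x_i − x̄) x_i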
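Notice that the sample estimate b contains the true value of β, plus the sample
covariance between x and ε divided by the sample variance of x:
b = β + σ_xε / s_x²
If σ_xε = 0, then b is an unbiased estimate: E[b] = β. If, however, σ_xε ≠ 0, then the estimate b does not provide accurate
information about the value of β.
If the realization of ε_i conveys information about the value of x_i, then b is a biased estimate of
β. We can in theory “sign” the bias.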
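The relation b = β + σ_xε / s_x² is an algebraic identity in any sample, which a short simulation can illustrate. This is a sketch under assumed values: the parameters α = 1 and β = 2, the normal draws, and the helper `sample_cov` are all hypothetical choices made for illustration. In real data ε is unobserved, so the right-hand side cannot be computed; the simulation can compute it only because it generates ε itself.

```python
import random


def sample_cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (n - 1)


random.seed(0)                      # fixed seed so the run is reproducible
alpha, beta = 1.0, 2.0              # hypothetical "true" parameters
n = 200

x = [random.gauss(0, 1) for _ in range(n)]
eps = [random.gauss(0, 1) for _ in range(n)]     # drawn independently of x
y = [alpha + beta * xi + ei for xi, ei in zip(x, eps)]

b = sample_cov(x, y) / sample_cov(x, x)                  # least squares slope
rhs = beta + sample_cov(x, eps) / sample_cov(x, x)       # beta + sigma_xe / s_x^2

print(f"b = {b:.6f}")
print(f"beta + cov(x, eps) / var(x) = {rhs:.6f}")
```

The two printed numbers agree up to floating-point error. Because ε was drawn independently of x, the sample covariance term is small and b lands near β.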
and make 3 classes of 18 (for example). The key is that schools receive funds randomly
assigned. The researchers collect data on large classes and the 18-student small classes.
Basic model: regress test scores (y) on teachers/pupils (x)
y_i = α + β x_i + ε_i
We expect that β > 0, and find b > 0: more teachers produce higher test scores.
Is this an accurate reflection of the impact of “x on y”? Does the realization of ε
convey
any information about x? In this case, the answer is NO!! Suppose a class has higher
than average performance on exams (ε_i > 0). Does that tell us anything about the teachers/pupils ratio we
expect to see? No, because teachers/pupils (x) was randomly assigned. Realization of ε conveys no information about x.
Example 8: Effects of fertilizer on crop yield
Which variables could affect crop yield?
Fertilizer amount
Rainfall
Quality of land
Presence of parasites
An experiment can determine the causal effect of fertilizer amount on crop yield.
Steps:
independently of other plot features that affect yield.
Explain idea of Causality/Ceteris paribus ...
Example 9: Returns to education ( experimental data )
assigned independently of other characteristics that affect productivity
(experience, innate ability)
Is this experiment feasible? What about moral issues?
Most of the time we have to work with nonexperimental data.
People choose their level of education (innate ability, social pressure, etc.)
Question: Have enough other factors been held fixed ( ceteris paribus ) to make a case for
causality?
Example 10 : Returns to education ( nonexperimental data )
In a regression of log wages (y) on years of education (x), we expect β > 0 and estimates
of b confirm this result. Is this an accurate reflection of the impact of “education on
earnings”? Does the realization of ε convey any information about x?
What does ε > 0 mean? It means someone with above average earnings given their
characteristics. Earnings could be greater than average for lots of reasons:
competitiveness, likeability, inherent intelligence, ability to get along with other people,
etc. Suppose that people with these same traits are the people who are more likely to
have above average levels of education. In this case, the fact that someone has
ε > 0 reveals that they are more likely to be a more highly educated person, and therefore
σ_xε > 0. Because b = β + σ_xε / s_x², it must be the case that b > β. The estimate for b
reflects not only the impact of more education on earnings, but the fact that people who
have unmeasured traits that are rewarded in the job market may also be the same people
who are more likely to get lots of education.
Example 11: In Example 5 we estimated the following model:
log( wage )= α + β educ + ε
but suppose that the true model is:
log( wage )= α+ β educ + γ abil + v
Let w =log( wage ), abil=ability
⇒ ε= γ abil + v
If Cov ( educ , abil )> 0 and γ > 0 (why?), then b > β (upward bias)
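The upward bias in Example 11 can be illustrated with simulated data. The sketch below is built entirely on assumed values: α = 0.5, β = 0.08, γ = 0.10, the way educ depends on abil, and the noise levels are hypothetical and do not come from the wage data. It regresses log(wage) on educ alone and compares the slope with β plus the omitted-variable term.

```python
import random


def sample_cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (n - 1)


random.seed(42)
alpha, beta, gamma = 0.5, 0.08, 0.10    # hypothetical true parameters
n = 5000

abil = [random.gauss(0, 1) for _ in range(n)]
# Assumption: more able people get more education, so Cov(educ, abil) > 0.
educ = [12 + 2 * a_i + random.gauss(0, 1) for a_i in abil]
v = [random.gauss(0, 0.3) for _ in range(n)]
log_wage = [alpha + beta * e_i + gamma * a_i + v_i
            for e_i, a_i, v_i in zip(educ, abil, v)]

var_educ = sample_cov(educ, educ)
b_short = sample_cov(educ, log_wage) / var_educ      # slope when abil is omitted

# epsilon = gamma * abil + v, so cov(educ, epsilon) splits into two pieces.
bias = gamma * sample_cov(educ, abil) / var_educ + sample_cov(educ, v) / var_educ

print(f"true beta          = {beta:.4f}")
print(f"b (abil omitted)   = {b_short:.4f}")
print(f"beta + bias term   = {beta + bias:.4f}")
```

Under these assumptions the population bias is γ Cov(educ, abil) / Var(educ) = 0.10 × 2 / 5 = 0.04, so the short regression slope should come out near 0.12 rather than 0.08.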
So far we have used one explanatory variable to explain wages.
log( wage )= α + β educ + ε
Now we will use more than one explanatory variable, so that several factors can affect wages simultaneously.
A multiple regression model allows us to control for many other factors that affect the
dependent variable.
y_i = β_0 + β_1 x_1i + β_2 x_2i + ... + β_k x_ki + ε_i
we have to estimate β_0, β_1, ..., β_k
The 95% confidence interval is 0.092 ± 1.96 (0.0073) or [0.0776, 0.1064]
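The interval arithmetic can be reproduced in a few lines. This sketch uses the rounded coefficient (0.092) and standard error (0.0073) quoted above; the variable names are illustrative. With rounded inputs the endpoints come out as [0.0777, 0.1063], slightly different from the interval in the notes, presumably because the notes used unrounded estimates.

```python
b_educ = 0.092        # estimated coefficient on educ (rounded, from the notes)
se_educ = 0.0073      # its standard error (rounded, from the notes)
z_crit = 1.96         # two-sided 5% critical value from the normal distribution

margin = z_crit * se_educ
lower, upper = b_educ - margin, b_educ + margin

# The interval excludes 0 exactly when the two-sided 5% test rejects beta = 0.
contains_zero = lower <= 0.0 <= upper

print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
print(f"contains 0: {contains_zero}")
```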
Example 14: Based on the confidence interval [0.0776, 0.1064] (Example 13), test the
following hypothesis:
H_0: β_educ = 0
H_1: β_educ ≠ 0
Given that the confidence interval does not include β_educ = 0, we reject the null
hypothesis.
Example 15: Using the t-statistic, test the following hypothesis at a
5% level of significance.
H_0: β_educ = 0
H_1: β_educ ≠ 0
t_educ = b_educ / se(b_educ) = 0.092 / 0.0073 ≈ 12.6
t_critical ≈ 1.96 (value from the t-distribution)
Because |t_educ| > t_critical, we reject the null hypothesis.
Example 16: Now use the p-value to test the null hypothesis.
In this case, the p-value is approximately 0, which is below 0.05, so we reject the null hypothesis at the 5% level of
significance.
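The t-statistic and p-value can be approximated with the standard library alone. This sketch assumes the rounded estimates 0.092 and 0.0073 quoted earlier and uses the normal distribution in place of the t-distribution, which is a reasonable approximation with several hundred degrees of freedom but is still an assumption of this example.

```python
from math import erfc, sqrt

b_educ = 0.092        # rounded coefficient on educ, from the notes
se_educ = 0.0073      # rounded standard error, from the notes

t_stat = b_educ / se_educ

# Two-sided p-value under the normal approximation:
# p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_two_sided = erfc(abs(t_stat) / sqrt(2))

reject_at_5_percent = p_two_sided < 0.05

print(f"t = {t_stat:.2f}")
print(f"two-sided p-value = {p_two_sided:.3e}")
print(f"reject H_0 at 5%: {reject_at_5_percent}")
```

The p-value is not literally zero, only far too small to show at the precision regression output usually prints.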
Example 17: Using the information from Example 12, test the following hypothesis
H_0: β_exper = 0
H_1: β_exper > 0
This is a right-sided test!
t = b_exper / se(b_exper)
Given that the number of degrees of freedom is 522, we can use the critical values from
the normal distribution.
t_critical ≈ 1.645 (5% one-sided value from the normal distribution). The value from the t-distribution is
approximately 1.648.
Given t > t_critical, we reject the null hypothesis.
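The one-sided critical value can be recovered from the standard normal distribution. This is a small sketch using Python's `statistics.NormalDist`; the helper `reject_right_sided` is a made-up name, and the t values passed to it are hypothetical because the estimate for exper is not shown in this part of the notes.

```python
from statistics import NormalDist

# Right-sided test at the 5% level: all 5% of the rejection region is in the upper tail.
significance = 0.05
z_crit = NormalDist().inv_cdf(1 - significance)


def reject_right_sided(t_stat, critical_value):
    """Reject H_0 in a right-sided test when t exceeds the critical value."""
    return t_stat > critical_value


print(f"one-sided 5% critical value (normal): {z_crit:.3f}")
# Hypothetical t-statistics, for illustration only.
print(reject_right_sided(2.0, z_crit))
print(reject_right_sided(1.0, z_crit))
```

With 522 degrees of freedom the t-distribution value (about 1.648) is so close to 1.645 that the two give the same decision except for t-statistics in a very narrow band.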
Example 18 : Impact of Weather on House Prices
Researchers are interested in the impact of weather on house prices. With a sample of 42
cities, they set up the following model:
y_i = α + β x_i + ε_i, where y_i is home prices and x_i is the January temperature.
The sample standard deviations s_y and s_x are given.
What is the 95% confidence interval for β? Test the hypothesis that H_0: β = 0 at the
95% confidence level. Using a t-test, test the hypothesis at the 99% confidence level.
b = ρ_xy s_y / s_x
a = ȳ − b x̄
se(b) = s_e / √(Σ_i (x_i − x̄)²) = s_e / (s_x √(n − 1)), since Σ_i (x_i − x̄)² = (n − 1) s_x²
t = b / se(b), compared with the t-distribution with n − 2 degrees of freedom
The 95% confidence interval is b ± t_critical se(b).
Because 0 is not in the confidence interval, we can reject H_0: β = 0.
Interpret the coefficient: For every 1-degree increase in average daily high temperatures,
prices go up by 1.152 (thousand). If you move from one city to another, and January