



















Studia grazie alle numerose risorse presenti su Docsity
Guadagna punti aiutando altri studenti oppure acquistali con un piano Premium
Prepara i tuoi esami
Studia grazie alle numerose risorse presenti su Docsity
Prepara i tuoi esami con i documenti condivisi da studenti come te su Docsity
Trova i documenti specifici per gli esami della tua università
Preparati con lezioni e prove svolte basate sui programmi universitari!
Rispondi a reali domande d’esame e scopri la tua preparazione
Riassumi i tuoi documenti, fagli domande, convertili in quiz e mappe concettuali
Studia con prove svolte, tesine e consigli utili
Togliti ogni dubbio leggendo le risposte alle domande fatte da altri studenti come te
Esplora i documenti più scaricati per gli argomenti di studio più popolari
Ottieni i punti per scaricare
Guadagna punti aiutando altri studenti oppure acquistali con un piano Premium
Slides on the use of Proxy variables into a regression
Tipologia: Slide
1 / 27
Questa pagina non è visibile nell’anteprima
Non perderti parti importanti!




















Suppose that a variable Y is hypothesized to depend on a set of explanatory variables X 2 , ..., Xk as shown above, and suppose that for some reason there are no data on X 2. Y X X X u k k
1 2 2 3 3
As we have seen, a regression of Y on X 3 , ..., Xk would yield biased estimates of the coefficients and invalid standard errors and tests. Y X X X u k k
1 2 2 3 3
The validity of the proxy relationship must be justified on the basis of theory, common sense, or experience. It cannot be checked directly because there are no data on X 2. Y X X X u k k
1 2 2 3 3
2
If a suitable proxy has been identified, the regression model can be rewritten as shown. Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
1. The estimates of the coefficients of X 3 , ..., Xk will be the same as those that would have been obtained if it had been possible to regress Y on X 2 , ..., Xk****.
2. The standard errors and t statistics of the coefficients of X 3 , ..., Xk will be the same as those that would have been obtained if it had been possible to regress Y on X 2 , ..., Xk****. Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
4. The coefficient of Z will be an estimate of 2 , and so it will not be possible to obtain an estimate of 2 , unless you are able to guess the value of . Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
5. However the t statistic for Z will be the same as that which would have been obtained for X 2 if it had been possible to regress Y on X 2 , ..., Xk , and so you are able to assess the significance of X 2 , even if you are not able to estimate its coefficient. Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
It is generally more realistic to hypothesize that the relationship between X 2 and Z is approximate, rather than exact. In that case the results listed above will hold approximately. Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
However, if Z is a poor proxy for X 2 , the results will effectively be subject to measurement error (see Chapter 8). Further, it is possible that some of the other X variables will try to act as proxies for X 2 , and there will still be a problem of omitted variable bias. Y X X X u k k
1 2 2 3 3
2 Z X X u Y Z X X u k k k k
1 2 2 3 3 1 2 3 3
As usual, ASVABC will be used as the measure of cognitive ability. However, there is no ‘family background’ variable in the data set. Indeed, it is difficult to conceive how such a variable might be defined. S ASVABC INDEX u 1 2 3
Instead, we will try to find a proxy. One obvious variable is the mother's educational attainment, SM****. However, father's educational attainment, SF , may also be relevant. So we will hypothesize that the family background index depends on both. S ASVABC INDEX u 1 2 3
1 2
. reg S ASVABC SM SF Source | SS df MS Number of obs = 540 -------------+------------------------------ F( 3, 536) = 104. Model | 1181.36981 3 393.789935 Prob > F = 0. Residual | 2023.61353 536 3.77539837 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 3204.98333 539 5.94616574 Root MSE = 1. ------------------------------------------------------------------------------ S | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- ASVABC | .1257087 .0098533 12.76 0.000 .1063528. SM | .0492424 .0390901 1.26 0.208 -.027546. SF | .1076825 .0309522 3.48 0.001 .04688. **_cons | 5.370631 .4882155 11.00 0.000 4.41158 6.
Here is the corresponding regression using** EAEF Data Set 21.
. reg S ASVABC Source | SS df MS Number of obs = 540 -------------+------------------------------ F( 1, 538) = 274. Model | 1081.97059 1 1081.97059 Prob > F = 0. Residual | 2123.01275 538 3.94612035 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 3204.98333 539 5.94616574 Root MSE = 1. ------------------------------------------------------------------------------ S | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- ASVABC | .148084 .0089431 16.56 0.000 .1305165. **_cons | 6.066225 .4672261 12.98 0.000 5.148413 6.
Here is the regression of** S on ASVABC alone.