IV Regression in Stata#
This section briefly explores the problem of endogeneity and how to estimate this class of models in Stata. Recall that the OLS estimator requires the regressors to be uncorrelated with the population regression errors.
The code below shows how to handle estimation problems where this assumption fails but where we can identify an instrument, so that instrumental variables (IV) regression can be implemented. First, let’s open the data in Stata, noting that we are using a cross-sectional version of the Tobias and Koop data that focuses on 1983. Load the data and summarize:
webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
keep if time==4
sum
. webuse set "https://rlhick.people.wm.edu/econ407/data/"
(prefix now "https://rlhick.people.wm.edu/econ407/data")
. webuse tobias_koop
. keep if time==4
(16,885 observations deleted)
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
id | 1,034 1090.952 634.8917 4 2177
educ | 1,034 12.27466 1.566838 9 19
ln_wage | 1,034 2.138259 .4662805 .42 3.59
pexp | 1,034 4.81528 2.190298 0 12
time | 1,034 4 0 4 4
-------------+---------------------------------------------------------
ability | 1,034 .0165957 .9209635 -3.14 1.89
meduc | 1,034 11.40329 3.027277 0 20
feduc | 1,034 11.58511 3.735833 0 20
broken_home | 1,034 .1692456 .3751502 0 1
siblings | 1,034 3.200193 2.126575 0 15
-------------+---------------------------------------------------------
pexp2 | 1,034 27.97969 22.59879 0 144
.
An OLS Benchmark#
If we ignore any potential endogeneity problem, we can estimate the model by OLS as described in the OLS chapter companion. Here are the results from Stata:
reg ln_wage pexp pexp2 educ broken_home
Source | SS df MS Number of obs = 1,034
-------------+---------------------------------- F(4, 1029) = 51.36
Model | 37.3778146 4 9.34445366 Prob > F = 0.0000
Residual | 187.21445 1,029 .181938241 R-squared = 0.1664
-------------+---------------------------------- Adj R-squared = 0.1632
Total | 224.592265 1,033 .217417488 Root MSE = .42654
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
pexp | .2035214 .0235859 8.63 0.000 .1572395 .2498033
pexp2 | -.0124126 .0022825 -5.44 0.000 -.0168916 -.0079336
educ | .0852725 .0092897 9.18 0.000 .0670437 .1035014
broken_home | -.0087254 .0357107 -0.24 0.807 -.0787995 .0613488
_cons | .4603326 .137294 3.35 0.001 .1909243 .7297408
------------------------------------------------------------------------------
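Because the dependent variable is already in logs, the dyex() option of margins gives the elasticity of wages with respect to education: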
margins, dyex(educ) continuous
Average marginal effects Number of obs = 1,034
Model VCE: OLS
Expression: Linear prediction, predict()
dy/ex wrt: educ
------------------------------------------------------------------------------
| Delta-method
| dy/ex std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | 1.046691 .1140274 9.18 0.000 .8229385 1.270444
------------------------------------------------------------------------------
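The dyex() value can be checked by hand. In a linear model the average of the coefficient times educ is just the coefficient times the sample mean of educ, here 0.0852725 × 12.27466 ≈ 1.0467. A minimal sketch of that check, assuming the 1983 cross-section loaded above is still in memory:

```stata
* Assumes the tobias_koop data (time==4) are already in memory
quietly regress ln_wage pexp pexp2 educ broken_home

* Sample mean of educ over the estimation sample
quietly summarize educ if e(sample)

* Coefficient on educ times mean(educ): the average dy/ex from margins
display _b[educ] * r(mean)
```

The displayed number should agree with the margins output above up to rounding.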
IV Regression#
Suppose we are worried that education is endogenous, that is, correlated with the population regression errors. In that case the OLS estimates of \(\beta\) are biased. We hypothesize that the variable feduc (father's education) is a good instrument, having all the properties we describe in detail in the notes document.
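As a compact reminder, the setup behind this example can be sketched as a structural equation for log wages and a first-stage equation for education. The notation here is an assumption made for illustration and may differ from the notes document: \(w_i\) is the wage, \(\text{pexp}_i^2\) stands for the variable pexp2, \(\text{broken}_i\) stands for broken_home, and \(\pi\), \(v_i\), and \(\varepsilon_i\) are labels chosen for this sketch.

\[
\begin{aligned}
\ln w_i &= \beta_0 + \beta_1\,\text{pexp}_i + \beta_2\,\text{pexp}_i^2 + \beta_3\,\text{broken}_i + \beta_4\,\text{educ}_i + \varepsilon_i \\
\text{educ}_i &= \pi_0 + \pi_1\,\text{feduc}_i + \pi_2\,\text{pexp}_i + \pi_3\,\text{pexp}_i^2 + \pi_4\,\text{broken}_i + v_i
\end{aligned}
\]

Endogeneity means \(\operatorname{Cov}(\text{educ}_i, \varepsilon_i) \neq 0\). In the standard textbook treatment, feduc is a valid instrument if it is relevant, \(\pi_1 \neq 0\), and exogenous, \(\operatorname{Cov}(\text{feduc}_i, \varepsilon_i) = 0\).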
In Stata, we use this code:
ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc)
Instrumental variables 2SLS regression Number of obs = 1,034
Wald chi2(4) = 138.19
Prob > chi2 = 0.0000
R-squared = 0.1277
Root MSE = .43528
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .1495027 .0320009 4.67 0.000 .0867821 .2122233
pexp | .214752 .0246553 8.71 0.000 .1664285 .2630755
pexp2 | -.0117453 .0023508 -5.00 0.000 -.0163529 -.0071377
broken_home | .0244713 .0397189 0.62 0.538 -.0533763 .102319
_cons | -.4064389 .4356072 -0.93 0.351 -1.260213 .4473354
------------------------------------------------------------------------------
Instrumented: educ
Instruments: pexp pexp2 broken_home feduc
Note that the estimated elasticity of wages with respect to education is about 75% larger than under OLS (1.84 versus 1.05):
margins, dyex(educ) continuous
Average marginal effects Number of obs = 1,034
Model VCE: Unadjusted
Expression: Linear prediction, predict()
dy/ex wrt: educ
------------------------------------------------------------------------------
| Delta-method
| dy/ex std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | 1.835095 .3928002 4.67 0.000 1.065221 2.60497
------------------------------------------------------------------------------
The fact that the elasticity here is so much higher than the OLS elasticity is some evidence that we have an endogeneity problem, so long as our maintained assumptions regarding the instruments hold. We can obtain robust standard errors with the robust option of Stata’s ivregress command:
ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), robust
Instrumental variables 2SLS regression Number of obs = 1,034
Wald chi2(4) = 150.52
Prob > chi2 = 0.0000
R-squared = 0.1277
Root MSE = .43528
------------------------------------------------------------------------------
| Robust
ln_wage | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .1495027 .0329085 4.54 0.000 .0850033 .2140021
pexp | .214752 .0238629 9.00 0.000 .1679815 .2615225
pexp2 | -.0117453 .0023595 -4.98 0.000 -.0163698 -.0071208
broken_home | .0244713 .0335032 0.73 0.465 -.0411937 .0901364
_cons | -.4064389 .4404503 -0.92 0.356 -1.269706 .4568278
------------------------------------------------------------------------------
Instrumented: educ
Instruments: pexp pexp2 broken_home feduc
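To see what the 2SLS estimator is doing, the two stages can be run by hand. This is only an illustrative sketch, not how ivregress is implemented internally. It assumes the 1983 cross-section is still in memory, and educ_hat is a variable name made up for this example.

```stata
* Assumes the tobias_koop data (time==4) are already in memory

* First stage: regress the endogenous regressor on the instrument
* and all exogenous regressors, then keep the fitted values
regress educ feduc pexp pexp2 broken_home
predict double educ_hat, xb

* Second stage: replace educ with its first-stage fitted values
regress ln_wage pexp pexp2 broken_home educ_hat
```

The coefficient on educ_hat should reproduce the ivregress point estimate for educ. The standard errors from this second regression should not be used: they are computed from residuals based on educ_hat rather than educ, which is one reason to prefer ivregress in practice.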
Model Selection and Testing#
We have run IV regression in Stata, but more work is needed to decide whether it or the OLS model is appropriate for this case.
Test for relevant and strong instruments#
estat firststage
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(1,1029) Prob > F
-------------+------------------------------------------------------------
educ | 0.2416 0.2387 0.0878 80.2589 0.0000
--------------------------------------------------------------------------
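The F statistic in this table is a test that the excluded instrument has a zero coefficient in the first-stage regression. A sketch of the same idea done by hand, assuming the 1983 cross-section is still in memory and that the preceding ivregress call used robust standard errors (as the "Robust F" column suggests):

```stata
* Assumes the tobias_koop data (time==4) are already in memory

* First-stage regression with heteroskedasticity-robust standard errors
regress educ feduc pexp pexp2 broken_home, vce(robust)

* Test that the excluded instrument has no explanatory power for educ
test feduc
```

The resulting F(1, 1029) statistic should be close to the value reported by estat firststage. A common rule of thumb treats a first-stage F above 10 as a sign that the instrument is not weak.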
Test for endogeneity#
estat endogenous
Tests of endogeneity
H0: Variables are exogenous
Robust score chi2(1) = 4.39334 (p = 0.0361)
Robust regression F(1,1028) = 4.35038 (p = 0.0372)
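The regression-based version of this test can be sketched as a control-function regression: add the first-stage residuals to the wage equation and test whether their coefficient is zero. This is an illustration under the assumption that the 1983 cross-section is still in memory; v_hat is a variable name made up for this example.

```stata
* Assumes the tobias_koop data (time==4) are already in memory

* First stage: save the residuals from regressing educ on the
* instrument and the exogenous regressors
quietly regress educ feduc pexp pexp2 broken_home
predict double v_hat, residuals

* Augmented wage equation: include the first-stage residuals
regress ln_wage pexp pexp2 educ broken_home v_hat, vce(robust)

* Under H0 (educ is exogenous) the coefficient on v_hat is zero
test v_hat
```

The F(1, 1028) statistic from this test should be close to the "Robust regression F" line above. A small p-value is evidence against treating educ as exogenous.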
Test for overidentification (not relevant for this example)#
estat overid
Stata returns an error because our model is exactly identified (the number of excluded instruments equals the number of endogenous variables):
SystemError: no overidentifying restrictions
r(498);
The overidentification test requires more excluded instruments than endogenous regressors. Here feduc is the only excluded instrument for educ, so there are no overidentifying restrictions to test and Stata reports the error above.
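For comparison, the test does run once the model is overidentified. The sketch below adds meduc as a second excluded instrument purely for illustration. Treating meduc as a valid instrument is an assumption that this section does not examine, so the resulting estimates should not be read as a recommended specification.

```stata
* Assumes the tobias_koop data (time==4) are already in memory

* Hypothetical overidentified model: two excluded instruments
* (feduc and meduc) for one endogenous regressor (educ)
ivregress 2sls ln_wage pexp pexp2 broken_home (educ = feduc meduc), vce(robust)

* With one overidentifying restriction, the test can now be computed
estat overid
```

A large p-value from estat overid means the data do not reject the joint validity of the instruments. It does not prove that they are valid.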
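These results tell us that we have a relevant and strong instrument (the robust first-stage F statistic of 80.26 is far above the common rule-of-thumb value of 10) and that education is likely endogenous (both endogeneity tests reject exogeneity at the 5% level, with p-values of about 0.04).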