IV Regression in Stata#

This section briefly explores the problem of endogeneity and how to estimate models with endogenous regressors in Stata. Recall that consistency of the OLS estimator requires

\[ E(\mathbf{x'\epsilon}) = 0 \]

The code below shows how to proceed when this assumption fails but an instrument is available, so that we can use instrumental variables (IV) regression. First, open the data in Stata, noting that we are using a cross-sectional version of the Tobias and Koop data that focuses on 1983. Load the data and summarize:

webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
keep if time==4
sum
. webuse set "https://rlhick.people.wm.edu/econ407/data/"
(prefix now "https://rlhick.people.wm.edu/econ407/data")

. webuse tobias_koop

. keep if time==4
(16,885 observations deleted)

. sum

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          id |      1,034    1090.952    634.8917          4       2177
        educ |      1,034    12.27466    1.566838          9         19
     ln_wage |      1,034    2.138259    .4662805        .42       3.59
        pexp |      1,034     4.81528    2.190298          0         12
        time |      1,034           4           0          4          4
-------------+---------------------------------------------------------
     ability |      1,034    .0165957    .9209635      -3.14       1.89
       meduc |      1,034    11.40329    3.027277          0         20
       feduc |      1,034    11.58511    3.735833          0         20
 broken_home |      1,034    .1692456    .3751502          0          1
    siblings |      1,034    3.200193    2.126575          0         15
-------------+---------------------------------------------------------
       pexp2 |      1,034    27.97969    22.59879          0        144


An OLS Benchmark#

If we ignore any potential endogeneity problem, we can estimate the model by OLS as described in the companion OLS chapter. Here are the results from Stata:

reg ln_wage pexp pexp2 educ broken_home
      Source |       SS           df       MS      Number of obs   =     1,034
-------------+----------------------------------   F(4, 1029)      =     51.36
       Model |  37.3778146         4  9.34445366   Prob > F        =    0.0000
    Residual |   187.21445     1,029  .181938241   R-squared       =    0.1664
-------------+----------------------------------   Adj R-squared   =    0.1632
       Total |  224.592265     1,033  .217417488   Root MSE        =    .42654

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        pexp |   .2035214   .0235859     8.63   0.000     .1572395    .2498033
       pexp2 |  -.0124126   .0022825    -5.44   0.000    -.0168916   -.0079336
        educ |   .0852725   .0092897     9.18   0.000     .0670437    .1035014
 broken_home |  -.0087254   .0357107    -0.24   0.807    -.0787995    .0613488
       _cons |   .4603326    .137294     3.35   0.001     .1909243    .7297408
------------------------------------------------------------------------------

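where education has the following elasticity (because the dependent variable is already in logs, dyex(educ) reports the change in ln_wage for a proportional change in educ, averaged over the sample):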

margins, dyex(educ) continuous
Average marginal effects                                 Number of obs = 1,034
Model VCE: OLS

Expression: Linear prediction, predict()
dy/ex wrt:  educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/ex   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   1.046691   .1140274     9.18   0.000     .8229385    1.270444
------------------------------------------------------------------------------
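
As a cross-check, the same number can be reproduced by hand. This is a minimal sketch that assumes the data loaded above are still in memory; it relies on the fact that in a linear model the average of dy/ex is the coefficient on educ times the sample mean of educ.

```stata
* Re-estimate OLS quietly so that _b[educ] refers to this model
quietly regress ln_wage pexp pexp2 educ broken_home
* After summarize, r(mean) holds the sample mean of educ
quietly summarize educ
* Average dy/ex in a linear model: coefficient times the mean of x
display _b[educ] * r(mean)
```

With the estimates reported above this is .0852725 × 12.27466 ≈ 1.0467, which agrees with the margins output.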

IV Regression#

Suppose we are worried that education is endogenous, that is, correlated with the population regression errors. In that case the OLS estimates of \(\beta\) are biased and inconsistent. We hypothesize that the variable feduc is a good instrument, having all the properties we describe in detail in the notes document.
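
As a brief reminder, here is a sketch in standard textbook notation (the symbols \(\mathbf{z}\), \(\mathbf{Z}\), and \(\mathbf{X}\) are introduced here for illustration and may differ from the notes document). Let \(\mathbf{z}\) collect the exogenous regressors (the constant, pexp, pexp2, broken_home) together with the instrument feduc, and let \(\mathbf{x}\) collect the regressors including educ. The instrument must be exogenous and relevant, and in the exactly identified case the IV estimator has a simple closed form:

\[ E(\mathbf{z'\epsilon}) = 0, \qquad \operatorname{rank} E(\mathbf{z'x}) = k, \qquad \hat{\beta}_{IV} = (\mathbf{Z'X})^{-1}\mathbf{Z'y} \]

where \(k\) is the number of columns of \(\mathbf{X}\). With one instrument for one endogenous regressor, as here, 2SLS reduces to this estimator.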

In Stata, we use this code:

ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc) 
Instrumental variables 2SLS regression            Number of obs   =      1,034
                                                  Wald chi2(4)    =     138.19
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1277
                                                  Root MSE        =     .43528

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .1495027   .0320009     4.67   0.000     .0867821    .2122233
        pexp |    .214752   .0246553     8.71   0.000     .1664285    .2630755
       pexp2 |  -.0117453   .0023508    -5.00   0.000    -.0163529   -.0071377
 broken_home |   .0244713   .0397189     0.62   0.538    -.0533763     .102319
       _cons |  -.4064389   .4356072    -0.93   0.351    -1.260213    .4473354
------------------------------------------------------------------------------
Instrumented: educ
 Instruments: pexp pexp2 broken_home feduc

Note that the estimated elasticity for education is roughly 75% larger than under OLS (1.835 versus 1.047):

margins, dyex(educ) continuous
Average marginal effects                                 Number of obs = 1,034
Model VCE: Unadjusted

Expression: Linear prediction, predict()
dy/ex wrt:  educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/ex   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   1.835095   .3928002     4.67   0.000     1.065221     2.60497
------------------------------------------------------------------------------

The fact that the elasticity here is so much higher than the OLS elasticity is suggestive evidence of an endogeneity problem, so long as our maintained assumptions regarding the instrument hold. We can obtain robust standard errors by adding the robust option to Stata’s ivregress command:

ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), robust
Instrumental variables 2SLS regression            Number of obs   =      1,034
                                                  Wald chi2(4)    =     150.52
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1277
                                                  Root MSE        =     .43528

------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .1495027   .0329085     4.54   0.000     .0850033    .2140021
        pexp |    .214752   .0238629     9.00   0.000     .1679815    .2615225
       pexp2 |  -.0117453   .0023595    -4.98   0.000    -.0163698   -.0071208
 broken_home |   .0244713   .0335032     0.73   0.465    -.0411937    .0901364
       _cons |  -.4064389   .4404503    -0.92   0.356    -1.269706    .4568278
------------------------------------------------------------------------------
Instrumented: educ
 Instruments: pexp pexp2 broken_home feduc
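
To see what ivregress 2sls is doing, the two stages can also be run by hand. This is only an illustrative sketch: educ_hat is a hypothetical variable name chosen here, and the standard errors reported by the second regression are not valid because they are computed from the wrong residuals.

```stata
* Stage 1: regress the endogenous regressor on the instrument
* and the exogenous regressors
quietly regress educ feduc pexp pexp2 broken_home
* educ_hat (hypothetical name) holds the first-stage fitted values
predict double educ_hat, xb
* Stage 2: replace educ with its fitted values
regress ln_wage educ_hat pexp pexp2 broken_home
* Make the IV results the active estimates again for any estat commands
quietly ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), robust
```

The coefficients from the second regression should reproduce the ivregress point estimates (for example, about .1495 on education). Use ivregress itself for inference.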

Model Selection and Testing#

We have run IV regression in Stata, but we have more work to do to decide whether it or the OLS model is appropriate for this case.

Test for relevant and strong instruments#

estat firststage
  First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial       Robust
      Variable |   R-sq.       R-sq.        R-sq.     F(1,1029)   Prob > F
  -------------+------------------------------------------------------------
          educ |  0.2416      0.2387       0.0878       80.2589    0.0000
  --------------------------------------------------------------------------
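
The first-stage statistics can also be inspected by estimating the first stage directly. The sketch below assumes the same data are in memory; the F statistic from test should be close to the robust F reported by estat firststage, although small differences from degrees-of-freedom adjustments are possible.

```stata
* First stage estimated directly, with heteroskedasticity-robust standard errors
regress educ feduc pexp pexp2 broken_home, vce(robust)
* Test the excluded instrument; with one instrument this is the squared t statistic
test feduc
* Make the IV results the active estimates again for later estat commands
quietly ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), robust
```

A large F statistic on the excluded instrument is what is meant by a relevant and strong instrument.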

Test for endogeneity#

estat endogenous
  Tests of endogeneity
  H0: Variables are exogenous

  Robust score chi2(1)            =  4.39334  (p = 0.0361)
  Robust regression F(1,1028)     =  4.35038  (p = 0.0372)
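
The regression-based version of this test can be sketched by hand as a control-function regression. The variable name v_hat is hypothetical, and the resulting F statistic should be close to, though not necessarily identical with, the robust regression F reported above.

```stata
* First stage: save the residuals (v_hat is a hypothetical name)
quietly regress educ feduc pexp pexp2 broken_home
predict double v_hat, residuals
* Add the first-stage residuals to the structural equation
regress ln_wage educ pexp pexp2 broken_home v_hat, vce(robust)
* H0: educ is exogenous (the coefficient on v_hat is zero)
test v_hat
* Make the IV results the active estimates again for later estat commands
quietly ivregress 2sls ln_wage pexp pexp2 broken_home (educ=feduc), robust
```

In this augmented regression the coefficient on educ equals the 2SLS point estimate, and a significant coefficient on v_hat points toward endogeneity.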

Test for overidentification (not relevant for this example)#

estat overid

We get the following error message because our model is exactly identified (the number of excluded instruments equals the number of endogenous variables):

SystemError: no overidentifying restrictions
r(498);

Because the model is exactly identified, there are no overidentifying restrictions to test, which is why Stata returns this error. Taken together, the results above tell us that the instrument is relevant and strong (the robust first-stage F statistic of 80.26 is well above the common rule-of-thumb value of 10) and that education is likely endogenous (exogeneity is rejected at the 5% level, with p-values of 0.036 and 0.037).
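
For comparison, the overidentification test becomes available once there are more instruments than endogenous regressors. The sketch below adds meduc, which is in the dataset, as a second instrument. Treating meduc as a valid instrument is purely an assumption made for illustration, not a claim from this section.

```stata
* Assumption for illustration only: meduc is treated as a second valid instrument
ivregress 2sls ln_wage pexp pexp2 broken_home (educ = feduc meduc), robust
* Two instruments for one endogenous regressor gives one overidentifying restriction
estat overid
```

Here the null hypothesis is that the instruments are valid, so a small p-value would cast doubt on at least one of them.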