Panel Data Estimation in Stata

This document, a companion to the Panel Data series of lecture notes, briefly describes how to implement panel data models in Stata. We again load the Tobias and Koop data, but this time we use the entire dataset, since we are now ready to exploit its panel structure. Load the data and summarize:

# start a connected stata17 session
from pystata import config
config.init('be')
config.set_streaming_output_mode('off')

%%stata
webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
sum
. webuse set "https://rlhick.people.wm.edu/econ407/data/"
(prefix now "https://rlhick.people.wm.edu/econ407/data")

. webuse tobias_koop

. sum

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          id |     17,919    1081.852    629.8459          1       2178
        educ |     17,919    12.67604    1.922433          9         20
     ln_wage |     17,919    2.296821    .5282364        .07       4.57
        pexp |     17,919    8.362688    4.127502          0         22
        time |     17,919    8.196719    3.956042          0         14
-------------+---------------------------------------------------------
     ability |     17,919     .052374    .9261294      -4.04       2.01
       meduc |     17,919     11.4719    2.988851          0         20
       feduc |     17,919    11.70925    3.766923          0         20
 broken_home |     17,919     .153859    .3608236          0          1
    siblings |     17,919    3.156203    2.120989          0         18
-------------+---------------------------------------------------------
       pexp2 |     17,919    86.96986    75.26336          0        484

. 

We will be estimating a very simple returns to education equation and will ignore any endogeneity issues for the purposes of illustration.
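
Reading the specification off the regress and xtreg commands used in the sections that follow, the equation being estimated can be sketched as below. The notation is an assumption chosen to match Stata's output (u for the individual effect, e for the idiosyncratic error), and broken_home carries only an i subscript because it does not vary within an individual:

```latex
\ln(\text{wage}_{it}) = \beta_0 + \beta_1\,\text{educ}_{it} + \beta_2\,\text{pexp}_{it}
  + \beta_3\,\text{pexp}_{it}^2 + \beta_4\,\text{broken\_home}_i + u_i + e_{it}
```

Pooled OLS treats u_i as part of the error term, random effects models it as a random draw uncorrelated with the regressors, and fixed effects allows it to be correlated with them.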

Pooled OLS

Here are the commands to run pooled OLS in Stata. After running the model, we store the estimates for hypothesis testing later.

%%stata
regress ln_wage educ pexp pexp2 broken_home 
est store bpool
. regress ln_wage educ pexp pexp2 broken_home 

      Source |       SS           df       MS      Number of obs   =    17,919
-------------+----------------------------------   F(4, 17914)     =    928.58
       Model |  858.620393         4  214.655098   Prob > F        =    0.0000
    Residual |  4141.10561    17,914  .231165882   R-squared       =    0.1717
-------------+----------------------------------   Adj R-squared   =    0.1715
       Total |  4999.72601    17,918  .279033709   Root MSE        =     .4808

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0890684    .001939    45.93   0.000     .0852677    .0928691
        pexp |   .0907662   .0034375    26.40   0.000     .0840284     .097504
       pexp2 |   -.003046   .0001895   -16.07   0.000    -.0034175   -.0026745
 broken_home |  -.0561547   .0100131    -5.61   0.000    -.0757813    -.036528
       _cons |   .6822885   .0286842    23.79   0.000     .6260648    .7385122
------------------------------------------------------------------------------

. est store bpool

. 

Random Effects

In Stata, we first use xtset to declare the panel identifiers: the individual (id) and time (time) variables. We then estimate the random effects model and store the results for later analysis:

%%stata
xtset id time
xtreg ln_wage educ pexp pexp2 broken_home , re
est store bre
. xtset id time

Panel variable: id (unbalanced)
 Time variable: time, 0 to 14, but with gaps
         Delta: 1 unit

. xtreg ln_wage educ pexp pexp2 broken_home , re

Random-effects GLS regression                   Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-squared:                                      Obs per group:
     Within  = 0.2283                                         min =          1
     Between = 0.1489                                         avg =        8.2
     Overall = 0.1714                                         max =         15

                                                Wald chi2(4)      =    5026.37
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0901376   .0034376    26.22   0.000     .0834001    .0968751
        pexp |   .1027822   .0026045    39.46   0.000     .0976776    .1078868
       pexp2 |  -.0036384   .0001426   -25.52   0.000    -.0039179   -.0033589
 broken_home |  -.0646814   .0228582    -2.83   0.005    -.1094827   -.0198801
       _cons |   .5807788   .0441847    13.14   0.000     .4941784    .6673792
-------------+----------------------------------------------------------------
     sigma_u |  .37244396
     sigma_e |  .32867768
         rho |  .56218095   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. est store bre

. 
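
As a quick check on how to read the bottom of this table, the rho line can be reproduced from the two reported standard deviations. This is a minimal sketch that simply copies sigma_u and sigma_e from the output above:

```python
# Variance components copied from the xtreg, re output above
sigma_u = 0.37244396  # std. dev. of the individual effect u_i
sigma_e = 0.32867768  # std. dev. of the idiosyncratic error e_it

# rho: share of the total error variance attributable to u_i
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(f"rho = {rho:.4f}")
```

The result agrees with the reported rho of about 0.562, so a little over half of the unexplained variation in log wages is attributed to persistent individual differences.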

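Or, the Swamy-Arora version of the random effects model (closest to what R uses). The sa option requests the Swamy-Arora estimator of the variance components, and th reports the distribution of theta: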

%%stata
xtreg ln_wage educ pexp pexp2 broken_home , sa th
est store bre_sa
. xtreg ln_wage educ pexp pexp2 broken_home , sa th

Random-effects GLS regression                   Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-squared:                                      Obs per group:
     Within  = 0.2283                                         min =          1
     Between = 0.1492                                         avg =        8.2
     Overall = 0.1714                                         max =         15

                                                Wald chi2(4)      =    5004.33
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------- theta --------------------
  min      5%       median        95%      max
0.3201   0.4517     0.6885     0.7595   0.7672

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0904542   .0033583    26.93   0.000     .0838721    .0970363
        pexp |   .1025766   .0026118    39.27   0.000     .0974575    .1076957
       pexp2 |  -.0036296   .0001431   -25.37   0.000      -.00391   -.0033492
 broken_home |  -.0642368    .022005    -2.92   0.004    -.1073659   -.0211077
       _cons |   .5782614   .0432297    13.38   0.000     .4935328      .66299
-------------+----------------------------------------------------------------
     sigma_u |  .35445648
     sigma_e |  .32867768
         rho |  .53768242   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. est store bre_sa

. 
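
The theta table in this output can be understood with a short calculation. The sketch below assumes the usual random effects quasi-demeaning weight, theta_i = 1 - sqrt(sigma_e^2 / (T_i * sigma_u^2 + sigma_e^2)), and plugs in the variance components reported above together with the smallest and largest panel lengths (1 and 15):

```python
import math

# Variance components copied from the xtreg, sa output above
sigma_u = 0.35445648
sigma_e = 0.32867768

def theta(T):
    """Quasi-demeaning weight for an individual observed T times."""
    return 1.0 - math.sqrt(sigma_e**2 / (T * sigma_u**2 + sigma_e**2))

# Smallest and largest panel lengths reported under "Obs per group"
for T in (1, 15):
    print(T, round(theta(T), 4))
```

The two values should land close to the min and max of the theta table. Because the panel is unbalanced, theta differs across individuals: those observed more often get a weight closer to 1, which moves their contribution toward the fixed effects (within) transformation.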

Fixed Effects

In Stata, the fixed effects (“within”) estimator is requested with the fe option. Note that Stata automatically drops covariates that do not vary within an individual’s observations (here, broken_home) so that the model runs, but it still lists the variable in the regression output, flagged as omitted:

%%stata
xtreg ln_wage educ pexp pexp2 broken_home , fe
est store bfe
. xtreg ln_wage educ pexp pexp2 broken_home , fe
note: broken_home omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-squared:                                      Obs per group:
     Within  = 0.2286                                         min =          1
     Between = 0.1331                                         avg =        8.2
     Overall = 0.1663                                         max =         15

                                                F(3,15738)        =    1554.76
corr(u_i, Xb) = 0.0053                          Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0761987   .0059414    12.83   0.000     .0645529    .0878444
        pexp |   .1070331   .0027728    38.60   0.000      .101598    .1124681
       pexp2 |  -.0038188    .000149   -25.63   0.000    -.0041109   -.0035268
 broken_home |          0  (omitted)
       _cons |   .7679618   .0716835    10.71   0.000     .6274539    .9084696
-------------+----------------------------------------------------------------
     sigma_u |  .40417965
     sigma_e |  .32867768
         rho |   .6019421   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2177, 15738) = 10.41                Prob > F = 0.0000

. est store bfe

. 
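
To see why broken_home is omitted, it helps to look at what the within transformation does to a variable that never changes for a given individual. The following is a minimal sketch on a made-up toy panel; the rows and the helper within_demean are hypothetical and are not taken from the Tobias and Koop data:

```python
from collections import defaultdict

# Hypothetical toy panel: (id, educ, broken_home); values are made up
rows = [
    (1, 12, 0), (1, 13, 0), (1, 14, 0),
    (2, 16, 1), (2, 16, 1),
]

def within_demean(ids, values):
    """Subtract each individual's own mean from their observations."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    means = {i: sum(vs) / len(vs) for i, vs in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]

ids = [r[0] for r in rows]
educ_w = within_demean(ids, [r[1] for r in rows])
home_w = within_demean(ids, [r[2] for r in rows])

print(educ_w)  # keeps variation for individual 1, whose educ changes
print(home_w)  # all zeros: a time-invariant variable is wiped out
```

After demeaning, the time-invariant column is identically zero, so its coefficient cannot be estimated. The same logic means the educ coefficient is identified only from individuals whose education changes during the sample.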

Hypothesis Testing and Model Comparison

We have various tests to run:

OLS (Pooled) versus Random Effects

In Stata, run xttest0 (the Breusch-Pagan Lagrange multiplier test) immediately after the random effects model:

%%stata 
quietly: xtreg ln_wage educ pexp pexp2 broken_home , re
xttest0
. quietly: xtreg ln_wage educ pexp pexp2 broken_home , re

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_wage[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
                 ln_wage |   .2790337       .5282364
                       e |    .108029       .3286777
                       u |   .1387145        .372444

        Test: Var(u) = 0
                             chibar2(01) = 19338.65
                          Prob > chibar2 =   0.0000

. 

OLS (Pooled) versus Fixed Effects

In Stata, this F test is reported by default at the end of the fe output:

------------------------------------------------------------------------------
F test that all u_i=0:     F(2177, 15738) =    10.41         Prob > F = 0.0000

Random Effects versus Fixed Effects

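In Stata, install xtoverid and ivreg2 (see footnote 1) and run xtoverid after the random effects regression, which is the most recent estimation here and is the model named in the output: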

%%stata
xtoverid
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re   
Sargan-Hansen statistic  31.892  Chi-sq(3)    P-value = 0.0000

Or, you can use the Hausman test explicitly. Let’s run Hausman on the Swamy-Arora version of random effects:

%%stata
hausman bfe bre_sa
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      bfe         bre_sa       Difference       Std. err.
-------------+----------------------------------------------------------------
        educ |    .0761987     .0904542       -.0142555        .0049012
        pexp |    .1070331     .1025766        .0044565         .000931
       pexp2 |   -.0038188    -.0036296       -.0001892        .0000415
------------------------------------------------------------------------------
                          b = Consistent under H0 and Ha; obtained from xtreg.
           B = Inconsistent under Ha, efficient under H0; obtained from xtreg.

Test of H0: Difference in coefficients not systematic

    chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
            =  43.88
Prob > chi2 = 0.0000
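
Both tests above report a p-value of 0.0000, which is a rounded figure. As a rough sketch of how small these probabilities are, the upper tail of a chi-squared distribution with 3 degrees of freedom has a closed form that needs only the standard library; the function name chi2_sf_3df is a hypothetical helper, and the formula applies only to the 3-degree-of-freedom case:

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    z = x / 2.0
    return (math.erfc(math.sqrt(z))
            + 2.0 / math.sqrt(math.pi) * math.sqrt(z) * math.exp(-z))

# Test statistics copied from the xtoverid and hausman output above
for label, stat in (("xtoverid", 31.892), ("hausman", 43.88)):
    print(label, chi2_sf_3df(stat))
```

Either way the null hypothesis that the random effects are uncorrelated with the regressors is rejected decisively, which favors the fixed effects estimates.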

Endogeneity and Panel Data

IV regression on panel data (where x4 is endogenous and instrumented by z1) can be performed with xtivreg. The variable names here are placeholders, not variables in the Tobias and Koop data:

%%stata
xtivreg y x1 x2 x3 (x4=z1), fe

Robust Standard Errors and Panel Data

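Use the following Stata code. With xtreg, fe, the robust option produces standard errors clustered on the panel variable (id), as the header of the coefficient table indicates: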

%%stata
xtreg ln_wage educ pexp pexp2, fe robust
Fixed-effects (within) regression               Number of obs     =     17,919
Group variable: id                              Number of groups  =      2,178

R-squared:                                      Obs per group:
     Within  = 0.2286                                         min =          1
     Between = 0.1331                                         avg =        8.2
     Overall = 0.1663                                         max =         15

                                                F(3,2177)         =     535.99
corr(u_i, Xb) = 0.0053                          Prob > F          =     0.0000

                                 (Std. err. adjusted for 2,178 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        educ |   .0761987   .0099113     7.69   0.000     .0567621    .0956352
        pexp |   .1070331   .0042128    25.41   0.000     .0987716    .1152945
       pexp2 |  -.0038188   .0002193   -17.42   0.000    -.0042489   -.0033888
       _cons |   .7679618   .1208608     6.35   0.000     .5309472    1.004976
-------------+----------------------------------------------------------------
     sigma_u |  .40417965
     sigma_e |  .32867768
         rho |   .6019421   (fraction of variance due to u_i)
------------------------------------------------------------------------------

1. Install these with ssc install xtoverid and ssc install ivreg2.