Panel Data Estimation in Stata#
This document, a companion to the Panel Data series of lecture notes, briefly describes how to implement panel data models in Stata. We will load the Tobias and Koop data, but this time we use the entire dataset, since we are now ready to exploit its panel structure. Load the data and summarize:
# start a connected stata17 session
from pystata import config
config.init('be')
config.set_streaming_output_mode('off')
%%stata
webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
sum
. webuse set "https://rlhick.people.wm.edu/econ407/data/"
(prefix now "https://rlhick.people.wm.edu/econ407/data")
. webuse tobias_koop
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
id | 17,919 1081.852 629.8459 1 2178
educ | 17,919 12.67604 1.922433 9 20
ln_wage | 17,919 2.296821 .5282364 .07 4.57
pexp | 17,919 8.362688 4.127502 0 22
time | 17,919 8.196719 3.956042 0 14
-------------+---------------------------------------------------------
ability | 17,919 .052374 .9261294 -4.04 2.01
meduc | 17,919 11.4719 2.988851 0 20
feduc | 17,919 11.70925 3.766923 0 20
broken_home | 17,919 .153859 .3608236 0 1
siblings | 17,919 3.156203 2.120989 0 18
-------------+---------------------------------------------------------
pexp2 | 17,919 86.96986 75.26336 0 484
.
We will be estimating a very simple returns to education equation and will ignore any endogeneity issues for the purposes of illustration.
Pooled OLS#
Here are the commands to run pooled OLS in Stata. After running the model, we store the estimates for hypothesis testing later.
%%stata
regress ln_wage educ pexp pexp2 broken_home
est store bpool
. regress ln_wage educ pexp pexp2 broken_home
Source | SS df MS Number of obs = 17,919
-------------+---------------------------------- F(4, 17914) = 928.58
Model | 858.620393 4 214.655098 Prob > F = 0.0000
Residual | 4141.10561 17,914 .231165882 R-squared = 0.1717
-------------+---------------------------------- Adj R-squared = 0.1715
Total | 4999.72601 17,918 .279033709 Root MSE = .4808
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0890684 .001939 45.93 0.000 .0852677 .0928691
pexp | .0907662 .0034375 26.40 0.000 .0840284 .097504
pexp2 | -.003046 .0001895 -16.07 0.000 -.0034175 -.0026745
broken_home | -.0561547 .0100131 -5.61 0.000 -.0757813 -.036528
_cons | .6822885 .0286842 23.79 0.000 .6260648 .7385122
------------------------------------------------------------------------------
. est store bpool
.
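To read the pooled coefficients in economic terms, the short Python sketch below does the arithmetic by hand. It copies the point estimates from the output above and assumes that pexp2 is the square of pexp (consistent with the summary table, where the maxima are 22 and 484). It is an illustration of the interpretation only, not part of the Stata workflow.

```python
import math

# Pooled OLS point estimates copied from the regress output above.
b_educ = 0.0890684
b_pexp = 0.0907662
b_pexp2 = -0.003046

# ln_wage is in logs, so the exact proportional effect of one more year
# of schooling is exp(b) - 1 rather than b itself.
educ_effect = math.exp(b_educ) - 1

# Assuming pexp2 = pexp^2, the experience profile
# b_pexp*pexp + b_pexp2*pexp^2 peaks where its derivative is zero.
pexp_peak = -b_pexp / (2 * b_pexp2)

print(f"return to a year of education: {100 * educ_effect:.2f}%")
print(f"experience at which predicted ln_wage peaks: {pexp_peak:.1f} years")
```

The same two formulas apply to the random effects and fixed effects estimates below; only the coefficient values change.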
Random Effects#
In Stata, we run the following code. Note that we use xtset to tell Stata which variables identify the individual (id) and the time period (time). We also save the results for later analysis:
%%stata
xtset id time
xtreg ln_wage educ pexp pexp2 broken_home , re
est store bre
. xtset id time
Panel variable: id (unbalanced)
Time variable: time, 0 to 14, but with gaps
Delta: 1 unit
. xtreg ln_wage educ pexp pexp2 broken_home , re
Random-effects GLS regression Number of obs = 17,919
Group variable: id Number of groups = 2,178
R-squared: Obs per group:
Within = 0.2283 min = 1
Between = 0.1489 avg = 8.2
Overall = 0.1714 max = 15
Wald chi2(4) = 5026.37
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0901376 .0034376 26.22 0.000 .0834001 .0968751
pexp | .1027822 .0026045 39.46 0.000 .0976776 .1078868
pexp2 | -.0036384 .0001426 -25.52 0.000 -.0039179 -.0033589
broken_home | -.0646814 .0228582 -2.83 0.005 -.1094827 -.0198801
_cons | .5807788 .0441847 13.14 0.000 .4941784 .6673792
-------------+----------------------------------------------------------------
sigma_u | .37244396
sigma_e | .32867768
rho | .56218095 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. est store bre
.
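The rho line in the output is the share of the total error variance attributed to the individual effect u_i. As a quick check, the sketch below recomputes it in Python from the reported sigma_u and sigma_e; the values are copied from the output above and nothing else is assumed.

```python
# Standard deviations of the two error components, copied from the
# random effects output above.
sigma_u = 0.37244396  # individual-specific component u_i
sigma_e = 0.32867768  # idiosyncratic component e_it

# rho is the fraction of the total error variance due to u_i.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)

print(f"rho = {rho:.4f}")
```

The result should agree with the rho reported by xtreg up to rounding.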
Or, the Swamy-Arora version of the random effects model (closest to what R uses):
%%stata
xtreg ln_wage educ pexp pexp2 broken_home , sa th
est store bre_sa
. xtreg ln_wage educ pexp pexp2 broken_home , sa th
Random-effects GLS regression Number of obs = 17,919
Group variable: id Number of groups = 2,178
R-squared: Obs per group:
Within = 0.2283 min = 1
Between = 0.1492 avg = 8.2
Overall = 0.1714 max = 15
Wald chi2(4) = 5004.33
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------- theta --------------------
min 5% median 95% max
0.3201 0.4517 0.6885 0.7595 0.7672
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0904542 .0033583 26.93 0.000 .0838721 .0970363
pexp | .1025766 .0026118 39.27 0.000 .0974575 .1076957
pexp2 | -.0036296 .0001431 -25.37 0.000 -.00391 -.0033492
broken_home | -.0642368 .022005 -2.92 0.004 -.1073659 -.0211077
_cons | .5782614 .0432297 13.38 0.000 .4935328 .66299
-------------+----------------------------------------------------------------
sigma_u | .35445648
sigma_e | .32867768
rho | .53768242 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. est store bre_sa
.
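The theta table reports the quasi-demeaning weight applied to each individual's observations, which varies because the panel is unbalanced. A minimal Python sketch of the usual formula, theta_i = 1 - sqrt(sigma_e^2 / (T_i * sigma_u^2 + sigma_e^2)), is shown below. It uses the sigma_u and sigma_e reported above and assumes the minimum and maximum theta correspond to individuals with 1 and 15 observations.

```python
import math

# Variance components copied from the Swamy-Arora output above.
sigma_u = 0.35445648
sigma_e = 0.32867768


def theta(n_obs):
    """Quasi-demeaning weight for an individual observed n_obs times."""
    return 1 - math.sqrt(sigma_e**2 / (n_obs * sigma_u**2 + sigma_e**2))


# 1 and 15 are the min and max observations per group in the output.
for n_obs in (1, 8, 15):
    print(n_obs, round(theta(n_obs), 4))
```

A theta of 0 would reproduce pooled OLS and a theta of 1 would reproduce the within estimator, so individuals with more observations are treated more like fixed effects.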
Fixed Effects#
In Stata, the easiest fixed effects model to run is the “within” estimator (the fe option). Note that Stata automatically drops covariates that do not vary within an individual’s observations (here, broken_home) so that the model runs, but it leaves the variable in the regression output:
%%stata
xtreg ln_wage educ pexp pexp2 broken_home , fe
est store bfe
. xtreg ln_wage educ pexp pexp2 broken_home , fe
note: broken_home omitted because of collinearity.
Fixed-effects (within) regression Number of obs = 17,919
Group variable: id Number of groups = 2,178
R-squared: Obs per group:
Within = 0.2286 min = 1
Between = 0.1331 avg = 8.2
Overall = 0.1663 max = 15
F(3,15738) = 1554.76
corr(u_i, Xb) = 0.0053 Prob > F = 0.0000
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0761987 .0059414 12.83 0.000 .0645529 .0878444
pexp | .1070331 .0027728 38.60 0.000 .101598 .1124681
pexp2 | -.0038188 .000149 -25.63 0.000 -.0041109 -.0035268
broken_home | 0 (omitted)
_cons | .7679618 .0716835 10.71 0.000 .6274539 .9084696
-------------+----------------------------------------------------------------
sigma_u | .40417965
sigma_e | .32867768
rho | .6019421 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2177, 15738) = 10.41 Prob > F = 0.0000
. est store bfe
.
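The reason broken_home is omitted can be seen from the within transformation itself, which subtracts each individual's mean from every variable. The Python sketch below applies that transformation to a small hypothetical panel (the data are invented for illustration and are not from the Tobias and Koop file).

```python
from collections import defaultdict

# Hypothetical toy panel: (id, time-varying x, time-invariant x).
rows = [
    (1, 2.0, 1.0), (1, 4.0, 1.0), (1, 6.0, 1.0),
    (2, 1.0, 0.0), (2, 3.0, 0.0),
]


def demean_by_group(ids, values):
    """Subtract each group's mean from its own observations."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]


ids = [r[0] for r in rows]
x_varying = demean_by_group(ids, [r[1] for r in rows])
x_fixed = demean_by_group(ids, [r[2] for r in rows])

print(x_varying)  # still has variation after demeaning
print(x_fixed)    # identically zero, so its coefficient is not identified
```

A regressor that is constant within each individual becomes a column of zeros after demeaning, which is why Stata reports it as omitted.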
Hypothesis Testing and Model Comparison#
We have various tests to run:
OLS (Pooled) versus Random Effects#
In Stata, run the Breusch-Pagan Lagrange multiplier test with xttest0 immediately after estimating the random effects model:
%%stata
quietly: xtreg ln_wage educ pexp pexp2 broken_home , re
xttest0
. quietly: xtreg ln_wage educ pexp pexp2 broken_home , re
. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects
ln_wage[id,t] = Xb + u[id] + e[id,t]
Estimated results:
| Var SD = sqrt(Var)
---------+-----------------------------
ln_wage | .2790337 .5282364
e | .108029 .3286777
u | .1387145 .372444
Test: Var(u) = 0
chibar2(01) = 19338.65
Prob > chibar2 = 0.0000
.
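The chibar2(01) label indicates that, because a variance cannot be negative, the test statistic is compared with a 50:50 mixture of a point mass at zero and a chi-squared distribution with one degree of freedom. The sketch below shows how such a p-value can be computed in plain Python; the function name is ours, and the second call uses the familiar 10% critical value of chi2(1) purely as a sanity check.

```python
import math


def chibar2_01_pvalue(stat):
    """P-value under a 50:50 mixture of chi2(0) and chi2(1)."""
    if stat <= 0:
        return 1.0
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2))


print(chibar2_01_pvalue(19338.65))  # statistic from the output above
print(chibar2_01_pvalue(2.706))     # roughly half of 0.10
```

With a statistic this large the p-value is zero to machine precision, so pooled OLS is rejected in favor of random effects.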
OLS (Pooled) versus Fixed Effects#
In Stata, this is tested by default and reported at the end of the fe output:
------------------------------------------------------------------------------
F test that all u_i=0: F(2177, 15738) = 10.41 Prob > F = 0.0000
Random Effects versus Fixed Effects#
In Stata, install xtoverid and ivreg2 (see note 1) and run xtoverid after the random effects regression; the output header confirms that the test is based on xtreg, re:
%%stata
xtoverid
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re
Sargan-Hansen statistic 31.892 Chi-sq(3) P-value = 0.0000
Or you can use the Hausman test explicitly. Let’s run Hausman on the Swamy-Arora version of random effects:
%%stata
hausman bfe bre_sa
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| bfe bre_sa Difference Std. err.
-------------+----------------------------------------------------------------
educ | .0761987 .0904542 -.0142555 .0049012
pexp | .1070331 .1025766 .0044565 .000931
pexp2 | -.0038188 -.0036296 -.0001892 .0000415
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 43.88
Prob > chi2 = 0.0000
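Both the Sargan-Hansen statistic and the Hausman statistic are compared with a chi-squared distribution with 3 degrees of freedom. The Python sketch below evaluates the upper tail probability with the closed form that exists for 3 degrees of freedom, so that the reported 0.0000 p-values can be seen at full precision. The statistics themselves are copied from the output; reproducing them would require the full coefficient covariance matrices, which are not shown here.

```python
import math


def chi2_3df_sf(x):
    """Upper tail probability P(X > x) for X ~ chi-squared(3)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


p_overid = chi2_3df_sf(31.892)   # Sargan-Hansen statistic from xtoverid
p_hausman = chi2_3df_sf(43.88)   # Hausman statistic, bfe vs. bre_sa

print(f"xtoverid p-value: {p_overid:.2e}")
print(f"hausman p-value:  {p_hausman:.2e}")
```

Both p-values are far below conventional significance levels, so the two tests agree in rejecting random effects in favor of fixed effects.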
Endogeneity and Panel Data#
IV regression on panel data (where x4 is endogenous and instrumented by z1) can be performed by:
%%stata
xtivreg y x1 x2 x3 (x4=z1), fe
Robust Standard Errors and Panel Data#
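To obtain robust standard errors, add the robust option. With xtreg, fe the standard errors are then adjusted for clustering on the panel variable (id), as the output header shows: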
%%stata
xtreg ln_wage educ pexp pexp2, fe robust
Fixed-effects (within) regression Number of obs = 17,919
Group variable: id Number of groups = 2,178
R-squared: Obs per group:
Within = 0.2286 min = 1
Between = 0.1331 avg = 8.2
Overall = 0.1663 max = 15
F(3,2177) = 535.99
corr(u_i, Xb) = 0.0053 Prob > F = 0.0000
(Std. err. adjusted for 2,178 clusters in id)
------------------------------------------------------------------------------
| Robust
ln_wage | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0761987 .0099113 7.69 0.000 .0567621 .0956352
pexp | .1070331 .0042128 25.41 0.000 .0987716 .1152945
pexp2 | -.0038188 .0002193 -17.42 0.000 -.0042489 -.0033888
_cons | .7679618 .1208608 6.35 0.000 .5309472 1.004976
-------------+----------------------------------------------------------------
sigma_u | .40417965
sigma_e | .32867768
rho | .6019421 (fraction of variance due to u_i)
------------------------------------------------------------------------------
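To see how much the adjustment matters, the Python sketch below compares the robust standard errors with the conventional ones from the earlier fixed effects output. All numbers are copied from the two tables; the point estimates are identical in both, so only the standard errors are compared.

```python
# Standard errors copied from the two fixed effects outputs.
conventional = {"educ": 0.0059414, "pexp": 0.0027728, "pexp2": 0.000149}
robust = {"educ": 0.0099113, "pexp": 0.0042128, "pexp2": 0.0002193}

# Ratio of cluster-robust to conventional standard error per coefficient.
ratios = {name: robust[name] / conventional[name] for name in conventional}

for name, ratio in ratios.items():
    print(f"{name}: robust SE is {ratio:.2f} times the conventional SE")
```

The robust standard errors are roughly one and a half times larger here, which shrinks the t statistics but leaves every coefficient significant.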
1. Install both with ssc install xtoverid and ssc install ivreg2.