Panel Data Estimation in R#

This document provides a brief description of how to implement panel data models in both R and Stata.

# allows us to open stata datasets
library(foreign)
library(lmtest)
library(stargazer)
library(plm)

We will persist with Tobias and Koop but this time will use the entire dataset. Load data and summarize:

tk.df = read.dta("http://rlhick.people.wm.edu/econ407/data/tobias_koop.dta")
stargazer(tk.df,header=FALSE,title="Tobias and Koop Summary Statistics",type="text")

Tobias and Koop Summary Statistics
==================================================
Statistic     N      Mean    St. Dev.  Min    Max 
--------------------------------------------------
id          17,919 1,081.852 629.846    1    2,178
educ        17,919  12.676    1.922     9     20  
ln_wage     17,919   2.297    0.528   0.070  4.570
pexp        17,919   8.363    4.128     0     22  
time        17,919   8.197    3.956     0     14  
ability     17,919   0.052    0.926   -4.040 2.010
meduc       17,919  11.472    2.989     0     20  
feduc       17,919  11.709    3.767     0     20  
broken_home 17,919   0.154    0.361     0      1  
siblings    17,919   3.156    2.121     0     18  
pexp2       17,919  86.970    75.263    0     484 
--------------------------------------------------

We will be estimating the returns to education equation and will ignore any endogeneity issues for the purposes of illustration.

Pooled OLS#

In R pooled OLS, this is simply an OLS model and is easily run using

wage.ols = lm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.df)
summary(wage.ols)

Call:
lm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.53137 -0.27441  0.02012  0.31121  2.11371 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.6822885  0.0286842  23.786  < 2e-16 ***
educ         0.0890684  0.0019390  45.934  < 2e-16 ***
pexp         0.0907662  0.0034375  26.405  < 2e-16 ***
pexp2       -0.0030460  0.0001895 -16.071  < 2e-16 ***
broken_home -0.0561547  0.0100131  -5.608 2.08e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4808 on 17914 degrees of freedom
Multiple R-squared:  0.1717,	Adjusted R-squared:  0.1715 
F-statistic: 928.6 on 4 and 17914 DF,  p-value: < 2.2e-16

or we could do it using the plm package after setting up our panel dataset (tk.panel.df):

tk.panel.df = plm.data(tk.df, index = c("id","time"))
wage.pool = plm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.panel.df,model="pooling")
summary(wage.pool)

Warning message:
“use of 'plm.data' is discouraged, better use 'pdata.frame' instead”

Pooling Model

Call:
plm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.panel.df, 
    model = "pooling")

Unbalanced Panel: n = 2178, T = 1-15, N = 17919

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-2.531374 -0.274411  0.020121  0.311211  2.113707 

Coefficients:
               Estimate  Std. Error  t-value  Pr(>|t|)    
(Intercept)  0.68228847  0.02868415  23.7863 < 2.2e-16 ***
educ         0.08906836  0.00193904  45.9343 < 2.2e-16 ***
pexp         0.09076619  0.00343747  26.4050 < 2.2e-16 ***
pexp2       -0.00304601  0.00018953 -16.0715 < 2.2e-16 ***
broken_home -0.05615469  0.01001310  -5.6081 2.076e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    4999.7
Residual Sum of Squares: 4141.1
R-Squared:      0.17173
Adj. R-Squared: 0.17155
F-statistic: 928.576 on 4 and 17914 DF, p-value: < 2.22e-16

Random Effects#

In R we again use the plm package (note that this implements something closest to the Swamy-Aurora version of the random effects model, which in Stata is xtreg ln_wage educ pexp pexp2 broken_home , sa th):

wage.re = plm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.panel.df,model="random")
summary(wage.re)

Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.panel.df, 
    model = "random")

Unbalanced Panel: n = 2178, T = 1-15, N = 17919

Effects:
                 var std.dev share
idiosyncratic 0.1080  0.3287 0.469
individual    0.1223  0.3498 0.531
theta:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3152  0.6847  0.7151  0.7007  0.7478  0.7642 

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.36051 -0.15586  0.03123  0.00648  0.19569  2.21809 

Coefficients:
               Estimate  Std. Error  z-value  Pr(>|z|)    
(Intercept)  0.57764593  0.04297643  13.4410 < 2.2e-16 ***
educ         0.09053539  0.00333714  27.1296 < 2.2e-16 ***
pexp         0.10252019  0.00261397  39.2202 < 2.2e-16 ***
pexp2       -0.00362720  0.00014322 -25.3263 < 2.2e-16 ***
broken_home -0.06411881  0.02178336  -2.9435  0.003245 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2693.3
Residual Sum of Squares: 1960.1
R-Squared:      0.27508
Adj. R-Squared: 0.27492
Chisq: 4998.17 on 4 DF, p-value: < 2.22e-16

Fixed Effects#

In R we again use the plm package (and R plm surpresses the model constant in a way more in line with our class notes). R also automatically drops covariates not varying within a cross sectional unit (although it doesn’t tell you explicitly the way Stata does).

wage.fe = plm(ln_wage~educ+pexp+pexp2+broken_home, data=tk.panel.df, model="within")
summary(wage.fe)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk.panel.df, 
    model = "within")

Unbalanced Panel: n = 2178, T = 1-15, N = 17919

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-2.452999 -0.145082  0.010925  0.167903  2.543757 

Coefficients:
         Estimate  Std. Error t-value  Pr(>|t|)    
educ   0.07619867  0.00594136  12.825 < 2.2e-16 ***
pexp   0.10703306  0.00277281  38.601 < 2.2e-16 ***
pexp2 -0.00381882  0.00014899 -25.632 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2204
Residual Sum of Squares: 1700.2
R-Squared:      0.22862
Adj. R-Squared: 0.12176
F-statistic: 1554.76 on 3 and 15738 DF, p-value: < 2.22e-16

ECON407 Cross Section Econometrics

Panel Data Estimation in R

Contents

Panel Data Estimation in R#

Pooled OLS#

Random Effects#

Fixed Effects#

Hypothesis Testing and Model Comparison#

OLS (Pooled) versus Random Effects#

OLS (Pooled) versus Fixed Effects#

Random Effects versus Fixed Effects#

Endogeneity#

Robust Standard Errors#