OLS in Stata
Contents
OLS in Stata#
Here we show how to implement many of these ideas in
Stata. Note, this example uses data from a panel dataset (multiple
time periods per individual) and we arbitrarily restrict the analysis
to a cross section dataset by analyzing only records where time
is 4. We do this for demonstration purposes only since we motivate
this model for a pure cross section case. It is almost always a
horrible idea to throw away data like this, and later in the course we
will learn more about panel data techniques.
Running the OLS Model#
Using ln_wage
as our dependent variable, here is the code for
running a simple OLS model in Stata. Since we haven’t loaded the data
yet, let’s do that and run the model:
webuse set "https://rlhick.people.wm.edu/econ407/data/"
webuse tobias_koop
keep if time==4
regress ln_wage educ pexp pexp2 broken_home
. webuse set "https://rlhick.people.wm.edu/econ407/data/"
(prefix now "https://rlhick.people.wm.edu/econ407/data")
. webuse tobias_koop
. keep if time==4
(16,885 observations deleted)
. regress ln_wage educ pexp pexp2 broken_home
Source | SS df MS Number of obs = 1,034
-------------+---------------------------------- F(4, 1029) = 51.36
Model | 37.3778146 4 9.34445366 Prob > F = 0.0000
Residual | 187.21445 1,029 .181938241 R-squared = 0.1664
-------------+---------------------------------- Adj R-squared = 0.1632
Total | 224.592265 1,033 .217417488 Root MSE = .42654
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0852725 .0092897 9.18 0.000 .0670437 .1035014
pexp | .2035214 .0235859 8.63 0.000 .1572395 .2498033
pexp2 | -.0124126 .0022825 -5.44 0.000 -.0168916 -.0079336
broken_home | -.0087254 .0357107 -0.24 0.807 -.0787995 .0613488
_cons | .4603326 .137294 3.35 0.001 .1909243 .7297408
------------------------------------------------------------------------------
.
Robust Standard Errors#
Note that since the \(\mathbf{b}\)’s are unbiased estimators for \(\beta\), those continue to be the ones to report. However, our variance covariance matrix is
Defining \(\mathbf{e=y-xb}\),
In stata, the robust
option is a very easy way to implement the robust
standard error correction. Note: this is not robust regression.
regress ln_wage educ pexp pexp2 broken_home, robust
Linear regression Number of obs = 1,034
F(4, 1029) = 64.82
Prob > F = 0.0000
R-squared = 0.1664
Root MSE = .42654
------------------------------------------------------------------------------
| Robust
ln_wage | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0852725 .0099294 8.59 0.000 .0657882 .1047568
pexp | .2035214 .0226835 8.97 0.000 .1590101 .2480326
pexp2 | -.0124126 .0022447 -5.53 0.000 -.0168173 -.0080079
broken_home | -.0087254 .0312381 -0.28 0.780 -.070023 .0525723
_cons | .4603326 .1315416 3.50 0.000 .2022121 .718453
------------------------------------------------------------------------------
The bootstrapping approach to robust standard errors#
Given the advent of very fast computation even with laptop computers, we
can use boostrapping to correct our standard errrors for any non-iid
error process. In Stata
, we just run (for 1000 replicates) using the following code.
I believe this is doing what is called a parametric bootstrap
(confidence intervals assume normality).
regress ln_wage educ pexp pexp2 broken_home, vce(bootstrap,rep(1000))
(running regress on estimation sample)
Bootstrap replications (1,000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
.................................................. 250
.................................................. 300
.................................................. 350
.................................................. 400
.................................................. 450
.................................................. 500
.................................................. 550
.................................................. 600
.................................................. 650
.................................................. 700
.................................................. 750
.................................................. 800
.................................................. 850
.................................................. 900
.................................................. 950
.................................................. 1,000
Linear regression Number of obs = 1,034
Replications = 1,000
Wald chi2(4) = 252.66
Prob > chi2 = 0.0000
R-squared = 0.1664
Adj R-squared = 0.1632
Root MSE = 0.4265
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
ln_wage | coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
educ | .0852725 .0100503 8.48 0.000 .0655743 .1049707
pexp | .2035214 .022688 8.97 0.000 .1590538 .247989
pexp2 | -.0124126 .0022409 -5.54 0.000 -.0168047 -.0080205
broken_home | -.0087254 .0320888 -0.27 0.786 -.0716183 .0541676
_cons | .4603326 .1340845 3.43 0.001 .1975318 .7231333
------------------------------------------------------------------------------
We will return to why this works later in the course.
Testing for Heteroskedasticity#
The two dominant heteroskedasticity tests are the White test and the Cook-Weisberg (both Breusch Pagan tests). The White test is particularly useful if you believe the error variances have a potentially non-linear relationship with model covariates. For stata, we need to make sure the basic OLS regression (rather than robust) is run just prior to these commands and then we can use:
quietly regress ln_wage educ pexp pexp2 broken_home
imtest, white
. quietly regress ln_wage educ pexp pexp2 broken_home
. imtest, white
White's test
H0: Homoskedasticity
Ha: Unrestricted heteroskedasticity
chi2(12) = 29.20
Prob > chi2 = 0.0037
Cameron & Trivedi's decomposition of IM-test
--------------------------------------------------
Source | chi2 df p
---------------------+----------------------------
Heteroskedasticity | 29.20 12 0.0037
Skewness | 14.14 4 0.0068
Kurtosis | 16.19 1 0.0001
---------------------+----------------------------
Total | 59.53 17 0.0000
--------------------------------------------------
.
Stata
’s imtest
allows you to specify more esoteric forms of
heteroskedasticity. A more basic test outlined in Testing for Heteroskedasticity, that assumes
heteroskedastity is a linear function of the \(\mathbf{x}\)’s- is the
Cook-Weisberg Breusch Pagan test. In Stata, we use
hettest
Breusch–Pagan/Cook–Weisberg test for heteroskedasticity
Assumption: Normal error terms
Variable: Fitted values of ln_wage
H0: Constant variance
chi2(1) = 19.57
Prob > chi2 = 0.0000