Maximum Likelihood Estimation in R#

This chapter shows how to set up a generic log-likelihood function in R and use it to estimate an econometric model. For ECON407, the models we will be investigating use maximum likelihood estimation with pre-existing log-likelihood definitions, so consider this chapter optional material that is NOT REQUIRED.

The maximum likelihood method seeks the model parameters \(\pmb{\theta}^{MLE}\) that maximize the joint likelihood (the likelihood function, \(L\)) of observing \(\mathbf{y}\), conditional on some structural model that is a function of independent variables \(\mathbf{x}\) and model parameters \(\pmb{\theta}\):

\[ \pmb{\theta}^{MLE} = \underset{\pmb{\theta}}{\operatorname{argmax}}\: L = \underset{\pmb{\theta}}{\operatorname{argmax}} \prod_{i=1}^N prob(y_i|\mathbf{x}_i,\pmb{\theta}) \]

In a simple ordinary least squares setting with a linear model, the model parameters are \(\pmb{\theta}=[\pmb{\beta},\sigma]\), making the likelihood contribution for observation \(i\)

\[ prob(y_i|\mathbf{x}_i,\pmb{\beta},\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(y_i - \mathbf{x}_i\pmb{\beta})^2}{2\sigma^2}} \]

Typically, for computational purposes, we work with the log-likelihood function rather than the likelihood itself. This is

\[ log(L) = \sum_{i=1}^N log(prob(y_i|\mathbf{x}_i,\pmb{\theta})) \]
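For the OLS model, taking the log of the normal density above gives a convenient closed form (a useful reference when writing the R function below):

\[ log(L) = -\frac{N}{2}log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^N (y_i - \mathbf{x}_i\pmb{\beta})^2 \]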

Load Data and Run a Simple OLS Regression Model#

# foreign allows us to open Stata datasets
library(foreign)
# maxLik provides maximum likelihood estimation routines
library(maxLik)
# stargazer produces formatted results tables
library(stargazer)

Loading our data:

tk.df = read.dta("http://rlhick.people.wm.edu/econ407/data/tobias_koop.dta")
# keep only the time == 4 cross section
tk4.df = subset(tk.df, time == 4)
# add a column of ones for the constant term
tk4.df$ones = 1
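As an optional sanity check (this step is not required), we can peek at the subset before estimating anything:

# roughly a thousand observations remain in the time == 4 cross section
dim(tk4.df)
head(tk4.df[,c('ln_wage','educ','pexp','pexp2','broken_home')])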

Linear Estimation of the OLS Model#

summary(lm(ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk4.df))
Call:
lm(formula = ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk4.df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.69490 -0.24818 -0.00079  0.25417  1.51139 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.460333   0.137294   3.353 0.000829 ***
educ         0.085273   0.009290   9.179  < 2e-16 ***
pexp         0.203521   0.023586   8.629  < 2e-16 ***
pexp2       -0.012413   0.002283  -5.438 6.73e-08 ***
broken_home -0.008725   0.035711  -0.244 0.807020    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4265 on 1029 degrees of freedom
Multiple R-squared:  0.1664,	Adjusted R-squared:  0.1632 
F-statistic: 51.36 on 4 and 1029 DF,  p-value: < 2.2e-16
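For later comparison with the maximum likelihood results, it can be handy (though entirely optional) to keep the OLS fit around:

# save the OLS fit so we can compare coefficients with the MLE results below
ols.fit = lm(ln_wage ~ educ + pexp + pexp2 + broken_home, data = tk4.df)
coef(ols.fit)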

Implementing Maximum Likelihood Estimation in R#

Here we define the log-likelihood function for an OLS model. Note that the first step is to check whether, during optimization, the numerical solver has proposed a non-positive model variance \(\sigma^2 \leq 0\). If so, the function returns NA (R's missing value), and the optimization routine moves on to another iteration with new candidate parameter values. Note also that the function returns \(-1\times\) the log-likelihood, because R's optim minimizes by default.

y = tk4.df$ln_wage
X = data.matrix(tk4.df[,c('educ', 'pexp', 'pexp2', 'broken_home', 'ones')], 
                rownames.force = NA)
# negative log-likelihood: theta[1] is sigma^2, theta[-1] are the coefficients
ols.lnlike <- function(theta) {
  if (theta[1] <= 0) return(NA)  # reject non-positive variances
  -sum(dnorm(y, mean = X %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}

Note that we can evaluate the (negative) log-likelihood for any parameter vector:

print(ols.lnlike(c(1,1,1,1,1,1)))
[1] 1305304
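As a rough check, evaluating the function near the estimates reported below should give a far smaller value, close to 583.7 (the negative of the log-likelihood that maxLik reports later):

# near the optimum the negative log-likelihood is much smaller than 1305304
print(ols.lnlike(c(0.181, 0.085, 0.204, -0.012, -0.009, 0.460)))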

Manual Optimization (the hard way)#

While this is the hard way, it demonstrates the mechanics of optimization in R. To find the optimal parameters, we use R's optim function, which needs as arguments

  1. starting values

  2. optimization method (for unconstrained problems BFGS is a good choice)

  3. function to be minimized

  4. any data that the function needs for evaluation.

Also, we can ask optim to return the Hessian, which we'll use for calculating standard errors.
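A note on item 4: in the call below, the data (y and X) are never passed to optim; ols.lnlike simply captures them from the environment in which it was defined. A hedged alternative, shown only as a sketch (ols.lnlike2 is a hypothetical variant, not used elsewhere), passes the data explicitly through optim's ... mechanism:

# variant of the likelihood that takes the data as explicit arguments
ols.lnlike2 <- function(theta, y, X) {
  if (theta[1] <= 0) return(NA)
  -sum(dnorm(y, mean = X %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}
# extra named arguments to optim are forwarded on to fn
res2 = optim(c(1,.1,.1,.1,.1,.1), method="BFGS", fn=ols.lnlike2,
             hessian=TRUE, y=y, X=X)

The actual estimation, relying on environment capture, is: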

mle_res.optim = optim(c(1,.1,.1,.1,.1,.1), method="BFGS",fn=ols.lnlike,hessian=TRUE)
summary(mle_res.optim)
            Length Class  Mode   
par          6     -none- numeric
value        1     -none- numeric
counts       2     -none- numeric
convergence  1     -none- numeric
message      0     -none- NULL   
hessian     36     -none- numeric

Our estimated model parameters (ordered as sigma2, educ, pexp, pexp2, broken_home, constant) are

b = round(mle_res.optim$par,5)
print(b)
[1]  0.18106  0.08527  0.20352 -0.01241 -0.00873  0.46033

and the standard errors are

se = round(sqrt(diag(solve(mle_res.optim$hessian))),5)
print(se)
[1] 0.00796 0.00927 0.02353 0.00228 0.03562 0.13696
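As an optional check (not part of the original output), the ratio of each estimate to its standard error should roughly reproduce the t-values from the lm output above:

# approximate t statistics; e.g. educ: 0.08527/0.00927 is roughly 9.2
print(round(b/se, 3))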

Putting it together for comparison to Stata:

labels = c('Sigma2','educ','pexp','pexp2','broken_home','Constant')
tabledat = data.frame(matrix(c(labels,b,se),nrow=length(b)))
names(tabledat) = c("Variable","Parameter","Standard Error")
stargazer(tabledat,summary=FALSE,title="Maximum Likelihood Results from R",
          type="text",rownames = FALSE)
Maximum Likelihood Results from R
====================================
Variable    Parameter Standard Error
------------------------------------
Sigma2       0.18106     0.00796    
educ         0.08527     0.00927    
pexp         0.20352     0.02353    
pexp2       -0.01241     0.00228    
broken_home -0.00873     0.03562    
Constant     0.46033     0.13696    
------------------------------------
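Note that Sigma2 is the maximum likelihood variance estimate, which divides the residual sum of squares by \(N\), whereas lm's residual standard error above (0.4265) divides by \(N-k\). A quick hedged check of that relationship, using \(N=1034\) and \(k=5\) (implied by the 1029 residual degrees of freedom):

# rescaling lm's residual variance from (N-k) to N should give roughly 0.181
print(0.4265^2 * 1029/1034)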

Using maxLik (the easy way)#

Unlike optim above, the maxLik package expects the log-likelihood itself, not \(-1\times\) the log-likelihood. So let's redefine the function:

# log-likelihood (positive sign) for maxLik, which maximizes by default
ols.lnlike <- function(theta) {
  if (theta[1] <= 0) return(NA)
  sum(dnorm(y, mean = X %*% theta[-1], sd = sqrt(theta[1]), log = TRUE))
}
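A quick check that the sign has flipped: evaluating at the same arbitrary parameter vector as before should now return the negative of the earlier value:

# should print the negative of the 1305304 reported above
print(ols.lnlike(c(1,1,1,1,1,1)))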

The maxLik package simplifies this process considerably.

# starting values with names (a named vector)
start = c(1,.1,.1,.1,.1,.1)
names(start) = c("sigma2","educ","pexp","pexp2","broken_home","Constant")
mle_res <- maxLik( logLik = ols.lnlike, start = start, method="BFGS" )
summary(mle_res)
--------------------------------------------
Maximum Likelihood estimation
BFGS maximization, 114 iterations
Return code 0: successful convergence 
Log-Likelihood: -583.6629 
6  free parameters
Estimates:
             Estimate Std. error t value  Pr(> t)    
sigma2       0.181058   0.007963  22.737  < 2e-16 ***
educ         0.085273   0.009262   9.207  < 2e-16 ***
pexp         0.203521   0.023527   8.651  < 2e-16 ***
pexp2       -0.012413   0.002277  -5.451    5e-08 ***
broken_home -0.008725   0.035626  -0.245 0.806525    
Constant     0.460333   0.136879   3.363 0.000771 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------

My personal opinion is that if you do a lot of this type of work (writing custom log-likelihoods), you are better off using Matlab, R, or Python than Stata. If you use R, my strong preference is maxLik, as demonstrated above, over mle or mle2, since the latter two require that you define your likelihood function with a separate argument for each parameter to be optimized over.
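To see why that requirement is awkward, here is a hedged sketch of what the same model might look like with stats4::mle, which expects a negative log-likelihood with one named argument per parameter (illustrative only; neg.lnlike and mle_fit are not used elsewhere):

library(stats4)
# one named argument per parameter quickly becomes unwieldy
neg.lnlike <- function(sigma2, educ, pexp, pexp2, broken_home, Constant) {
  -ols.lnlike(c(sigma2, educ, pexp, pexp2, broken_home, Constant))
}
mle_fit <- mle(neg.lnlike,
               start = list(sigma2=1, educ=.1, pexp=.1, pexp2=.1,
                            broken_home=.1, Constant=.1))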

Visualizing the Likelihood Function#

Let’s graph the log-likelihood for the educ coefficient, holding

  1. all data at observed values

  2. other parameters constant at optimal values

I am going to call this a “marginal log-likelihood” since we are only allowing one parameter to vary in this plot (rather than all 6). Note that ols.lnlike is now the positive log-likelihood, as redefined for maxLik, so we are plotting values we want to maximize. Here is what it looks like:

coeff2 = mle_res.optim$par
altcoeff = seq(.06,.11,0.001)
LL = NULL

# calculates the LL for various values of the education parameter
# and stores them in LL:
for (m in altcoeff) {
  coeff2[2] = m # consider a candidate parameter value for education
  LLt = ols.lnlike(coeff2)
  LL = rbind(LL,LLt)
}


plot(altcoeff,LL, type="l", xlab="Education Coefficient", ylab="LogL",
     main="Marginal Log-Likelihood Function for \n Education Coefficient")

abline(v=mle_res$estimate[2], col="red")
abline(h=mle_res$maximum, col="red")
[Figure: marginal log-likelihood function for the education coefficient; red lines mark the MLE estimate and the maximized log-likelihood (../_images/R_mle_23_0.png)]

This shows that a parameter value of about 0.085 maximizes the log-likelihood function, matching the maxLik estimate for educ above.
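As a final check, the grid point with the highest log-likelihood should sit at (or within one 0.001 grid step of) the maxLik estimate:

# grid value maximizing the marginal log-likelihood; expect roughly 0.085
print(altcoeff[which.max(LL)])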