{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "56279d8b",
   "metadata": {},
   "source": [
    "# Stata, Jupyter, and Reproducible Research\n",
    "\n",
    "From [Wikipedia](https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research), reproducible research is defined as:\n",
    "\n",
    ">The term reproducible research refers to the idea that the ultimate product of academic research is the paper **along with** the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.\n",
    "\n",
    "The reproducible research movement (especially for the statistical sciences)\n",
    "takes this a step further by advocating for dynamic documents. The idea is\n",
    "that a researcher should provide a file (the dynamic document) that can execute\n",
    "the statistical analysis, generate figures, and contains accompanying text\n",
    "narrative.  This file can be executed to produce the **academic paper**.  The\n",
    "researcher shares this file with other researchers rather than the only the\n",
    "paper.  It is my view that within 20 years nearly every scientific journal in\n",
    "applied statistics will require this approach.\n",
    "\n",
    "This document shows how to use [jupyter](https://jupyter.org/)\n",
    "`notebook` or `lab` and markdown syntax for reproducible research and\n",
    "dynamic documents for work in stata.  The idea behind `jupyter` is\n",
    "that you share your research by sharing your `ipynb` notebook file.\n",
    "This file performs the full suite of statistical analysis and can\n",
    "produce the pdf manuscript describing your analysis.  You will use\n",
    "this workflow for all class problem set assignments.\n",
    "\n",
    "For every problem set, you will turn in the jupyter notebook `ipynb`\n",
    "(similar to a do file) file containing all commands, descriptive text,\n",
    "and embedded handwritten responses that produces your problem set.\n",
    "\n",
    "## Getting Started\n",
    "\n",
    "You will be using\n",
    "[https://jupyterhub.wm.edu](https://jupyterhub.wm.edu).  When you log\n",
    "in initially, choose either the `R` or `Stata` option.\n",
    "\n",
    "## Some Features of Markdown in Jupyter\n",
    "\n",
    "Jupyter allows for most features of\n",
    "[Markdown](https://daringfireball.net/projects/markdown/syntax), which\n",
    "is a liteweight and readable **text-based** language that allows files\n",
    "to be easily converted to nice looking pdf, html, or even word\n",
    "documents. Some features you will likely want to use:\n",
    "\n",
    "* Equations and Math Notation using latex math\n",
    "* Headers\n",
    "* Emphasizing text (bold and italics)\n",
    "* Numeric and bulletted lists\n",
    "* Turning stata output on and off\n",
    "* Adding page breaks for `pdf` output using `\\pagebreak` in a markdown\n",
    "  code cell\n",
    "\n",
    "## A simple example analysis using Markdown syntax\n",
    "\n",
    "Below we'll be modeling the following regression equation for cars\n",
    "back in the day:\n",
    "\n",
    "$$\n",
    "price_i = \\beta_0 + \\beta_1 mpg_i + \\beta_2 foreign_i + \\epsilon_i\n",
    "$$\n",
    "\n",
    "### Load Data and Summarize\n",
    "\n",
    "Load and summarize a dataset:\n",
    "\n",
    "```\n",
    "sysuse auto\n",
    "sum\n",
    "```\n",
    "\n",
    "At this point you are free to execute stata commands interactively in\n",
    "your notebook.  If you encounter any problems, open an issue at the\n",
    "[issue-tracker](https://code.wm.edu/econ/407/issue-tracker).\n",
    "\n",
    "## Loading and Summarizing Data\n",
    "\n",
    "Summarizing the data shows the variables we can consider in our\n",
    "analysis using a `Stata` code cell:\n",
    "\n",
    "```\n",
    ". sysuse auto\n",
    "(1978 automobile data)\n",
    "\n",
    ". sum\n",
    "\n",
    "    Variable |        Obs        Mean    Std. dev.       Min        Max\n",
    "-------------+---------------------------------------------------------\n",
    "        make |          0\n",
    "       price |         74    6165.257    2949.496       3291      15906\n",
    "         mpg |         74     21.2973    5.785503         12         41\n",
    "       rep78 |         69    3.405797    .9899323          1          5\n",
    "    headroom |         74    2.993243    .8459948        1.5          5\n",
    "-------------+---------------------------------------------------------\n",
    "       trunk |         74    13.75676    4.277404          5         23\n",
    "      weight |         74    3019.459    777.1936       1760       4840\n",
    "      length |         74    187.9324    22.26634        142        233\n",
    "        turn |         74    39.64865    4.399354         31         51\n",
    "displacement |         74    197.2973    91.83722         79        425\n",
    "-------------+---------------------------------------------------------\n",
    "  gear_ratio |         74    3.014865    .4562871       2.19       3.89\n",
    "     foreign |         74    .2972973    .4601885          0          1 \n",
    "```\n",
    "\n",
    "We might also want to look at histograms of our dependent variable, `price`:\n",
    "\n",
    "```\n",
    "hist price\n",
    "```\n",
    "\n",
    "```{figure} /_static/reproducible_research_5_1.svg\n",
    ":scale: 80%\n",
    ":name: name\n",
    "```\n",
    "\n",
    "<div style=\"page-break-after: always;\"></div>\n",
    "\n",
    "## Regression Model\n",
    "\n",
    "Here are the regression results:\n",
    "\n",
    "```\n",
    "reg price mpg foreign\n",
    "```\n",
    "\n",
    "```\n",
    "      Source |       SS           df       MS      Number of obs   =        74\n",
    "-------------+----------------------------------   F(2, 71)        =     14.07\n",
    "       Model |   180261702         2  90130850.8   Prob > F        =    0.0000\n",
    "    Residual |   454803695        71  6405685.84   R-squared       =    0.2838\n",
    "-------------+----------------------------------   Adj R-squared   =    0.2637\n",
    "       Total |   635065396        73  8699525.97   Root MSE        =    2530.9\n",
    "\n",
    "------------------------------------------------------------------------------\n",
    "       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]\n",
    "-------------+----------------------------------------------------------------\n",
    "         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494\n",
    "     foreign |   1767.292    700.158     2.52   0.014     371.2169    3163.368\n",
    "       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67\n",
    "------------------------------------------------------------------------------\n",
    "```\n",
    "\n",
    "## Discussion of Results\n",
    "\n",
    "We can now proceed to describe our results and add narrative to the\n",
    "document: Looks like back in the day, foreign cars sell for more!\n",
    "\n",
    "## Jupyter and Mata\n",
    "\n",
    "Mata is the matrix algebra environment in stata.  It operates exactly as a Stata code block by wrapping code with `mata` and `end`:\n",
    "\n",
    "Define $\\mathbf{A}_{2 \\times 2}$ as\n",
    "\n",
    "$$\n",
    "\\mathbf{A}=\\begin{bmatrix} 1 & 2 \\\\\n",
    "                           3 & 4 \\end{bmatrix}\n",
    "$$\n",
    "\n",
    "\n",
    "```\n",
    "mata\n",
    "A = (1,2\\3,4)\n",
    "A\n",
    "end\n",
    "```\n",
    "\n",
    "```\n",
    ". mata\n",
    "------------------------------------------------- mata (type end to exit) -----\n",
    ": \n",
    ": A = (1,2\\3,4)\n",
    "\n",
    ": A\n",
    "       1   2\n",
    "    +---------+\n",
    "  1 |  1   2  |\n",
    "  2 |  3   4  |\n",
    "    +---------+\n",
    "\n",
    ": end\n",
    "-------------------------------------------------------------------------------\n",
    "\n",
    ". \n",
    "```\n",
    "\n",
    "## Producing pdf's from your notebook\n",
    "\n",
    "It is possible to export your notebook in a variety of formats\n",
    "including pdf.  To do this click on the download link in the top\n",
    "corner of this page and choose `pdf`.  To create a `pdf` from your\n",
    "notebook, click on `File` -> `Export Notebook as ...` and choose\n",
    "`pdf`.  This may require additional configuration steps and are not\n",
    "required for this course.\n",
    "\n",
    "## A reproducible version of this notebook\n",
    "\n",
    "Due to some technical issues related to Stata not being open source\n",
    "and available when producing this website, you need to use [this ipynb\n",
    "notebook](https://econ.pages.code.wm.edu/407/syllabus/_static/reproducible_research.ipynb)[^2]\n",
    "if you want to run this document on a campus lab computer to fully\n",
    "replicate these results.\n",
    "   \n",
    "[^1]: Rather than the download link at the top of this webpage."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "text_representation": {
    "extension": ".md",
    "format_name": "myst",
    "format_version": 0.13,
    "jupytext_version": "1.10.3"
   }
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.17"
  },
  "source_map": [
   12
  ]
 },
 "nbformat": 4,
 "nbformat_minor": 5
}