{ "cells": [ { "cell_type": "markdown", "id": "56279d8b", "metadata": {}, "source": [ "# Stata, Jupyter, and Reproducible Research\n", "\n", "From [Wikipedia](https://en.wikipedia.org/wiki/Reproducibility#Reproducible_research), reproducible research is defined as:\n", "\n", ">The term reproducible research refers to the idea that the ultimate product of academic research is the paper **along with** the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.\n", "\n", "The reproducible research movement (especially in the statistical sciences)\n", "takes this a step further by advocating for dynamic documents. The idea is\n", "that a researcher should provide a file (the dynamic document) that executes\n", "the statistical analysis, generates the figures, and contains the accompanying\n", "narrative text. Executing this file produces the **academic paper**. The\n", "researcher shares this file with other researchers rather than only the\n", "paper. It is my view that within 20 years nearly every scientific journal in\n", "applied statistics will require this approach.\n", "\n", "This document shows how to use [Jupyter](https://jupyter.org/)\n", "`notebook` or `lab` and Markdown syntax for reproducible research and\n", "dynamic documents in Stata. The idea behind `jupyter` is\n", "that you share your research by sharing your `ipynb` notebook file.\n", "This file performs the full suite of statistical analysis and can\n", "produce the PDF manuscript describing your analysis. 
You will use\n", "this workflow for all class problem set assignments.\n", "\n", "For every problem set, you will turn in the Jupyter notebook (`ipynb`)\n", "file (similar to a do file) containing all commands, descriptive text,\n", "and embedded handwritten responses needed to produce your problem set.\n", "\n", "## Getting Started\n", "\n", "You will be using\n", "[https://jupyterhub.wm.edu](https://jupyterhub.wm.edu). When you log\n", "in initially, choose either the `R` or `Stata` option.\n", "\n", "## Some Features of Markdown in Jupyter\n", "\n", "Jupyter supports most features of\n", "[Markdown](https://daringfireball.net/projects/markdown/syntax), a\n", "lightweight and readable **text-based** language that allows files\n", "to be easily converted to nice-looking PDF, HTML, or even Word\n", "documents. Some features you will likely want to use:\n", "\n", "* Equations and math notation using LaTeX math\n", "* Headers\n", "* Emphasizing text (bold and italics)\n", "* Numbered and bulleted lists\n", "* Turning Stata output on and off\n", "* Adding page breaks for `pdf` output using `\\pagebreak` in a Markdown\n", "  cell\n", "\n", "## A simple example analysis using Markdown syntax\n", "\n", "Below we model the following regression equation for cars\n", "back in the day:\n", "\n", "$$\n", "price_i = \\beta_0 + \\beta_1 mpg_i + \\beta_2 foreign_i + \\epsilon_i\n", "$$\n", "\n", "### Load Data and Summarize\n", "\n", "Load and summarize a dataset:\n", "\n", "```\n", "sysuse auto\n", "sum\n", "```\n", "\n", "At this point you are free to execute Stata commands interactively in\n", "your notebook. If you encounter any problems, open an issue at the\n", "[issue-tracker](https://code.wm.edu/econ/407/issue-tracker).\n", "\n", "## Loading and Summarizing Data\n", "\n", "Summarizing the data in a `Stata` code cell shows the variables we can\n", "consider in our analysis:\n", "\n", "```\n", ". sysuse auto\n", "(1978 automobile data)\n", "\n", ". 
sum\n", "\n", "    Variable |        Obs        Mean    Std. dev.       Min        Max\n", "-------------+---------------------------------------------------------\n", "        make |          0\n", "       price |         74    6165.257    2949.496       3291      15906\n", "         mpg |         74     21.2973    5.785503         12         41\n", "       rep78 |         69    3.405797    .9899323          1          5\n", "    headroom |         74    2.993243    .8459948        1.5          5\n", "-------------+---------------------------------------------------------\n", "       trunk |         74    13.75676    4.277404          5         23\n", "      weight |         74    3019.459    777.1936       1760       4840\n", "      length |         74    187.9324    22.26634        142        233\n", "        turn |         74    39.64865    4.399354         31         51\n", "displacement |         74    197.2973    91.83722         79        425\n", "-------------+---------------------------------------------------------\n", "  gear_ratio |         74    3.014865    .4562871       2.19       3.89\n", "     foreign |         74    .2972973    .4601885          0          1\n", "```\n", "\n", "We might also want to look at a histogram of our dependent variable, `price`:\n", "\n", "```\n", "hist price\n", "```\n", "\n", "```{figure} /_static/reproducible_research_5_1.svg\n", ":scale: 80%\n", ":name: name\n", "```\n", "\n", "
\n", "\n", "## Regression Model\n", "\n", "Here are the regression results:\n", "\n", "```\n", "reg price mpg foreign\n", "```\n", "\n", "```\n", "      Source |       SS           df       MS      Number of obs   =        74\n", "-------------+----------------------------------   F(2, 71)        =     14.07\n", "       Model |   180261702         2  90130850.8   Prob > F        =    0.0000\n", "    Residual |   454803695        71  6405685.84   R-squared       =    0.2838\n", "-------------+----------------------------------   Adj R-squared   =    0.2637\n", "       Total |   635065396        73  8699525.97   Root MSE        =    2530.9\n", "\n", "------------------------------------------------------------------------------\n", "       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]\n", "-------------+----------------------------------------------------------------\n", "         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494\n", "     foreign |   1767.292    700.158     2.52   0.014     371.2169    3163.368\n", "       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67\n", "------------------------------------------------------------------------------\n", "```\n", "\n", "## Discussion of Results\n", "\n", "We can now describe our results and add narrative to the document.\n", "Looks like back in the day, foreign cars sold for more: holding `mpg`\n", "constant, foreign cars sold for about 1,767 dollars more than domestic\n", "cars on average (p = 0.014).\n", "\n", "## Jupyter and Mata\n", "\n", "Mata is the matrix algebra environment in Stata. To use it in a notebook, write an ordinary Stata code cell and wrap the Mata code between `mata` and `end`:\n", "\n", "Define $\\mathbf{A}_{2 \\times 2}$ as\n", "\n", "$$\n", "\\mathbf{A}=\\begin{bmatrix} 1 & 2 \\\\\n", " 3 & 4 \\end{bmatrix}\n", "$$\n", "\n", "```\n", "mata\n", "A = (1,2\\3,4)\n", "A\n", "end\n", "```\n", "\n", "```\n", ". mata\n", "------------------------------------------------- mata (type end to exit) -----\n", ": \n", ": A = (1,2\\3,4)\n", "\n", ": A\n", "       1   2\n", "    +---------+\n", "  1 |  1   2  |\n", "  2 |  3   4  |\n", "    +---------+\n", "\n", ": end\n", "-------------------------------------------------------------------------------\n", "\n", ". 
\n", "```\n", "\n", "## Producing PDFs from your notebook\n", "\n", "You can export a notebook in a variety of formats, including PDF.\n", "To download this web page as a PDF, click the download link in the top\n", "corner of the page and choose `pdf`. To create a PDF from your own\n", "notebook, click `File` -> `Export Notebook as ...` and choose\n", "`pdf`. This may require additional configuration steps, which are not\n", "required for this course.\n", "\n", "## A reproducible version of this notebook\n", "\n", "Because Stata is not open source, it was not available when this\n", "website was built. If you want to run this document on a campus lab\n", "computer and fully replicate these results, use [this ipynb\n", "notebook](https://econ.pages.code.wm.edu/407/syllabus/_static/reproducible_research.ipynb)[^1].\n", "\n", "[^1]: Rather than the download link at the top of this webpage." ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.10.3" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.17" }, "source_map": [ 12 ] }, "nbformat": 4, "nbformat_minor": 5 }