A Stata Primer#

Here I briefly introduce matrix algebra manipulations and maximum likelihood programming in Stata. Other software packages are arguably better suited to these tasks, but in this class we’ll focus on Stata as the tool for all of our work. If you prefer to do your work in another mathematical package (e.g., R, Python, or Matlab), you are free to do so, but I might not be able to help with any technical issues you run into.

Loading data into Stata#

First, we start Stata in our Jupyter notebook using:

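# start a connected Stata 17 session
from pystata import config
config.init('be')  # 'be' selects the Stata/BE edition ('se' and 'mp' are the others)
config.set_streaming_output_mode('off')  # show output once a command finishes rather than streaming it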

Loading Stata datasets#

Stata can load comma-delimited (csv), Excel (xls), and Stata (dta) files out of the box. It can also load data from the web:

%%stata
use "https://rlhick.people.wm.edu/econ407/data/mroz"
sum
. use "https://rlhick.people.wm.edu/econ407/data/mroz"

. sum

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         lfp |        753    .5683931    .4956295          0          1
        whrs |        753    740.5764    871.3142          0       4950
         kl6 |        753    .2377158     .523959          0          3
        k618 |        753    1.353254    1.319874          0          8
          wa |        753    42.53785    8.072574         30         60
-------------+---------------------------------------------------------
          we |        753    12.28685    2.280246          5         17
          ww |        753    2.374565    3.241829          0         25
        rpwg |        753    1.849734    2.419887          0       9.98
        hhrs |        753    2267.271    595.5666        175       5010
          ha |        753    45.12085    8.058793         30         60
-------------+---------------------------------------------------------
          he |        753    12.49137    3.020804          3         17
          hw |        753    7.482179    4.230559      .4121     40.509
      faminc |        753    23080.59     12190.2       1500      96000
         mtr |        753    .6788632    .0834955      .4415      .9415
        wmed |        753    9.250996    3.367468          0         17
-------------+---------------------------------------------------------
        wfed |        753    8.808765     3.57229          0         17
          un |        753    8.623506    3.114934          3         14
         cit |        753    .6427623    .4795042          0          1
          ax |        753    10.63081     8.06913          0         45

. 

Loading files from disk is a slight variation on the above command. Supposing that your Stata data file mroz.dta is in the folder /some/place, on Linux or macOS we would use the command

%%stata
use "/some/place/mroz.dta"
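The other formats mentioned earlier are read with separate import commands rather than use. A minimal sketch, assuming hypothetical files mroz.csv and mroz.xls in the same /some/place folder (these files are illustrations only and are not provided with the course data):

```stata
* hypothetical file names; adjust the paths to point at your own files

* comma-delimited file; clear discards any data already in memory
import delimited "/some/place/mroz.csv", clear

* Excel file; firstrow treats the first row as variable names
import excel "/some/place/mroz.xls", firstrow clear
```

In a notebook cell, put %%stata on the first line, as in the examples above.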

Viewing Data#

If you are using the graphical version of Stata, viewing data is easy and I can show you how to do that. In your Jupyter notebook, you can view data with the list command, which might be useful in your problem sets for showing a few lines of data. Here we’ll view the first 5 rows of data:

%%stata
list in 1/5
     +--------------------------------------------------------------------+
  1. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1610 |   1 |    0 | 32 | 12 |  3.354 | 2.65 | 2708 | 34 | 12 |
     |--------------------------------------------------------------------|
     |     hw  | faminc  |   mtr  | wmed  |  wfed  |   un  |  cit  |  ax  |
     | 4.0288  |  16310  | .7215  |   12  |     7  |    5  |    0  |  14  |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  2. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1656 |   0 |    2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 |  9 |
     |--------------------------------------------------------------------|
     |     hw  | faminc  |   mtr  | wmed  |  wfed  |   un  |  cit  |  ax  |
     | 8.4416  |  21800  | .6615  |    7  |     7  |   11  |    1  |   5  |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  3. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1980 |   1 |    3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 |
     |--------------------------------------------------------------------|
     |     hw  | faminc  |   mtr  | wmed  |  wfed  |   un  |  cit  |  ax  |
     | 3.5807  |  21040  | .6915  |   12  |     7  |    5  |    0  |  15  |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  4. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 |  456 |   0 |    3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | 10 |
     |--------------------------------------------------------------------|
     |     hw  | faminc  |   mtr  | wmed  |  wfed  |   un  |  cit  |  ax  |
     | 3.5417  |   7300  | .7815  |    7  |     7  |    5  |    0  |   6  |
     +--------------------------------------------------------------------+

     +--------------------------------------------------------------------+
  5. | lfp | whrs | kl6 | k618 | wa | we |     ww | rpwg | hhrs | ha | he |
     |   1 | 1568 |   1 |    2 | 31 | 14 | 4.5918 |  3.6 | 2000 | 32 | 12 |
     |--------------------------------------------------------------------|
     |     hw  | faminc  |   mtr  | wmed  |  wfed  |   un  |  cit  |  ax  |
     |     10  |  27300  | .6215  |   12  |    14  |  9.5  |    1  |   7  |
     +--------------------------------------------------------------------+

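You can combine list with an if condition to show only the rows meeting a logical condition, and name variables to view only selected columns. Note that in 1/3 restricts attention to observations 1 through 3, and the if condition is then applied within that range. Let’s look at those of the first 3 rows where the respondent has kids less than 6 years old, focusing on whrs, wa, and kl6: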

%%stata
list whrs wa kl6 if kl6>0 in 1/3 
     +-----------------+
     | whrs   wa   kl6 |
     |-----------------|
  1. | 1610   32     1 |
  3. | 1980   35     1 |
     +-----------------+
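Because in is applied before if, the command above shows only two rows. If you instead want the first 3 rows that satisfy the condition, one possible approach is a running count of matches. This is a sketch only; match_order is a hypothetical helper variable, and it relies on kl6 having no missing values (true here, since all 753 observations are non-missing):

```stata
* running count of observations with kl6 > 0 (match_order is a hypothetical name)
gen long match_order = sum(kl6 > 0)

* keep only matching rows, and only the first three of them
list whrs wa kl6 if kl6 > 0 & match_order <= 3

* remove the helper variable when done
drop match_order
```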

Creating and Modifying Variables#

Creating Variables#

In Stata, you create a new variable with gen (shorthand for generate).

%%stata
gen newvar = lfp * ax

Modifying Variables#

To modify an existing variable, use replace. Stata will not let you redefine an existing variable with gen, so replace is required:

%%stata
replace newvar = newvar/10
(423 real changes made)

Here is an example that creates a new dummy variable.

%%stata
gen haskids = 0
replace haskids = (kl6>0) | (k618>0)
list haskids kl6 k618 in 1/10
. gen haskids = 0

. replace haskids = (kl6>0) | (k618>0)
(524 real changes made)

. list haskids kl6 k618 in 1/10

     +----------------------+
     | haskids   kl6   k618 |
     |----------------------|
  1. |       1     1      0 |
  2. |       1     0      2 |
  3. |       1     1      3 |
  4. |       1     0      3 |
  5. |       1     1      2 |
     |----------------------|
  6. |       0     0      0 |
  7. |       1     0      2 |
  8. |       0     0      0 |
  9. |       1     0      2 |
 10. |       1     0      2 |
     +----------------------+

. 
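The two-step gen and replace pattern above can also be collapsed into a single statement, since a logical expression evaluates to 1 or 0. A minimal sketch, where haskids2 is a hypothetical name chosen so the existing haskids is left untouched, and the if qualifier is an optional guard that leaves the dummy missing whenever an input is missing:

```stata
* one-step dummy: 1 if the respondent has any kids, 0 otherwise
gen byte haskids2 = (kl6 > 0) | (k618 > 0) if !missing(kl6, k618)

* check that the one-step and two-step versions agree
assert haskids == haskids2
```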

Creating dummy variables#

While the above example shows how to use logical checks to create dummy variables “manually”, a better way (particularly if you need to create many categories) is tab (short for tabulate) with the gen() option. Suppose a variable x takes on the values 1, 2, or 3. To create a dummy variable for each value, use

tab x, gen(dum_x)
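Since x is only a placeholder, here is the same idea sketched on a variable that exists in the mroz data. The stub dum_kl6 is a hypothetical name; Stata appends 1, 2, ... to it, one new variable per distinct value of kl6 in sorted order:

```stata
* one dummy per distinct value of kl6: dum_kl61, dum_kl62, ...
tab kl6, gen(dum_kl6)

* inspect the new variables alongside the original
describe dum_kl6*
list kl6 dum_kl6* in 1/5
```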

Starting Over#

Sometimes you want to get rid of all the variables to begin a new analysis, or simply to start over. To do this, use the clear command.
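A short sketch of the common variants follows. Which one you want depends on how much you need to reset; the dataset URL is the one used earlier in this primer:

```stata
* drop the data currently in memory
clear

* or: also drop matrices, scalars, programs, and other stored results
clear all

* or: load a new dataset and discard the one in memory in a single step
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear
```

The clear option on use matters because Stata refuses to load a new dataset over one that has unsaved changes.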

Do Files#

Do files allow you to put all of the relevant Stata commands (note: not results, narrative, or math) for a project into one file, so that results can be easily replicated from one Stata session to the next. If you are using a Jupyter notebook, which contains the full record of your work, do files are probably not necessary for this class.

However, the use of do files is highly recommended if you are using the default Stata program and interface rather than a Jupyter notebook.
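As an illustration, a do file is just a plain-text file of commands. The sketch below assumes a hypothetical file named my_analysis.do and reuses commands from earlier in this primer:

```stata
* my_analysis.do (hypothetical file name)
* start from a clean slate so the results do not depend on earlier work
clear all

* load the data
use "https://rlhick.people.wm.edu/econ407/data/mroz"

* create variables and run the analysis
gen haskids = (kl6 > 0) | (k618 > 0)
summarize whrs if haskids == 1
```

If this were saved in /some/place, it could be run from the Stata command line with do "/some/place/my_analysis.do".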

Log Files#

Since we are using the Jupyter notebook interface for Stata, the record of our work is fully contained in the notebook, and log files are probably not necessary.

However, if you are running Stata using do files (see Do Files), a very useful way to save your results is to have Stata automatically put everything in a log file. To start a log file, issue

log using "/some/place/my_first.log", replace text

This creates (or, if it already exists, replaces) the file my_first.log in the folder /some/place. If you don’t want to replace your existing work, use this command instead

log using "/some/place/my_first.log", append text

and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command log close to close the file and save all changes. You can then open it in the text editor of your choosing.
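Putting the pieces together, a logged session might look like the sketch below. The path is the placeholder used above, and the leading capture log close is a common idiom rather than a requirement:

```stata
* close any log left open by an earlier run; capture suppresses the error if none is open
capture log close

* start (or overwrite) a plain-text log
log using "/some/place/my_first.log", replace text

* everything from here on is recorded in the log
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear
summarize whrs wa

* stop logging and save the file
log close
```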

Getting help in Stata#

If you need general help in Stata, type help command, where command is some Stata command. You can also do keyword searches with search keyword. To see the same set of results in a better help viewer, type view search keyword, for example view search reg.