A Stata Primer
Here I briefly introduce matrix algebra manipulations and maximum likelihood programming in Stata. Other software packages are arguably better suited to these tasks, but in this class we’ll focus on Stata as the tool for all of our work. If you prefer to do your work in another mathematical package (e.g. R, Python, or Matlab), you are free to do so, but I might not be able to support any technical issues you run into.
Loading data into Stata#
First, we start a Stata session from our Jupyter notebook:
# start a connected stata17 session
from pystata import config
config.init('be')
config.set_streaming_output_mode('off')
Loading Stata datasets#
Stata can load comma-delimited (csv), Excel (xls), and Stata (dta) files out of the box. It can also load data from the web:
%%stata
use "https://rlhick.people.wm.edu/econ407/data/mroz"
sum
. use "https://rlhick.people.wm.edu/econ407/data/mroz"
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
lfp | 753 .5683931 .4956295 0 1
whrs | 753 740.5764 871.3142 0 4950
kl6 | 753 .2377158 .523959 0 3
k618 | 753 1.353254 1.319874 0 8
wa | 753 42.53785 8.072574 30 60
-------------+---------------------------------------------------------
we | 753 12.28685 2.280246 5 17
ww | 753 2.374565 3.241829 0 25
rpwg | 753 1.849734 2.419887 0 9.98
hhrs | 753 2267.271 595.5666 175 5010
ha | 753 45.12085 8.058793 30 60
-------------+---------------------------------------------------------
he | 753 12.49137 3.020804 3 17
hw | 753 7.482179 4.230559 .4121 40.509
faminc | 753 23080.59 12190.2 1500 96000
mtr | 753 .6788632 .0834955 .4415 .9415
wmed | 753 9.250996 3.367468 0 17
-------------+---------------------------------------------------------
wfed | 753 8.808765 3.57229 0 17
un | 753 8.623506 3.114934 3 14
cit | 753 .6427623 .4795042 0 1
ax | 753 10.63081 8.06913 0 45
.
Loading files from disk is a slight variation on the above command. Supposing that your Stata data file mroz.dta is in the folder /some/place, on Linux or macOS we would use the command
%%stata
use "/some/place/mroz.dta"
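The use command reads only Stata's own dta format; comma-delimited and Excel files go through the import commands instead. The following is a minimal sketch, assuming the auto dataset that ships with Stata and two hypothetical file names (auto_copy.csv and auto_copy.xlsx) written to the current working directory so that the example has something to import. In the notebook, these commands would go in a cell that begins with %%stata.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Write hypothetical csv and Excel copies to the working directory
export delimited using "auto_copy.csv", replace
export excel using "auto_copy.xlsx", firstrow(variables) replace

* Read the comma-delimited file back in
import delimited using "auto_copy.csv", clear
describe, short

* Read the Excel file, treating the first row as variable names
import excel using "auto_copy.xlsx", firstrow clear
describe, short
```

The clear option plays the same role here as it does with use: it discards whatever data are currently in memory before loading the new file.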
Viewing Data#
If you are using the graphical version of Stata, viewing data is easy
and I can show you how to do that. In your Jupyter notebook, you can
view data with the list command, which might be useful in your problem
sets for showing a few lines of data. Here we’ll view the first 5 rows
of data:
%%stata
list in 1/5
+--------------------------------------------------------------------+
1. | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he |
| 1 | 1610 | 1 | 0 | 32 | 12 | 3.354 | 2.65 | 2708 | 34 | 12 |
|--------------------------------------------------------------------|
| hw | faminc | mtr | wmed | wfed | un | cit | ax |
| 4.0288 | 16310 | .7215 | 12 | 7 | 5 | 0 | 14 |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
2. | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he |
| 1 | 1656 | 0 | 2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 | 9 |
|--------------------------------------------------------------------|
| hw | faminc | mtr | wmed | wfed | un | cit | ax |
| 8.4416 | 21800 | .6615 | 7 | 7 | 11 | 1 | 5 |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
3. | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he |
| 1 | 1980 | 1 | 3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 |
|--------------------------------------------------------------------|
| hw | faminc | mtr | wmed | wfed | un | cit | ax |
| 3.5807 | 21040 | .6915 | 12 | 7 | 5 | 0 | 15 |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
4. | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he |
| 1 | 456 | 0 | 3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | 10 |
|--------------------------------------------------------------------|
| hw | faminc | mtr | wmed | wfed | un | cit | ax |
| 3.5417 | 7300 | .7815 | 7 | 7 | 5 | 0 | 6 |
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
5. | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he |
| 1 | 1568 | 1 | 2 | 31 | 14 | 4.5918 | 3.6 | 2000 | 32 | 12 |
|--------------------------------------------------------------------|
| hw | faminc | mtr | wmed | wfed | un | cit | ax |
| 10 | 27300 | .6215 | 12 | 14 | 9.5 | 1 | 7 |
+--------------------------------------------------------------------+
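You can combine list with logical expressions to show only rows
meeting a condition, and name variables to view only selected columns.
Let’s look at whrs, wa, and kl6 for those of the first 3 rows where the
respondent has kids less than 6 years old. Note that in 1/3 restricts
the listing to observations 1 through 3 and the if condition then
filters within that range, which is why only two rows appear: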
%%stata
list whrs wa kl6 if kl6>0 in 1/3
+-----------------+
| whrs wa kl6 |
|-----------------|
1. | 1610 32 1 |
3. | 1980 35 1 |
+-----------------+
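To make the interaction between if and in concrete, here is a small sketch. It types in whrs, wa, and kl6 for the five observations listed earlier, so it does not need the full dataset; the helper variable match_count is a hypothetical name introduced only for this illustration.

```stata
* Enter whrs, wa, and kl6 for the first five observations shown above
clear
input whrs wa kl6
1610 32 1
1656 30 0
1980 35 1
456 34 0
1568 31 1
end

* in 1/3 keeps observations 1-3; if then filters within that range
list whrs wa kl6 if kl6 > 0 in 1/3

* To list the first three MATCHING rows instead, count matches with a
* running sum and keep matches while the count is at most 3
gen match_count = sum(kl6 > 0)
list whrs wa kl6 if kl6 > 0 & match_count <= 3
```

The first list shows observations 1 and 3 only, while the second also reaches observation 5, the third row with kl6 greater than zero.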
Creating and Modifying Variables#
Creating Variables#
In Stata, you need to start a new variable with gen (shorthand for generate).
%%stata
gen newvar = lfp * ax
Modifying Variables#
To modify an existing variable, use replace:
%%stata
replace newvar = newvar/10
(423 real changes made)
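The division of labor between gen and replace can be seen by trying to use gen twice. The sketch below assumes the auto dataset that ships with Stata, and the variable name price_k is hypothetical. It uses capture so that the expected error does not stop execution.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* gen creates a variable that does not yet exist
gen price_k = price / 1000

* A second gen with the same name is an error; capture suppresses the
* error message and stores the nonzero return code in _rc
capture gen price_k = price
display "return code from second gen: " _rc

* replace is the command for changing an existing variable
replace price_k = round(price_k, 0.1)
```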
Here is an example that creates a new dummy variable.
%%stata
gen haskids = 0
replace haskids = (kl6>0) | (k618>0)
list haskids kl6 k618 in 1/10
. gen haskids = 0
. replace haskids = (kl6>0) | (k618>0)
(524 real changes made)
. list haskids kl6 k618 in 1/10
+----------------------+
| haskids kl6 k618 |
|----------------------|
1. | 1 1 0 |
2. | 1 0 2 |
3. | 1 1 3 |
4. | 1 0 3 |
5. | 1 1 2 |
|----------------------|
6. | 0 0 0 |
7. | 1 0 2 |
8. | 0 0 0 |
9. | 1 0 2 |
10. | 1 0 2 |
+----------------------+
.
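Because a logical expression in Stata evaluates to 1 when true and 0 when false, the dummy can also be created in a single gen statement. The sketch below uses a small made-up dataset (not the Mroz data) that includes one missing value. Stata treats missing as larger than any number, so kl6>0 would otherwise evaluate to true when kl6 is missing.

```stata
* Small made-up dataset; the last row has a missing value for kl6
clear
input kl6 k618
1 0
0 2
0 0
. 1
end

* One-step dummy: 1 if the household has any kids, 0 otherwise.
* The if qualifier leaves haskids missing when either input is missing.
gen byte haskids = (kl6 > 0) | (k618 > 0) if !missing(kl6, k618)

list
```

Whether a row with a missing input should be coded as missing or assigned a value is a modeling decision; the if qualifier here simply makes that choice explicit.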
Creating dummy variables#
While the above example shows how to use logical checks to create
dummy variables “manually”, a better way (particularly if you need to
create many categories) is tab. Suppose a variable x takes on the
values 1, 2, or 3. To create categorical (dummy) variables for each
value, use
tab x, gen(dum_x)
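As a worked illustration, the sketch below builds a tiny made-up variable x and applies the command, written out in full as tabulate with the generate() option. Stata appends a number to the stub for each distinct value, so the new variables are named dum_x1, dum_x2, and dum_x3.

```stata
* Made-up data: x takes the values 1, 2, or 3
clear
input x
1
2
3
2
1
end

* One dummy per distinct value of x, named dum_x1, dum_x2, dum_x3
tabulate x, generate(dum_x)

list
```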
Starting Over#
Sometimes you want to get rid of all the variables for a new analysis,
or simply to start over. To do this, use the clear command.
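A quick way to see the effect is to check the number of observations and variables before and after, as in this sketch (assuming the auto dataset that ships with Stata):

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear
display "observations: " _N "  variables: " c(k)

* Drop all observations and variables from memory
clear
display "observations: " _N "  variables: " c(k)
```

The related command clear all goes further and also removes matrices, scalars, and programs from memory, which can be useful at the top of a script that should start from a clean slate.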
Do Files#
Do files allow you to put all of the relevant Stata commands (note:
not results, narrative, or math) for a project into one file, so that
results can be easily replicated from one Stata session to the
next. If you are using a Jupyter notebook, which contains the full
record of your work, do files are probably not necessary for this
class.
However, the use of do files is highly recommended if you are using
the default Stata program and interface rather than a Jupyter notebook.
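As an illustration, a do file collecting some of the commands used so far might look like the following sketch. The file name my_analysis.do is hypothetical; the dataset URL and variable names are the ones used above.

```stata
* my_analysis.do (hypothetical file name)
* Load the data, build a dummy variable, and summarize hours worked

clear all
use "https://rlhick.people.wm.edu/econ407/data/mroz"

gen haskids = (kl6>0) | (k618>0)
summarize whrs if haskids == 1
```

Typing do my_analysis.do in Stata (or giving the full path to the file in quotes) runs the commands in order, so the same results can be reproduced in a later session.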
Log Files#
Since we are using the Jupyter notebook interface for Stata, the
record of our work is fully contained in the notebook, and log files
are probably not necessary.
However, if you are running Stata using do files (see Do Files), a very
useful way to save your results is to have Stata automatically put
everything in a log file. To initialize a log file and use it, issue
log using "/some/place/my_first.log", replace text
which will create (or, if it exists, replace) the file my_first.log in the folder /some/place. If you don’t want to replace your existing work, use this command instead
log using "/some/place/my_first.log", append text
and all of your results will be appended to the log file. When you are
finished with a Stata session, issue the command log close to close the
file and save all changes. You may then open it using the text editor of
your choosing.
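Putting the pieces together, a complete logging session might look like the sketch below. It assumes the auto dataset that ships with Stata and writes a plain-text log (the text option) to a hypothetical file my_first.log in the current working directory rather than /some/place.

```stata
* Open a plain-text log in the working directory (hypothetical file name)
log using "my_first.log", replace text

* Everything from here until log close is recorded
sysuse auto, clear
summarize price mpg

* Close the log, then print its contents
log close
type "my_first.log"
```

Without the text option, Stata writes the log in its own SMCL markup, which is readable in Stata's viewer but less convenient in a general text editor.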
Getting help in Stata#
If you need to find general help in Stata, type help command, where
command is some Stata command. You can also do keyword searches with
search keyword. To see the same set of results in a better help
viewer, type view search keyword, for example view search reg.