Introduction to Stata#
Here I briefly introduce the use of Stata for this course.
Loading and Summarizing Stata Datasets#
Stata can load comma-delimited (csv), Excel (xls), and Stata (dta) files out of the box. It can also load data from the web:
use "https://rlhick.people.wm.edu/econ407/data/mroz"
sum
. use "https://rlhick.people.wm.edu/econ407/data/mroz"
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
lfp | 753 .5683931 .4956295 0 1
whrs | 753 740.5764 871.3142 0 4950
kl6 | 753 .2377158 .523959 0 3
k618 | 753 1.353254 1.319874 0 8
wa | 753 42.53785 8.072574 30 60
-------------+---------------------------------------------------------
we | 753 12.28685 2.280246 5 17
ww | 753 2.374565 3.241829 0 25
rpwg | 753 1.849734 2.419887 0 9.98
hhrs | 753 2267.271 595.5666 175 5010
ha | 753 45.12085 8.058793 30 60
-------------+---------------------------------------------------------
he | 753 12.49137 3.020804 3 17
hw | 753 7.482179 4.230559 .4121 40.509
faminc | 753 23080.59 12190.2 1500 96000
mtr | 753 .6788632 .0834955 .4415 .9415
wmed | 753 9.250996 3.367468 0 17
-------------+---------------------------------------------------------
wfed | 753 8.808765 3.57229 0 17
un | 753 8.623506 3.114934 3 14
cit | 753 .6427623 .4795042 0 1
ax | 753 10.63081 8.06913 0 45
.
Loading files from disk is a slight variation on the above command. Supposing that your Stata data file mroz.dta is in the folder /some/place, on Linux or macOS we would use the command
use "/some/place/mroz.dta"
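The use command reads Stata's own dta format; the other formats mentioned above go through the import commands. The following is a minimal sketch, not part of the original notes: it first writes csv and xls copies of the data to the current working directory (the file names mroz.csv and mroz.xls are assumptions made for illustration) and then reads them back.

```stata
* Start from the course dataset so the example is self-contained
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* Write hypothetical csv and xls copies to the current working directory
export delimited using "mroz.csv", replace
export excel using "mroz.xls", firstrow(variables) replace

* Comma-delimited text: clear drops the dataset currently in memory
import delimited "mroz.csv", clear

* Excel workbook: firstrow treats the first row as variable names
import excel "mroz.xls", firstrow clear
```

Either import command leaves the same 753 observations in memory as the original use command.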
Viewing Data#
If you are using the graphical version of Stata, viewing data is easy
and I can show you how to do that. In a Jupyter notebook, viewing data can be done using the %head
magic.
%head
| | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he | hw | faminc | mtr | wmed | wfed | un | cit | ax |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1610 | 1 | 0 | 32 | 12 | 3.354 | 2.65 | 2708 | 34 | 12 | 4.0288 | 16310 | .7215 | 12 | 7 | 5 | 0 | 14 |
2 | 1 | 1656 | 0 | 2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 | 9 | 8.4416 | 21800 | .6615 | 7 | 7 | 11 | 1 | 5 |
3 | 1 | 1980 | 1 | 3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 | 3.5807 | 21040 | .6915 | 12 | 7 | 5 | 0 | 15 |
4 | 1 | 456 | 0 | 3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | 10 | 3.5417 | 7300 | .7815 | 7 | 7 | 5 | 0 | 6 |
5 | 1 | 1568 | 1 | 2 | 31 | 14 | 4.5918 | 3.6 | 2000 | 32 | 12 | 10 | 27300 | .6215 | 12 | 14 | 9.5 | 1 | 7 |
You can combine logical expressions as well (and display more than 5 lines of data):
%head 10 if lfp==1
| | lfp | whrs | kl6 | k618 | wa | we | ww | rpwg | hhrs | ha | he | hw | faminc | mtr | wmed | wfed | un | cit | ax |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1610 | 1 | 0 | 32 | 12 | 3.354 | 2.65 | 2708 | 34 | 12 | 4.0288 | 16310 | .7215 | 12 | 7 | 5 | 0 | 14 |
2 | 1 | 1656 | 0 | 2 | 30 | 12 | 1.3889 | 2.65 | 2310 | 30 | 9 | 8.4416 | 21800 | .6615 | 7 | 7 | 11 | 1 | 5 |
3 | 1 | 1980 | 1 | 3 | 35 | 12 | 4.5455 | 4.04 | 3072 | 40 | 12 | 3.5807 | 21040 | .6915 | 12 | 7 | 5 | 0 | 15 |
4 | 1 | 456 | 0 | 3 | 34 | 12 | 1.0965 | 3.25 | 1920 | 53 | 10 | 3.5417 | 7300 | .7815 | 7 | 7 | 5 | 0 | 6 |
5 | 1 | 1568 | 1 | 2 | 31 | 14 | 4.5918 | 3.6 | 2000 | 32 | 12 | 10 | 27300 | .6215 | 12 | 14 | 9.5 | 1 | 7 |
6 | 1 | 2032 | 0 | 0 | 54 | 12 | 4.7421 | 4.7 | 1040 | 57 | 11 | 6.7106 | 19495 | .6915 | 14 | 7 | 7.5 | 1 | 33 |
7 | 1 | 1440 | 0 | 2 | 37 | 16 | 8.3333 | 5.95 | 2670 | 37 | 12 | 3.4277 | 21152 | .6915 | 14 | 7 | 5 | 0 | 11 |
8 | 1 | 1020 | 0 | 0 | 54 | 12 | 7.8431 | 9.98 | 4120 | 53 | 8 | 2.5485 | 18900 | .6915 | 3 | 3 | 5 | 0 | 35 |
9 | 1 | 1458 | 0 | 2 | 48 | 12 | 2.1262 | 0 | 1995 | 52 | 4 | 4.2206 | 20405 | .7515 | 7 | 7 | 3 | 0 | 24 |
10 | 1 | 1600 | 0 | 2 | 39 | 12 | 4.6875 | 4.15 | 2100 | 43 | 12 | 5.7143 | 20425 | .6915 | 7 | 7 | 5 | 0 | 21 |
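You can also list data using the list
command, which accepts a variable list along with if and in qualifiers. To view whrs, wa, and kl6 for those of the first 3
observations that have kl6>0: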
list whrs wa kl6 if kl6>0 in 1/3
+-----------------+
| whrs wa kl6 |
|-----------------|
1. | 1610 32 1 |
3. | 1980 35 1 |
+-----------------+
Creating and Modifying Variables#
Creating Variables#
In Stata, you need to start a new variable with gen
(shorthand for generate).
gen newvar = lfp * ax
Modifying Variables#
To modify an existing variable, use replace.
Using gen on a variable that already exists produces an error, so
changes to existing variables must go through replace:
replace newvar = newvar/10
(423 real changes made)
Here is an example that creates a new dummy variable.
gen haskids = 0
replace haskids = (kl6>0) | (k618>0)
list haskids kl6 k618 in 1/10
. gen haskids = 0
. replace haskids = (kl6>0) | (k618>0)
(524 real changes made)
. list haskids kl6 k618 in 1/10
+----------------------+
| haskids kl6 k618 |
|----------------------|
1. | 1 1 0 |
2. | 1 0 2 |
3. | 1 1 3 |
4. | 1 0 3 |
5. | 1 1 2 |
|----------------------|
6. | 0 0 0 |
7. | 1 0 2 |
8. | 0 0 0 |
9. | 1 0 2 |
10. | 1 0 2 |
+----------------------+
.
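The two-step gen and replace pattern above can also be collapsed into a single gen, because a logical expression in Stata evaluates to 1 or 0. The sketch below is an illustration rather than part of the original notes; haskids_safe is a hypothetical variable name, and the if qualifier only matters for data that contain missing values.

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* A logical expression evaluates to 1 (true) or 0 (false), so one gen suffices
gen haskids = (kl6>0) | (k618>0)

* Stata treats missing values as larger than any number, so kl6>0 is true
* when kl6 is missing. The if qualifier leaves the dummy missing in that case.
* (haskids_safe is a hypothetical name used only for this illustration.)
gen haskids_safe = (kl6>0) | (k618>0) if !missing(kl6, k618)

* Cross-check the dummy against one of the underlying variables
tab haskids kl6
```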
Creating dummy variables#
While the above example shows how to “manually” use logical
checks to create dummy variables, a better way (particularly if you need
to create many categories) is tab. Suppose a variable x takes on the
values 1, 2, or 3. To create categorical (dummy) variables for each
value, use
tab x, gen(dum_x)
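As a concrete sketch using the course data rather than the generic x, the same command can be applied to kl6. The stub name dum_kl6 is an assumption chosen for illustration; Stata appends 1, 2, ... to the stub, one new variable per distinct value in ascending order.

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* One dummy per distinct value of kl6, named dum_kl61, dum_kl62, ...
* dum_kl61 flags the smallest value of kl6, which is 0 in these data
tab kl6, gen(dum_kl6)

* Inspect the new variables alongside the original
list kl6 dum_kl6* in 1/5
```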
Starting Over#
Sometimes, you want to get rid of all the variables for a new analysis,
or simply to start over. To do this, use the clear
command.
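A short sketch of the common variants follows. It is an illustration, not a complete account of what each form removes.

```stata
* Drop the dataset currently in memory
clear

* clear all also drops other objects held in memory, such as matrices,
* stored results, and programs
clear all

* The clear option on use drops the data in memory before loading a new file
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear
```

Without clear, use refuses to load a new dataset when the data in memory have unsaved changes.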
Do Files#
Do files allow you to put all of the relevant Stata commands (note:
not results, narrative, or math) for a project into one file, so that
results can be easily replicated from one Stata session to the
next. If you are using a Jupyter notebook, which contains the full
record of your work, do files are probably not necessary for this
class.
However, the use of do files is highly recommended if you are using
the default Stata program and interface rather than a Jupyter notebook.
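To make this concrete, here is a sketch of what a small do file might contain. The file name analysis.do is hypothetical, and the commands simply reuse steps shown earlier on this page.

```stata
* analysis.do (hypothetical file name)
* Lines starting with * are comments and are ignored by Stata
clear
use "https://rlhick.people.wm.edu/econ407/data/mroz"

* Build the variables needed for the analysis
gen haskids = (kl6>0) | (k618>0)

* Summarize the working sample: women in the labor force
sum whrs wa haskids if lfp==1
```

Assuming the file were saved in /some/place, it could be run from the Stata command line with do "/some/place/analysis.do".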
Log Files#
Since we are using the Jupyter notebook interface for Stata, the
record of our work is fully contained in the notebook, and log files
are probably not necessary.
However, if you are running Stata using do files (see
Do Files), a very useful way to save your results is to have Stata
automatically put everything in a log file. To initialize a log file
and use it, issue
log using "/some/place/my_first.log", replace text
This will create (or, if it exists, replace) the file my_first.log in the folder /some/place. If you don’t want to replace your existing work, use this command instead
log using "/some/place/my_first.log", append text
and all of your results will be appended to the log file. When you are
finished with a Stata session, issue the command log close to close the
file and save all changes. You may then open it using the text editor of
your choosing.
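Putting the pieces together, a do file that logs its own output might look like the sketch below. The leading capture log close is a common idiom rather than something these notes require, and the log is written to the current working directory under the assumed name my_first.log.

```stata
* Close any log left open by an earlier run; capture suppresses the error
* that log close raises when no log is open
capture log close

* Hypothetical file name, written to the current working directory
log using "my_first.log", replace text

use "https://rlhick.people.wm.edu/econ407/data/mroz", clear
sum whrs wa

* Close the log file
log close
```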
Getting help in Stata#
If you need to find general help in Stata, type help command, where
command is some Stata command. You can also do keyword searches:
search keyword. To see the same set of results in a better help
viewer, type view search keyword, for example view search reg.