Introduction to Stata#

Here I briefly introduce the use of Stata for this course.

Loading and Summarizing Stata Datasets#

Stata can load comma-delimited (csv), Excel (xls), and Stata (dta) files out of the box. It can also load data from the web:

use "https://rlhick.people.wm.edu/econ407/data/mroz"
sum
. use "https://rlhick.people.wm.edu/econ407/data/mroz"

. sum

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         lfp |        753    .5683931    .4956295          0          1
        whrs |        753    740.5764    871.3142          0       4950
         kl6 |        753    .2377158     .523959          0          3
        k618 |        753    1.353254    1.319874          0          8
          wa |        753    42.53785    8.072574         30         60
-------------+---------------------------------------------------------
          we |        753    12.28685    2.280246          5         17
          ww |        753    2.374565    3.241829          0         25
        rpwg |        753    1.849734    2.419887          0       9.98
        hhrs |        753    2267.271    595.5666        175       5010
          ha |        753    45.12085    8.058793         30         60
-------------+---------------------------------------------------------
          he |        753    12.49137    3.020804          3         17
          hw |        753    7.482179    4.230559      .4121     40.509
      faminc |        753    23080.59     12190.2       1500      96000
         mtr |        753    .6788632    .0834955      .4415      .9415
        wmed |        753    9.250996    3.367468          0         17
-------------+---------------------------------------------------------
        wfed |        753    8.808765     3.57229          0         17
          un |        753    8.623506    3.114934          3         14
         cit |        753    .6427623    .4795042          0          1
          ax |        753    10.63081     8.06913          0         45

. 
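As a small sketch of going one step beyond the bare sum command, the lines below summarize only selected variables, request more detailed statistics, and restrict the sample with an if condition. The choice of whrs and wa is mine, purely for illustration.

```stata
* load the course dataset (the clear option discards any data already in memory)
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* variable names, storage types, and labels
describe

* summary statistics for selected variables only
summarize whrs wa

* percentiles, skewness, and kurtosis for a single variable
summarize whrs, detail

* summary statistics for a subsample
summarize whrs if lfp == 1
```

Note that sum is simply an abbreviation of summarize, so either spelling works.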

Loading files from disk is a slight variation on the above command. Supposing that your Stata data file mroz.dta was in the folder /some/place, in Linux or macOS we would use the command

use "/some/place/mroz.dta"
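For the csv and Excel formats mentioned at the top of this section, a rough sketch follows. The files mroz.csv and mroz.xls are hypothetical (I am assuming they sit in the same placeholder folder /some/place and have variable names in the first row), so adjust the paths to your own files.

```stata
* Stata data file from disk; clear drops any data currently in memory
use "/some/place/mroz.dta", clear

* comma-delimited file (hypothetical file name)
import delimited using "/some/place/mroz.csv", clear

* Excel file (hypothetical file name); firstrow treats row 1 as variable names
import excel using "/some/place/mroz.xls", firstrow clear

* alternatively, change the working directory and use a relative file name
cd "/some/place"
use "mroz.dta", clear
```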

Viewing Data#

If you are using the graphical version of Stata, viewing data is easy, and I can show you how to do that. In a Jupyter notebook, data can be viewed using the %head magic, which by default displays the first 5 rows.

%head
    lfp  whrs  kl6  k618  wa  we      ww  rpwg  hhrs  ha  he      hw  faminc    mtr  wmed  wfed   un  cit  ax
 1    1  1610    1     0  32  12   3.354  2.65  2708  34  12  4.0288   16310  .7215    12     7    5    0  14
 2    1  1656    0     2  30  12  1.3889  2.65  2310  30   9  8.4416   21800  .6615     7     7   11    1   5
 3    1  1980    1     3  35  12  4.5455  4.04  3072  40  12  3.5807   21040  .6915    12     7    5    0  15
 4    1   456    0     3  34  12  1.0965  3.25  1920  53  10  3.5417    7300  .7815     7     7    5    0   6
 5    1  1568    1     2  31  14  4.5918   3.6  2000  32  12      10   27300  .6215    12    14  9.5    1   7

You can combine logical expressions as well (and display more than 5 lines of data):

%head 10 if lfp==1
    lfp  whrs  kl6  k618  wa  we      ww  rpwg  hhrs  ha  he      hw  faminc    mtr  wmed  wfed   un  cit  ax
 1    1  1610    1     0  32  12   3.354  2.65  2708  34  12  4.0288   16310  .7215    12     7    5    0  14
 2    1  1656    0     2  30  12  1.3889  2.65  2310  30   9  8.4416   21800  .6615     7     7   11    1   5
 3    1  1980    1     3  35  12  4.5455  4.04  3072  40  12  3.5807   21040  .6915    12     7    5    0  15
 4    1   456    0     3  34  12  1.0965  3.25  1920  53  10  3.5417    7300  .7815     7     7    5    0   6
 5    1  1568    1     2  31  14  4.5918   3.6  2000  32  12      10   27300  .6215    12    14  9.5    1   7
 6    1  2032    0     0  54  12  4.7421   4.7  1040  57  11  6.7106   19495  .6915    14     7  7.5    1  33
 7    1  1440    0     2  37  16  8.3333  5.95  2670  37  12  3.4277   21152  .6915    14     7    5    0  11
 8    1  1020    0     0  54  12  7.8431  9.98  4120  53   8  2.5485   18900  .6915     3     3    5    0  35
 9    1  1458    0     2  48  12  2.1262     0  1995  52   4  4.2206   20405  .7515     7     7    3    0  24
10    1  1600    0     2  39  12  4.6875  4.15  2100  43  12  5.7143   20425  .6915     7     7    5    0  21

You can also list data using the list command. To view selected variables for those of the first 3 rows where kl6 is greater than 0:

list whrs wa kl6 if kl6>0 in 1/3 
     +-----------------+
     | whrs   wa   kl6 |
     |-----------------|
  1. | 1610   32     1 |
  3. | 1980   35     1 |
     +-----------------+
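The two qualifiers in the list command above do different jobs: in restricts the range of rows, while if keeps only the rows within that range that satisfy a condition. That is why only rows 1 and 3 appear. A short sketch of each qualifier on its own (the variable selection is my own, for illustration):

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* the first 5 rows of selected variables, no condition
list lfp whrs wa in 1/5

* every row satisfying a condition (can be long; all matching rows are shown)
list whrs wa kl6 if kl6 > 1

* both together: rows 1 through 3, keeping only those with kl6 > 0
list whrs wa kl6 if kl6 > 0 in 1/3
```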

Creating and Modifying Variables#

Creating Variables#

In Stata, a new variable is created with gen (shorthand for generate).

gen newvar = lfp * ax
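The right-hand side of gen can be any expression, including built-in functions. As a hedged sketch, the variable names ln_faminc and wa_sq below are my own inventions, not part of the dataset:

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* product of two existing variables (the example from the text)
gen newvar = lfp * ax

* natural log; faminc is strictly positive here (its minimum is 1500)
gen ln_faminc = ln(faminc)

* square of an existing variable
gen wa_sq = wa^2

* check the results
sum newvar ln_faminc wa_sq
```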

Modifying Variables#

To modify an existing variable, use replace. Using gen again would fail, because the variable is already defined:

replace newvar = newvar/10
(423 real changes made)
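replace can also be combined with an if condition so that only some rows are changed. The sketch below is one possible use, assuming you want newvar to be missing (rather than 0) for rows with lfp equal to 0; that recoding choice is mine, for illustration only.

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

gen newvar = lfp * ax

* rescale every row
replace newvar = newvar/10

* change only the rows that meet the condition; other rows are untouched
replace newvar = . if lfp == 0

* Obs now counts only the non-missing rows
sum newvar
```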

Here is an example that creates a new dummy variable.

gen haskids = 0
replace haskids = (kl6>0) | (k618>0)
list haskids kl6 k618 in 1/10
. gen haskids = 0

. replace haskids = (kl6>0) | (k618>0)
(524 real changes made)

. list haskids kl6 k618 in 1/10

     +----------------------+
     | haskids   kl6   k618 |
     |----------------------|
  1. |       1     1      0 |
  2. |       1     0      2 |
  3. |       1     1      3 |
  4. |       1     0      3 |
  5. |       1     1      2 |
     |----------------------|
  6. |       0     0      0 |
  7. |       1     0      2 |
  8. |       0     0      0 |
  9. |       1     0      2 |
 10. |       1     0      2 |
     +----------------------+

. 
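Because a logical expression in Stata evaluates to 1 (true) or 0 (false), the same dummy can be sketched in a single gen step. One caveat I would add: Stata treats missing values as larger than any number, so kl6>0 is true when kl6 is missing. This dataset has no missing values (every variable shows 753 observations), but the if qualifier below guards against that case anyway.

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear

* 1 if the household has any children, 0 otherwise;
* rows with a missing kl6 or k618 are left missing rather than coded 1
gen haskids = (kl6 > 0) | (k618 > 0) if !missing(kl6, k618)

* frequency table to check the result
tab haskids
```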

Creating dummy variables#

While the above example shows how to "manually" use logical checks to create dummy variables, a better way (particularly if you need to create many categories) is tab (shorthand for tabulate) with the gen option. Suppose a variable x takes on the values 1, 2, or 3. To create a dummy variable for each value, use

tab x, gen(dum_x)
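To see what this produces, here is a self-contained sketch on a made-up dataset. The six observations and the way x is constructed are assumptions purely for illustration. The gen option appends a number to the stub for each category, giving dum_x1, dum_x2, and dum_x3.

```stata
* build a tiny artificial dataset (hypothetical, for illustration only)
clear
set obs 6
gen x = mod(_n, 3) + 1

* tabulate x and create one dummy per distinct value
tab x, gen(dum_x)

* dum_x1 is 1 when x == 1, dum_x2 is 1 when x == 2, and so on
list x dum_x1 dum_x2 dum_x3
```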

Starting Over#

Sometimes, you want to get rid of all the variables for a new analysis, or simply to start over. To do this, use the clear command.
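A brief sketch of the options for starting over, from removing a single variable to emptying memory entirely:

```stata
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear
gen newvar = lfp * ax

* remove just one variable and keep the rest of the data
drop newvar

* remove all variables and observations from memory
clear

* confirm that memory is now empty (0 observations, 0 variables)
describe
```

The clear option on use, as in the first line, does the same thing in one step: it discards whatever is in memory before loading the new file.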

Do Files#

Do files allow you to put all of the relevant Stata commands (note: not results, narrative, or math) for a project into one file, so that results can be easily replicated from one Stata session to the next. If you are using a Jupyter notebook, which contains the full record of your work, do files are probably not necessary for this class.

However, the use of do files is highly recommended if you are using the default Stata program and interface rather than a Jupyter notebook.
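As a rough sketch, a do file is just a plain text file of commands. The file name analysis.do and its contents below are hypothetical, assembled from commands used earlier in these notes.

```stata
* analysis.do (hypothetical file name)
* lines beginning with * are comments and are ignored by Stata

* start from a clean slate and load the data
clear
use "https://rlhick.people.wm.edu/econ407/data/mroz"

* create variables
gen haskids = (kl6 > 0) | (k618 > 0)

* produce results
sum whrs if haskids == 1
sum whrs if haskids == 0
```

If this file were saved in the placeholder folder /some/place, typing do "/some/place/analysis.do" in Stata would run every command in it from top to bottom.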

Log Files#

Since we are using the Jupyter notebook interface for Stata, the record of our work is fully contained in the notebook, and log files are probably not necessary.

However, if you are running Stata using do files (see Do Files), a very useful way to save your results is to have Stata automatically put everything in a log file. To initialize a log file and use it, issue

log using "/some/place/my_first.log", replace text

This will create (or, if it exists, replace) the file my_first.log in the folder /some/place. If you don't want to replace your existing work, use this command instead

log using "/some/place/my_first.log", append text

and all of your results will be appended to the log file. When you are finished with a Stata session, issue the command log close to close the file and save all changes. You may then open it using the text editor of your choosing.
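Putting the pieces together, a logged session might be sketched as follows. The folder /some/place is the same placeholder used above and must exist on your machine. The capture prefix on the first line is my addition: it suppresses the error Stata would otherwise raise if no log happens to be open.

```stata
* close any log left open by an earlier run (no error if none is open)
capture log close

* open a plain-text log, overwriting any previous version
log using "/some/place/my_first.log", replace text

* everything from here on is recorded in the log
use "https://rlhick.people.wm.edu/econ407/data/mroz", clear
sum

* stop logging and save the file
log close
```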

Getting help in Stata#

If you need general help in Stata, type help command, where command is some Stata command. You can also do keyword searches: search keyword. To see the same set of results in a better help viewer, type view search keyword, for example, view search reg.
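For instance, using commands that appear earlier in these notes (the particular commands and keyword looked up here are my own picks):

```stata
* documentation for a specific command
help summarize
help generate

* keyword search across Stata's help resources
search reg

* the same search, shown in the help viewer
view search reg
```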