New in Stata 18

Alias variables across frames

Highlights

  • Easy read-only access to variables in linked frames

  • Save memory with repeated values in huge datasets

Stata supports multiple datasets in memory; each dataset resides in a frame. In Stata 18, you can work with variables from different frames as if they were all in a single frame.

When datasets are related, you can link their frames by using the frlink command to identify the variables that match the observations in the current frame with observations in the related frame.

Alias variables, created by the new fralias add command, define references to variables in linked frames. These variables take up very little memory because the observations are actually stored in another frame.

Stata treats alias variables like any other variable in your dataset, with the exception that you are not allowed to change their values. For a given alias variable, if you change the corresponding variable’s values in the linked frame, the changed values are automatically available the next time you use the alias variable.

Let’s see it work

We have two files, persons.dta and txcounty.dta, that are related. persons.dta contains data on individuals living in Texas, and txcounty.dta contains data on counties in Texas. Variable countyid identifies counties in Texas in both datasets.

In the following, we load the datasets into separate frames: the person data into the current frame and the Texas counties data into a new frame named txcounty.
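The loading step can be sketched as follows. This is a minimal do-file sketch that assumes persons.dta and txcounty.dta sit in the current working directory; the file locations are an assumption, not something stated above.

```stata
* Assumes persons.dta and txcounty.dta are in the working directory
use persons, clear                 // person data into the current (default) frame
frame create txcounty              // new, empty frame named txcounty
frame txcounty: use txcounty       // county data into that frame

describe                           // variables and observations in the current frame
frame txcounty: describe           // the same for frame txcounty
```

The frame prefix runs a single command in another frame without changing the current frame, so the person data stay current throughout.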

These two frames each contain a variable named countyid that identifies Texas counties. We assume that a given Texas county has the same coded value in both frames.

We use frlink with variable countyid to link the observations in the current frame with observations in frame txcounty.

. frlink m:1 countyid, frame(txcounty)
  (all observations in frame default matched)

frlink creates a new variable that maps the observations in the current frame to those in the linked frame. In this example, the new variable is named after the linked frame, txcounty, but you can specify a different name by using option generate().
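As a sketch of the generate() option, the link could instead be created under a name of our choosing. The name county_link is hypothetical and used only for illustration; this would be run in place of the frlink command above, not in addition to it.

```stata
* Alternative to the frlink call above: name the link variable explicitly.
* county_link is a hypothetical name chosen for this sketch.
frlink m:1 countyid, frame(txcounty) generate(county_link)

* Report what the link variable connects
frlink describe county_link
```

With this variant, later commands would refer to the link as from(county_link) rather than from(txcounty).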

We can create alias variables one at a time or in groups. We decide to use *, a common wildcard that matches zero or more characters in variable names, to create an alias variable for each variable in the linked frame. There is only one variable (other than countyid) in the linked frame, so we get one new alias variable in the current frame.

. fralias add *, from(txcounty)
  (variable not aliased from linked frame: countyid)
  (1 variable aliased from linked frame)
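To create alias variables one at a time rather than with the wildcard, the target variable can be named directly. The sketch below is an alternative to the command above, not a follow-up: running both would try to create median_income twice.

```stata
* Alternative to the wildcard: alias one named variable from the linked frame
fralias add median_income, from(txcounty)
```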

We type fralias describe to see a description of all the alias variables in the current frame.
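For reference, the commands involved are short. The second line is an optional addition, not part of the original walkthrough, that shows how the alias appears to the ordinary describe command.

```stata
fralias describe                 // list all alias variables in the current frame
describe median_income           // optional: the alias as describe reports it
```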

fralias describe found our new alias variable and reports that its name is median_income; its target variable is a float variable also named median_income that resides in the frame named txcounty and is linked to the current frame via the variable txcounty.

Recall from the above calls to describe that the current frame contains 20 observations and the linked frame txcounty contains eight observations. The memory footprint of alias variable median_income consists of two variable characteristics, which store the name of the link variable and the name of the target variable in the linked frame. If, instead of creating an alias for median_income, we used frget to create a copy of median_income in the current frame, the new float variable would take four bytes for every observation in the current frame; a new double variable would take eight. Alias variables created by fralias add therefore have a small, fixed memory footprint, whereas variables created by frget grow with the number of observations.
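The comparison can be made concrete with a small sketch. The variable name income_copy is hypothetical; it holds a physical copy made with frget so that it can coexist with the alias.

```stata
* income_copy is a hypothetical name for a physical copy made with frget
frget income_copy = median_income, from(txcounty)

* Compare the alias with the stored float copy
describe median_income income_copy

* Storage needed by a copy: bytes per value x 20 observations
display 4 * 20        // float:  80 bytes
display 8 * 20        // double: 160 bytes
```

With only 20 observations the difference is trivial, but the copy's cost scales with the number of observations, while the alias's cost does not.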

We can now use alias variable median_income like any other variable, provided we do not try to change its values. Let’s summarize its values.
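A minimal sketch of both halves of that statement follows. The replace line is an illustrative attempt that is expected to be rejected, and capture noisily is used so that a do-file would show the error and keep running.

```stata
* Reading an alias works like reading any other variable
summarize median_income

* Writing to an alias is not allowed; capture noisily displays the
* error message without stopping a do-file
capture noisily replace median_income = 0
```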

Suppose we need to change some values of median_income. We make changes to median_income in the frame txcounty. Those changes are automatically available to the alias variable in the current frame.
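As a sketch, suppose the change is a 2 percent increase for every county. That adjustment is hypothetical and chosen only for illustration.

```stata
* Hypothetical edit made in the linked frame, where the values are stored
frame txcounty: replace median_income = median_income * 1.02

* Back in the current frame, the alias reflects the edited values
summarize median_income
```

No step is needed to refresh the alias, because it refers to the values in txcounty instead of holding its own copy.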