Treatment-effects estimation using lasso


You use treatment-effects estimators to draw causal inferences from observational data. Perhaps you want to estimate the effect of a drug regimen on blood pressure, the effect of a surgical procedure on mobility, the effect of a training program on employment, or the effect of an ad campaign on sales.

You use lasso inferential estimators when you are interested in inference on a few covariates while controlling for many other potential covariates. (And when we say many, we mean hundreds, thousands, or more!)

You can now use these estimators simultaneously. With the new telasso command, you can estimate treatment effects while controlling for many potential covariates.

For example, you can type

. telasso (y1 x1-x100) (treat w1-w100)

to estimate the effect of the binary treatment treat on the continuous outcome y1 while controlling for predictors x1 through x100 in the outcome model and for w1 through w100 in the treatment model. The obtained estimates benefit from robustness properties of both the treatment-effects estimators and lasso.

With telasso, you get everything you expect from treatment effects and from lasso. You can estimate the average treatment effect, the average treatment effect on the treated, and the potential-outcome means. You can model continuous, binary, and count outcomes and choose between a logit and a probit treatment model. And for selection of controls, you can choose between lasso and square-root lasso estimation and choose from several selection methods, such as BIC and cross-validation.
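
As a rough sketch of how these choices might be specified, the commands below vary the statistic, the models, and the selection method. The variables y2 (a binary outcome) and y3 (a count outcome) are hypothetical and are not part of the example above. The option names reflect our reading of the telasso syntax, and we assume that sqrtlasso applies to the linear outcome model; check the manual entry before relying on them.

```stata
* ATET instead of the default ATE
telasso (y1 x1-x100) (treat w1-w100), atet

* Hypothetical binary outcome y2: probit outcome model and probit treatment model
telasso (y2 x1-x100, probit) (treat w1-w100, probit)

* Hypothetical count outcome y3: Poisson outcome model, controls selected by BIC
telasso (y3 x1-x100, poisson) (treat w1-w100), selection(bic)

* Cross-validation instead of the default plugin method;
* folds are drawn at random, so set a seed first for reproducibility
set seed 12345
telasso (y1 x1-x100) (treat w1-w100), selection(cv)

* Square-root lasso (assumed here to apply to the linear outcome model)
telasso (y1 x1-x100) (treat w1-w100), sqrtlasso
```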

Highlights

Estimate treatment effects with high-dimensional controls

High-dimensional controls in the outcome model
High-dimensional controls in the treatment model

Flexible model specification

Outcome model can be linear, logit, probit, or Poisson
Treatment assignment model can be logit or probit

Different measures of treatment effects

ATE: average treatment effect
ATET: average treatment effect on the treated
POM: potential-outcome mean

Robust estimation

Double robustness: only one of the models needs to be correctly specified
Neyman orthogonality: guard against model-selection mistakes made by lasso

Double machine learning

Cross-fitting and resampling

Let’s see it work

We would like to compare two types of lung transplants: bilateral lung transplant (BLT) and single lung transplant (SLT). BLT is usually associated with a higher death rate in the short term after the operation but with a more significant improvement in the quality of life than SLT. As a result, for patients who need to decide between these two treatment options, knowing the effect of BLT (versus SLT) on life quality is essential. Therefore, we want to estimate the effect of the treatment transtype on the outcome fev1p. This outcome represents the percentage of forced expiratory volume in one second (FEV1) that the patient has relative to a healthy person.

Our data include 29 variables recording characteristics of the patients and donors. We use these variables and the interactions between them as controls in our model. It would be tedious to type these variable names one by one and to separate the continuous variables from the categorical ones by hand. The vl suite of commands simplifies this process.

The following code creates the control variable list and stores it in the global macro $allvars.

. quietly vl set

. vl create cvars = vlcontinuous - (fev1p)
note: $cvars initialized with 12 variables.

. vl create fvars = vlcategorical - (transtype)
note: $fvars initialized with 17 variables.

. vl sub allvars = c.cvars i.fvars c.cvars#i.fvars

Now we are ready to use telasso to estimate the average treatment effect. We assume a linear outcome model and a logit treatment model, the defaults. We type

. telasso (fev1p $allvars) (transtype $allvars)

Estimating lasso for outcome fev1p if tran~e = 0 using plugin method ...
Estimating lasso for outcome fev1p if tran~e = 1 using plugin method ...
Estimating lasso for treatment tran~e using plugin method ...
Estimating ATE ...

Treatment-effects lasso estimation    Number of observations      =        937
Outcome model:   linear               Number of controls          =        454
Treatment model: logit                Number of selected controls =          8



                             Robust
       fev1p   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

ATE           
   transtype  
       (BLT   
         vs   
       SLT)      37.51841   .1606703   233.51   0.000     37.20351    37.83332

POmean        
   transtype  
        SLT       46.4938   .2021582   229.99   0.000     46.09757    46.89002

If all the patients were to choose a BLT, the FEV1% is expected to be 38 percentage points higher than the average of 46% expected if all patients were to choose an SLT. Among the 454 control variables, telasso selects only 8 of them.
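
The ATE and the SLT potential-outcome mean together imply the potential-outcome mean under BLT, roughly 84%. As a quick check of that arithmetic, and as a sketch of how both potential-outcome means could be reported directly, one might type the following. We assume here that the pomeans statistic is requested in the same way as atet.

```stata
* Implied potential-outcome mean under BLT: SLT mean plus the ATE
display 46.4938 + 37.51841

* Report the potential-outcome means for SLT and BLT instead of the ATE
telasso (fev1p $allvars) (transtype $allvars), pomeans
```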

It is also common to estimate the average treatment effect on the treated (ATET), which measures the effect on those who actually received the treatment. To estimate this value, we add the atet option.

. telasso (fev1p $allvars) (transtype $allvars), atet

Estimating lasso for outcome fev1p if tran~e = 0 using plugin method ...
Estimating lasso for outcome fev1p if tran~e = 1 using plugin method ...
Estimating lasso for treatment tran~e using plugin method ...
Estimating ATET ...

Treatment-effects lasso estimation    Number of observations      =        937
Outcome model:   linear               Number of controls          =        454
Treatment model: logit                Number of selected controls =          8



                             Robust
       fev1p   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

ATET          
   transtype  
       (BLT   
         vs   
       SLT)      35.78157   .1831478   195.37   0.000     35.42261    36.14053

POmean        
   transtype  
        SLT      43.35214   1.268976    34.16   0.000     40.86499    45.83929

For the patients who received a BLT, we expect the average FEV1% to be 36 percentage points higher than if all of them had chosen an SLT.

The estimates that we obtained above relied on a key assumption of lasso, the sparsity assumption, which requires that only a small number of the potential covariates are in the “true” model. We can use a double machine learning technique to allow for more covariates in the true model. To do this, we add the xfolds(5) option to split the sample into five groups and perform cross-fitting, and we add the resample(3) option to repeat the cross-fitting procedure with three different sample splits.

To guarantee that we can later reproduce the estimation results, we also set the random-number seed. We type

. set seed 12345671

. telasso (fev1p $allvars) (transtype $allvars), xfolds(5) resample(3) nolog

Treatment-effects lasso estimation    Number of observations       =       937
                                      Number of controls           =       454
                                      Number of selected controls  =        16
Outcome model:   linear               Number of folds in cross-fit =         5
Treatment model: logit                Number of resamples          =         3



                             Robust
       fev1p   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

ATE           
   transtype  
       (BLT   
         vs   
       SLT)      37.52837   .1683194   222.96   0.000     37.19847    37.85827

POmean        
   transtype  
        SLT       46.4941   .2040454   227.86   0.000     46.09418    46.89402

The estimated treatment effect is very similar to the one reported by the first telasso command, but the selected model included 16 controls instead of 8. The similarity of the estimates across the different specifications suggests that our first model did not suffer from a violation of the sparsity assumption.
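
To see which lassos were fit and how many controls each one selected, one option is to run lassoinfo after estimation. The sketch below also supplies the seed through rseed() rather than set seed. We assume that telasso accepts rseed() in the same way as the other cross-fit lasso commands, so treat that option as unverified.

```stata
* Same cross-fit specification, with the seed supplied as an option (assumed)
telasso (fev1p $allvars) (transtype $allvars), xfolds(5) resample(3) rseed(12345671) nolog

* Summarize the lassos fit for the outcome and treatment models
lassoinfo
```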
