New In

Bayesian model averaging (BMA) for linear regression

Why choose just one model when you can borrow information from many? The new bma suite performs Bayesian model averaging to account for model uncertainty in your analysis. Uncertain which predictors to include in your linear regression model? Use bmaregress to find out which predictors are important. Perform model choice, inference, and prediction. Use many postestimation commands to explore influential models, model complexity, model fit and predictive performance, sensitivity analysis to the assumptions about importance of models and predictors, and more.

Highlights

  • Model choice, inference, and prediction

  • Model enumeration and MC3 sampling

  • Always-included and grouped predictors

  • Model priors: uniform, binomial, beta-binomial

  • Many g-priors, including hyper-g and robust

  • Factor variables and time-series operators

  • Strong, weak, or no heredity for interactions

  • BMA MCMC convergence

  • Posterior model and inclusion probabilities

  • Prior and posterior model-size distributions

  • Posterior distributions of model parameters

  • Jointness measures for pairs of predictors

  • Variable-inclusion maps

  • Log predictive-score for model fit and prediction performance

  • Sensitivity analysis

  • Prediction

  • Standard Bayesian postestimation features support

  • See more Bayesian model averaging features

Introduction

Traditionally, we select a model and perform inference and prediction conditional on this model. Our results typically do not account for the uncertainty in selecting a model and thus may be overly optimistic. They may even be incorrect if our selected model is substantially different from the true data-generating model (DGM). In some applications, we may have strong theoretical or empirical evidence about the DGM. In other applications, usually of complex and unstable natures, such as those in economics, psychology, and epidemiology, choosing one reliable model can be difficult. Instead of relying on just one model, model averaging averages results over multiple plausible models based on the observed data. In BMA, the “plausibility” of the model is described by the posterior model probability (PMP), which is determined using the fundamental Bayesian principles—the Bayes theorem—and applied universally to all data analyses. BMA can be used to account for model uncertainty when estimating model parameters and predicting new observations to avoid overly optimistic conclusions. It is particularly useful in applications with several plausible models, where there is no one definitive reason to choose a particular model over the others. But even if choosing a single model is the end goal, you can find BMA beneficial. It provides a principled way to identify important models and predictors within the considered classes of models. Its framework allows you to learn about interrelations between different predictors in terms of their tendency to appear in a model together, separately, or independently. It can be used to evaluate the sensitivity of the final results to various assumptions about the importance of different models and predictors. And it provides optimal predictions in the log-score sense.

The bma suite

In a regression setting, model uncertainty amounts to the uncertainty of which predictors should be included in a model. We can use bmaregress to account for selection of predictors in a linear regression. It explores the model space either exhaustively with the enumeration option, whenever feasible, or by using the Markov chain Monte Carlo (MCMC) model composition (MC3) algorithm with the sampling option. It reports various summaries of the visited models and included predictors and of posterior distributions of model parameters. It allows you to specify groups of predictors to be included or excluded together from a model and those that are included in all models. It provides various prior distributions for models in the mprior() option and for the g parameter, which controls shrinkage of regression coefficients toward zero, in the gprior() option. It also supports factor variables and time-series operators and provides several ways for handling interactions during estimation by using the heredity() option.

There are many supported postestimation features, which also include some of the standard Bayesian postestimation features.

Command Description

bmacoefsample

Posterior samples of regression coefficients

bmagraph pmp

model-probability plots

bmagraph msize

model-size distribution plots

bmagraph varmap

variable-inclusion maps

bmagraph coefdensity

coefficient posterior density plots

bmastats models

posterior model and variable-inclusion summaries

bmastats msize

model-size summary

bmastats pip

posterior inclusion probabilities (PIPs) for predictors

bmastats jointness

jointness measures for predictors

bmastats lps

log predictive-score (LPS)

bmapredict

BMA predictions

bayesgraph

Bayesian graphical summaries and convergence diagnostics

bayesstats summary

Bayesian summary statistics for model parameters and their functions

bayesstats ess

Bayesian effective sample sizes and related statistics

bayesstats ppvalues

Bayesian predictive p-values

bayespredict

Bayesian predictions

Below, we provide a tour of some bma features using a toy dataset. This dataset contains 

=200  observations,  =10  orthogonal predictors, and outcome y generated as

=0.5+1.2×2+5×10+  

where 

(0,1)  is a standard normal error term.
. webuse bmaintro
(Simulated data for BMA example)

. summarize
Variable Obs Mean Std. dev. Min Max
y 200 .9944997 4.925052 -13.332 13.06587
x1 200 -.0187403 .9908957 -3.217909 2.606215
x2 200 -.0159491 1.098724 -2.999594 2.566395
x3 200 .080607 1.007036 -3.016552 3.020441
x4 200 .0324701 1.004683 -2.410378 2.391406
x5 200 -.0821737 .9866885 -2.543018 2.133524
x6 200 .0232265 1.006167 -2.567606 3.840835
x7 200 -.1121034 .9450883 -3.213471 1.885638
x8 200 -.0668903 .9713769 -2.871328 2.808912
x9 200 -.1629013 .9550258 -2.647837 2.472586
x10 200 .083902 .8905923 -2.660675 2.275681
Model enumeration

We use bmaregress to fit a BMA linear regression of y on x1 through x10.

. bmaregress y x1-x10

Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                            No. of obs         =    200
Linear regression                                   No. of predictors  =     10
Model enumeration                                               Groups =     10
                                                                Always =      0
Priors:                                             No. of models      =  1,024
  Models: Beta-binomial(1, 1)                           For CPMP >= .9 =      9
   Cons.: Noninformative                            Mean model size    =  2.479
   Coef.: Zellner's g
       g: Benchmark, g = 200                        Shrinkage, g/(1+g) = 0.9950
  sigma2: Noninformative                            Mean sigma2        =  1.272
y Mean Std. dev. Group PIP
x2 1.198105 .0733478 2 1
x10 5.08343 .0900953 10 1
x3 -.0352493 .0773309 3 .21123
x9 .004321 .0265725 9 .051516
x1 .0033937 .0232163 1 .046909
x4 -.0020407 .0188504 4 .039267
x5 .0005972 .0152443 5 .033015
x9 -.0005639 .0153214 8 .032742
x7 -8.23e-06 .015497 7 .032386
x6 -.0003648 .0143983 6 .032361
Always
_cons .5907923 .0804774 0 1
Note: Coefficient posterior means and std. dev. estimated from 1,024 models.
Note: Default priors are used for models and parameter g.

With 10 predictors and the default fixed (benchmark) g-prior, bmaregress uses model enumeration and explores all 

210=1,024  possible models. The default model prior is beta-binomial with shape parameters of 1, which is uniform over the model size. A priori, our BMA model assumed little shrinkage of regression coefficients toward zero because  /(1+)=0.9950  is close to 1.

bmaregress identified x2 and x10 as highly important predictors—their posterior inclusion probabilities (PIPs) are essentially 1. The posterior mean estimates of their regression coefficients are, respectively, 1.2 (rounded) and 5.1 and are very close to the true values of 1.2 and 5. All other predictors have much lower PIPs and coefficient estimates close to zero. Our BMA findings are consistent with the true DGM.

Let’s store our current estimation results for later use. As with other Bayesian commands, we must save the simulation results first before we can use estimates store to store the estimation results.

. bmaregress, saving(bmareg)
note: file bmareg.dta saved.

. estimates store bmareg
Credible intervals

bmaregress does not report credible intervals (CrIs) by default, because it requires a potentially time-consuming simulation of the posterior sample of model parameters. But we can compute them after estimation.

We first use bmacoefsample to generate an MCMC sample from the posterior distribution of model parameters, which include regression coefficients. We then use the existing bayesstats summary command to compute posterior summaries of the generated MCMC sample.

. bmacoefsample, rseed(18)

Simulation (10000): ....5000....10000 done

. bayesstats summary

Posterior summary statistics                      MCMC sample size =    10,000
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
x1 .0031927 .0234241 .000234 0 0 .0605767
x2 1.197746 .0726358 .000735 1.197211 1.053622 1.341076
x3 -.0361581 .0780037 .00078 0 -.2604694 0
x4 -.0021015 .018556 .000186 0 -.0306376 0
x5 .0004701 .0147757 .000148 0 0 0
x6 -.0003859 .0140439 .000142 0 0 0
x7 -.0003311 .0166303 .000166 0 0 0
x8 -.0005519 .0145717 .00015 0 0 0
x9 .0046535 .0273899 .000274 0 0 .096085
x10 5.08357 .0907759 .000927 5.083466 4.90354 5.262716
_cons .5901334 .0811697 .000801 .5905113 .4302853 .7505722
sigma2 1.272579 .1300217 .0013 1.262612 1.043772 1.555978
g 200 0 0 200 200 200

The 95% CrI for the coefficient of x2 is [1.05, 1.34], and the one for x10 is [4.9, 5.26], which is consistent with our DGM.

Influential models

The BMA coefficient estimates are averaged over 1,024 models. It is important to investigate which models are influential. We can use bmastats models to explore the PMPs.

. bmastats models

Model summary         Number of models:
                               Visited = 1,024
                              Reported =     5

Analytical PMP Model size .6292 2 .1444 3 .0258 3 .0246 3 .01996 3
Rank 1 2 3 4 5
 
Variable-inclusion summary
Rank Rank Rank Rank Rank
1 2 3 4 5
x2 x x x x x
x10 x x x x x
x3 x
x9 x
x1 x
x4 x
Legend: 
x - estimated

Not surprisingly, the model that contains both x2 and x10 has the highest PMP of about 63%. In fact, the top two models correspond to a cumulative PMP of about 77%:

. bmastats models, cumulative(0.75)

Computing model probabilities ...

Model summary           Number of models:
                                 Visited = 1,024
                                Reported =     2
Analytical CPMP Model size
Rank
1 .6292 2
2 .7736 3
Variable-inclusion summary
Rank Rank
1 2
x2 x x
x10 x x
x3 x
Legend: 
x - estimated

We can use bmagraph pmp to plot the cumulative PMP.

. bmagraph pmp, cumulative

The command plots the first 100 models by default, but you can specify the top() option if you would like to see more models.

Important predictors

We can explore the importance of predictors visually by using bmagraph varmap to produce a variable-inclusion map.

. bmagraph varmap

Computing model probabilities ...

x2 and x10 are included in all top 100 models, ranked by PMP. Their coefficients are positive in all models.

Model-size distribution

We can explore the complexity of our BMA model by using bmastats msize and bmagraph msize to explore the prior and posterior model-size distributions.

. bmastats msize

Model-size summary

Number of models = 1,024
Model size:
  Minimum =  0
  Maximum = 10
Mean Median
Prior
Analytical 5.0000 5
Posterior
Analytical 2.4794 2
Note: Frequency summaries not available.
. bmagraph msize

The prior model-size distribution is uniform over the model size. The posterior model-size distribution is skewed to the left with the mode of about 2. The prior mean model size is 5 and the posterior one is 2.48. So, based on the observed data, BMA tends to favor smaller models with roughly two predictors, on average, compared with our a priori assumption.

Posterior distributions of coefficients

We can use bmagraph coefdensity to plot posterior distributions of regression coefficients.

. bmagraph coefdensity {x2} {x3}, combine

These distributions are mixtures of a point mass at zero corresponding to the predictor’s probability of noninclusion and a continuous density conditional on the inclusion. For the coefficient of x2, the probability of noninclusion is negligible, so its posterior distribution is essentially a continuous, fairly symmetric density centered at 1.2. For the coefficient of x3, in addition to the conditional continuous density, there is a red vertical line at zero corresponding to the posterior noninclusion probability of roughly 1-.2 = 0.8.

BMA prediction

bmapredict produces various BMA predictions. For instance, let’s compute the posterior predictive means.

. bmapredict pmean, mean
note: computing analytical posterior predictive means.

We can also compute predictive CrIs to estimate uncertainty about our predictions. (This is not available with many traditional model-selection methods.) Note that these predictive CrIs also incorporate model uncertainty. To compute predictive CrIs, we must save the posterior MCMC model-parameter sample first.

. bmacoefsample, saving(bmacoef)
note: saving existing MCMC simulation results without resampling; specify
      option simulate to force resampling in this case.
note: file bmacoef.dta saved.

. bmapredict cri_l cri_u, cri rseed(18)
note: computing credible intervals using simulation.

Computing predictions ...

We summarize the predicted results:

.  summarize y pmean cri*
Variable Obs Mean Std. dev. Min Max
y 200 .9944997 4.925052 -13.332 13.06587
pmean 200 .9944997 4.783067 -13.37242 12.31697
cri_l 200 -1.24788 4.787499 -15.66658 10.03054
cri_u 200 3.227426 4.779761 -11.06823 14.58301

The reported averages over observations for predictive means and lower and upper 95% predictive CrI bounds look reasonable relative to the observed outcome y.

Informative model priors

One of the strengths of BMA is the ability to incorporate prior information about models and predictors. This allows us to investigate the sensitivity of the results to various assumptions about the importance of models and predictors. Suppose that we have reliable information that all predictors except x2 and x10 are less likely to be related to the outcome. We can specify, for example, a binomial model prior with low prior inclusion probabilities for these predictors.

To compare the out-of-sample performance of the BMA models later, we randomly split our sample into two parts and fit the BMA model using the first subsample. We also save the results from this more informative BMA model.

. splitsample, generate(sample) nsplit(2) rseed(18)

. bmaregress y x1-x10 if sample == 1, mprior(binomial x2 x10 0.5 x1 x3-x9 0.05)
  saving(bmareg_inf)

Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                          No. of obs         =    100
Linear regression                                 No. of predictors  =     10
Model enumeration                                             Groups =     10
                                                              Always =      0
Priors:                                           No. of models      =  1,024
  Models: Binomial, IP varies                         For CPMP >= .9 =      1
   Cons.: Noninformative                          Mean model size    =  2.072
   Coef.: Zellner's g
       g: Benchmark, g = 100                      Shrinkage, g/(1+g) = 0.9901
  sigma2: Noninformative                          Mean sigma2        =  1.268
y Mean Std. dev. Group PIP
x2 1.168763 .1031096 2 1
x10 4.920726 .124615 10 1
x1 .0019244 .0204242 1 .013449
x5 -.0018262 .0210557 5 .011973
x3 -.0017381 .0205733 3 .011557
x4 -.0015444 .0193858 4 .010709
Always
_cons .5637673 .113255 0 1
Note: Coefficient posterior means and std. dev. estimated from 1,024 models.
Note: Default prior is used for parameter g.
Note: 4 predictors with PIP less than .01 not shown.

file bmareg_inf.dta saved.

. estimates store bmareg_inf

The effect of this model prior is that the PIP of all unimportant predictors is now less than 2%.

Note that when we select one model, we are essentially fitting a BMA model with a very strong prior that all selected predictors must be included with a probability of 1. For instance, we can force bmaregress to include all variables in the model:

. bmaregress y (x1-x10, always) if sample == 1, saving(bmareg_all)

Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                           No. of obs         =    100
Linear regression                                  No. of predictors  =     10
Model enumeration                                              Groups =      0
                                                               Always =     10
Priors:                                            No. of models      =      1
  Models: Beta-binomial(1, 1)                          For CPMP >= .9 =      1
   Cons.: Noninformative                           Mean model size    = 10.000
   Coef.: Zellner's g
       g: Benchmark, g = 100                       Shrinkage, g/(1+g) = 0.9901
  sigma2: Noninformative                           Mean sigma2        =  1.192


Mean Std. dev. Group PIP

.1294521 .105395 0 1

1.166679 .1129949 0 1

-.1433074 .1271903 0 1

-.1032189 .1223152 0 1

-.0819008 .1261309 0 1

.0696633 .1057512 0 1

.0222949 .1215404 0 1

-.0252311 .1124352 0 1

.0587412 .1166921 0 1

4.949992 .1276795 0 1

.6006153 .1127032 0 1
y Always x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 _cons
Note: Coefficient posterior means and std. dev. estimated from 1 model.
Note: Default priors are used for models and parameter g.

file bmareg_all.dta saved.

. estimates store bmareg_all
Predictive performance using LPS

For comparison of the considered BMA models, we refit our default BMA model using the first subsample and store the estimation results.

. qui bmaregress y x1-x10 if sample == 1, saving(bmareg, replace)
. estimates store bmareg

We can compare the out-of-sample predictive performance of our BMA models by using bmastats lps to compute the log predictive-score (LPS) for out-of-sample observations.

. bmastats lps bmareg bmareg_inf bmareg_all if sample == 2, compact

Log predictive-score (LPS)

Number of observations = 100
LPS Mean Minimum Maximum
bmareg 1.562967 1.039682 6.778834
bmareg_inf 1.562238 1.037576 6.883794
bmareg_all 1.576231 1.032793 6.084414
Notes: Using analytical PMPs.
       Result bmareg_inf has the smallest mean LPS.

The more informative bmareg_inf model has a slightly smaller mean LPS, but the LPS summaries for all models are very similar. See [BMA] bmastats lps for how to compare the BMA model performance by using cross-validation.

Cleanup

We generated several datasets during our analysis. We no longer need them, so we erase them at the end. But you may decide to keep them, especially if they correspond to a potentially time-consuming final analysis.

. erase bmareg.dta
. erase bmacoef.dta
. erase bmareg_inf.dta
. erase bmareg_all.dta