Bayesian model averaging (BMA) for linear regression
Why choose just one model when you can borrow information from many? The new bma suite performs Bayesian model averaging (BMA) to account for model uncertainty in your analysis. Uncertain about which predictors to include in your linear regression model? Use bmaregress to find out which predictors are important. Perform model choice, inference, and prediction. And use the many postestimation commands to explore influential models, model complexity, model fit and predictive performance, and the sensitivity of the results to assumptions about the importance of models and predictors.
Highlights
- Model choice, inference, and prediction
- Model enumeration and MC3 sampling
- Always-included and grouped predictors
- Model priors: uniform, binomial, beta-binomial
- Many g-priors, including hyper-g and robust
- Factor variables and time-series operators
- Strong, weak, or no heredity for interactions
- BMA MCMC convergence
- Posterior model and inclusion probabilities
- Prior and posterior model-size distributions
- Posterior distributions of model parameters
- Jointness measures for pairs of predictors
- Variable-inclusion maps
- Log predictive-score for model fit and predictive performance
- Sensitivity analysis
- Prediction
- Support for standard Bayesian postestimation features

See more Bayesian model averaging features
Introduction
Traditionally, we select a model and perform inference and prediction conditional on this model. Our results typically do not account for the uncertainty in selecting a model and thus may be overly optimistic. They may even be incorrect if our selected model is substantially different from the true data-generating model (DGM). In some applications, we may have strong theoretical or empirical evidence about the DGM. In other applications, usually of a complex and unstable nature, such as those in economics, psychology, and epidemiology, choosing one reliable model can be difficult.

Instead of relying on just one model, model averaging averages results over multiple plausible models based on the observed data. In BMA, the "plausibility" of a model is described by its posterior model probability (PMP), which is determined by a fundamental Bayesian principle, Bayes' theorem, applied universally to all data analyses.

BMA can be used to account for model uncertainty when estimating model parameters and predicting new observations, to avoid overly optimistic conclusions. It is particularly useful in applications with several plausible models, where there is no single definitive reason to choose one model over the others. But even if choosing a single model is the end goal, you can find BMA beneficial. It provides a principled way to identify important models and predictors within the considered classes of models. Its framework allows you to learn about interrelations between different predictors in terms of their tendency to appear in a model together, separately, or independently. It can be used to evaluate the sensitivity of the final results to various assumptions about the importance of models and predictors. And it provides optimal predictions in the log-score sense.

The bma suite
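At the core of the suite is the weighting of results by PMPs. In generic notation, for candidate models $M_1,\dots,M_K$, Bayes' theorem gives the PMP, and BMA combines model-specific results weighted by these probabilities:

```latex
% Posterior model probability via Bayes' theorem
P(M_j \mid y) \;=\; \frac{p(y \mid M_j)\, P(M_j)}{\sum_{k=1}^{K} p(y \mid M_k)\, P(M_k)}

% BMA posterior of a quantity of interest \theta
% (a regression coefficient or a prediction)
p(\theta \mid y) \;=\; \sum_{j=1}^{K} p(\theta \mid y, M_j)\, P(M_j \mid y)
```

Here $p(y \mid M_j)$ is the marginal likelihood of model $M_j$ and $P(M_j)$ its prior probability; all BMA summaries below are weighted averages of this form.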
In a regression setting, model uncertainty amounts to the uncertainty of which predictors should be included in a model. We can use bmaregress to account for selection of predictors in a linear regression. It explores the model space either exhaustively with the enumeration option, whenever feasible, or by using the Markov chain Monte Carlo (MCMC) model composition (MC3) algorithm with the sampling option. It reports various summaries of the visited models and included predictors and of posterior distributions of model parameters. It allows you to specify groups of predictors to be included or excluded together from a model and those that are included in all models. It provides various prior distributions for models in the mprior() option and for the g parameter, which controls shrinkage of regression coefficients toward zero, in the gprior() option. It also supports factor variables and time-series operators and provides several ways for handling interactions during estimation by using the heredity() option.
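As a schematic illustration of how these options combine (the variable names and option values below are hypothetical, and defaults may differ; see the bmaregress documentation for exact syntax):

```stata
* Sketch: x1-x5 compete for inclusion; z1 and z2 are always included
* as a group. MC3 sampling is requested instead of enumeration, with
* a uniform model prior. Option values here are illustrative only.
bmaregress y x1-x5 (z1 z2, always), sampling mprior(uniform) rseed(18)
```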
There are many supported postestimation features, which also include some of the standard Bayesian postestimation features.
Command | Description |
---|---|
bmacoefsample | Posterior samples of regression coefficients |
bmagraph pmp | model-probability plots |
bmagraph msize | model-size distribution plots |
bmagraph varmap | variable-inclusion maps |
bmagraph coefdensity | coefficient posterior density plots |
bmastats models | posterior model and variable-inclusion summaries |
bmastats msize | model-size summary |
bmastats pip | posterior inclusion probabilities (PIPs) for predictors |
bmastats jointness | jointness measures for predictors |
bmastats lps | log predictive-score (LPS) |
bmapredict | BMA predictions |
bayesgraph | Bayesian graphical summaries and convergence diagnostics |
bayesstats summary | Bayesian summary statistics for model parameters and their functions |
bayesstats ess | Bayesian effective sample sizes and related statistics |
bayesstats ppvalues | Bayesian predictive p-values |
bayespredict | Bayesian predictions |
Below, we provide a tour of some bma features using a toy dataset. This dataset contains 200 observations on 10 orthogonal predictors, x1 through x10, and an outcome y that was generated from only two of the predictors, x2 and x10, with true coefficients of 1.2 and 5, respectively, plus a standard normal error term.

. webuse bmaintro
(Simulated data for BMA example)

. summarize
Variable | Obs Mean Std. dev. Min Max | |
y | 200 .9944997 4.925052 -13.332 13.06587 | |
x1 | 200 -.0187403 .9908957 -3.217909 2.606215 | |
x2 | 200 -.0159491 1.098724 -2.999594 2.566395 | |
x3 | 200 .080607 1.007036 -3.016552 3.020441 | |
x4 | 200 .0324701 1.004683 -2.410378 2.391406 | |
x5 | 200 -.0821737 .9866885 -2.543018 2.133524 | |
x6 | 200 .0232265 1.006167 -2.567606 3.840835 | |
x7 | 200 -.1121034 .9450883 -3.213471 1.885638 | |
x8 | 200 -.0668903 .9713769 -2.871328 2.808912 | |
x9 | 200 -.1629013 .9550258 -2.647837 2.472586 | |
x10 | 200 .083902 .8905923 -2.660675 2.275681 |
Model enumeration
We use bmaregress to fit a BMA linear regression of y on x1 through x10.
. bmaregress y x1-x10

Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                     No. of obs         =     200
Linear regression                            No. of predictors  =      10
Model enumeration                            Groups             =      10
                                             Always             =       0
Priors:                                      No. of models      =   1,024
  Models: Beta-binomial(1, 1)                For CPMP >= .9     =       9
   Cons.: Noninformative                     Mean model size    =   2.479
   Coef.: Zellner's g
       g: Benchmark, g = 200                 Shrinkage, g/(1+g) =  0.9950
  sigma2: Noninformative                     Mean sigma2        =   1.272
y | Mean Std. dev. Group PIP | |
x2 | 1.198105 .0733478 2 1 | |
x10 | 5.08343 .0900953 10 1 | |
x3 | -.0352493 .0773309 3 .21123 | |
x9 | .004321 .0265725 9 .051516 | |
x1 | .0033937 .0232163 1 .046909 | |
x4 | -.0020407 .0188504 4 .039267 | |
x5 | .0005972 .0152443 5 .033015 | |
x8 | -.0005639 .0153214 8 .032742 | |
x7 | -8.23e-06 .015497 7 .032386 | |
x6 | -.0003648 .0143983 6 .032361 | |
Always | ||
_cons | .5907923 .0804774 0 1 | |
Note: Coefficient posterior means and std. dev. estimated from 1,024 models.
Note: Default priors are used for models and parameter g.
With 10 predictors and the default fixed (benchmark) g-prior, bmaregress uses model enumeration and explores all 2^10 = 1,024 possible models. The default model prior is beta-binomial with shape parameters of 1, which is uniform over the model size. A priori, our BMA model assumed little shrinkage of regression coefficients toward zero because the shrinkage factor, g/(1+g) = 0.995, is close to 1.

bmaregress identified x2 and x10 as highly important predictors: their posterior inclusion probabilities (PIPs) are essentially 1. The posterior mean estimates of their regression coefficients are, respectively, 1.2 (rounded) and 5.1, which are very close to the true values of 1.2 and 5. All other predictors have much lower PIPs and coefficient estimates close to zero. Our BMA findings are consistent with the true DGM.
Let’s store our current estimation results for later use. As with other Bayesian commands, we must save the simulation results first before we can use estimates store to store the estimation results.
. bmaregress, saving(bmareg)
note: file bmareg.dta saved.

. estimates store bmareg
Credible intervals
bmaregress does not report credible intervals (CrIs) by default because computing them requires a potentially time-consuming simulation of a posterior sample of model parameters. But we can compute them after estimation.
We first use bmacoefsample to generate an MCMC sample from the posterior distribution of model parameters, which include regression coefficients. We then use the existing bayesstats summary command to compute posterior summaries of the generated MCMC sample.
. bmacoefsample, rseed(18)
Simulation (10000): ....5000....10000 done

. bayesstats summary

Posterior summary statistics                     MCMC sample size = 10,000
y | Mean Std. dev. MCSE Median [95% equal-tailed cred. interval] | |
x1 | .0031927 .0234241 .000234 0 0 .0605767 | |
x2 | 1.197746 .0726358 .000735 1.197211 1.053622 1.341076 | |
x3 | -.0361581 .0780037 .00078 0 -.2604694 0 | |
x4 | -.0021015 .018556 .000186 0 -.0306376 0 | |
x5 | .0004701 .0147757 .000148 0 0 0 | |
x6 | -.0003859 .0140439 .000142 0 0 0 | |
x7 | -.0003311 .0166303 .000166 0 0 0 | |
x8 | -.0005519 .0145717 .00015 0 0 0 | |
x9 | .0046535 .0273899 .000274 0 0 .096085 | |
x10 | 5.08357 .0907759 .000927 5.083466 4.90354 5.262716 | |
_cons | .5901334 .0811697 .000801 .5905113 .4302853 .7505722 | |
sigma2 | 1.272579 .1300217 .0013 1.262612 1.043772 1.555978 | |
g | 200 0 0 200 200 200 | |
The 95% CrI for the coefficient of x2 is [1.05, 1.34], and the one for x10 is [4.9, 5.26], which is consistent with our DGM.
Influential models
The BMA coefficient estimates are averaged over 1,024 models. It is important to investigate which models are influential. We can use bmastats models to explore the PMPs.
. bmastats models

Model summary

Number of models:
  Visited  = 1,024
  Reported = 5

Rank | Analytical PMP | Model size |
---|---|---|
1 | .6292 | 2 |
2 | .1444 | 3 |
3 | .0258 | 3 |
4 | .0246 | 3 |
5 | .01996 | 3 |
Variable-inclusion summary
| Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
---|---|---|---|---|---|
x2 | x | x | x | x | x |
x10 | x | x | x | x | x |
x3 | | x | | | |
x9 | | | x | | |
x1 | | | | x | |
x4 | | | | | x |

Legend: x - estimated
Not surprisingly, the model that contains both x2 and x10 has the highest PMP of about 63%. In fact, the top two models correspond to a cumulative PMP of about 77%:
. bmastats models, cumulative(0.75)
Computing model probabilities ...

Model summary

Number of models:
  Visited  = 1,024
  Reported = 2

Rank | Analytical CPMP | Model size |
---|---|---|
1 | .6292 | 2 |
2 | .7736 | 3 |
Variable-inclusion summary
| Rank 1 | Rank 2 |
---|---|---|
x2 | x | x |
x10 | x | x |
x3 | | x |

Legend: x - estimated
We can use bmagraph pmp to plot the cumulative PMP.
. bmagraph pmp, cumulative

The command plots the first 100 models by default, but you can specify the top() option if you would like to see more models.
Important predictors
We can explore the importance of predictors visually by using bmagraph varmap to produce a variable-inclusion map.
. bmagraph varmap Computing model probabilities ...

x2 and x10 are included in all top 100 models, ranked by PMP. Their coefficients are positive in all models.
Model-size distribution
We can explore the complexity of our BMA model by using bmastats msize and bmagraph msize to explore the prior and posterior model-size distributions.
. bmastats msize

Model-size summary

Number of models = 1,024
Model size:
  Minimum = 0
  Maximum = 10
| Mean | Median |
---|---|---|
Prior | | |
Analytical | 5.0000 | 5 |
Posterior | | |
Analytical | 2.4794 | 2 |

Note: Frequency summaries not available.
. bmagraph msize

The prior model-size distribution is uniform over the model size. The posterior model-size distribution is concentrated at smaller model sizes, with a mode of about 2. The prior mean model size is 5, and the posterior one is 2.48. So, based on the observed data, BMA favors smaller models, with roughly two predictors on average, compared with our a priori assumption.
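The prior model-size figures follow directly from the default beta-binomial(1, 1) model prior, which is uniform over model size:

```latex
% With p = 10 candidate predictors, a prior uniform over model size gives
P(\text{size} = k) \;=\; \frac{1}{p+1} \;=\; \frac{1}{11},
\qquad k = 0, 1, \dots, 10

% so the prior mean model size is
E(\text{size}) \;=\; \frac{p}{2} \;=\; 5
```

The posterior mean of 2.48 reported above is what the data contribute beyond this flat prior.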
Posterior distributions of coefficients
We can use bmagraph coefdensity to plot posterior distributions of regression coefficients.
. bmagraph coefdensity {x2} {x3}, combine

These distributions are mixtures of a point mass at zero, corresponding to the predictor's probability of noninclusion, and a continuous density conditional on inclusion. For the coefficient of x2, the probability of noninclusion is negligible, so its posterior distribution is essentially a continuous, fairly symmetric density centered at 1.2. For the coefficient of x3, in addition to the conditional continuous density, there is a red vertical line at zero corresponding to the posterior noninclusion probability of roughly 1 - 0.21 = 0.79.
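Written out, the marginal posterior of a coefficient $\beta_j$ is this two-part mixture:

```latex
% \delta_0 is a point mass at zero; PIP_j is the posterior inclusion
% probability of predictor j
p(\beta_j \mid y) \;=\; (1 - \mathrm{PIP}_j)\,\delta_0(\beta_j)
  \;+\; \mathrm{PIP}_j\; p(\beta_j \mid y,\ \beta_j \neq 0)

% For x3: 1 - \mathrm{PIP} = 1 - 0.21123 \approx 0.79
```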
BMA prediction
bmapredict produces various BMA predictions. For instance, let’s compute the posterior predictive means.
. bmapredict pmean, mean
note: computing analytical posterior predictive means.
We can also compute predictive CrIs to estimate uncertainty about our predictions. (This is not available with many traditional model-selection methods.) Note that these predictive CrIs also incorporate model uncertainty. To compute predictive CrIs, we must save the posterior MCMC model-parameter sample first.
. bmacoefsample, saving(bmacoef)
note: saving existing MCMC simulation results without resampling; specify option simulate to force resampling in this case.
note: file bmacoef.dta saved.

. bmapredict cri_l cri_u, cri rseed(18)
note: computing credible intervals using simulation.
Computing predictions ...
We summarize the predicted results:
. summarize y pmean cri*
Variable | Obs Mean Std. dev. Min Max | ||||
y | 200 .9944997 4.925052 -13.332 13.06587 | ||||
pmean | 200 .9944997 4.783067 -13.37242 12.31697 | ||||
cri_l | 200 -1.24788 4.787499 -15.66658 10.03054 | ||||
cri_u | 200 3.227426 4.779761 -11.06823 14.58301 |
The reported averages over observations for predictive means and lower and upper 95% predictive CrI bounds look reasonable relative to the observed outcome y.
Informative model priors
One of the strengths of BMA is the ability to incorporate prior information about models and predictors. This allows us to investigate the sensitivity of the results to various assumptions about the importance of models and predictors. Suppose that we have reliable information that all predictors except x2 and x10 are less likely to be related to the outcome. We can specify, for example, a binomial model prior with low prior inclusion probabilities for these predictors.
To compare the out-of-sample performance of the BMA models later, we randomly split our sample into two parts and fit the BMA model using the first subsample. We also save the results from this more informative BMA model.
. splitsample, generate(sample) nsplit(2) rseed(18)

. bmaregress y x1-x10 if sample == 1, mprior(binomial x2 x10 0.5 x1 x3-x9 0.05) saving(bmareg_inf)

Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                     No. of obs         =     100
Linear regression                            No. of predictors  =      10
Model enumeration                            Groups             =      10
                                             Always             =       0
Priors:                                      No. of models      =   1,024
  Models: Binomial, IP varies                For CPMP >= .9     =       1
   Cons.: Noninformative                     Mean model size    =   2.072
   Coef.: Zellner's g
       g: Benchmark, g = 100                 Shrinkage, g/(1+g) =  0.9901
  sigma2: Noninformative                     Mean sigma2        =   1.268
y | Mean Std. dev. Group PIP | |||
x2 | 1.168763 .1031096 2 1 | |||
x10 | 4.920726 .124615 10 1 | |||
x1 | .0019244 .0204242 1 .013449 | |||
x5 | -.0018262 .0210557 5 .011973 | |||
x3 | -.0017381 .0205733 3 .011557 | |||
x4 | -.0015444 .0193858 4 .010709 | |||
Always | ||||
_cons | .5637673 .113255 0 1 | |||
Note: Coefficient posterior means and std. dev. estimated from 1,024 models.
Note: Default prior is used for parameter g.
Note: 4 predictors with PIP less than .01 not shown.
file bmareg_inf.dta saved.

. estimates store bmareg_inf
The effect of this model prior is that the PIPs of all unimportant predictors are now less than 2%.
Note that when we select one model, we are essentially fitting a BMA model with a very strong prior that all selected predictors must be included with a probability of 1. For instance, we can force bmaregress to include all variables in the model:
. bmaregress y (x1-x10, always) if sample == 1, saving(bmareg_all)

Enumerating models ...
Computing model probabilities ...

Bayesian model averaging                     No. of obs         =     100
Linear regression                            No. of predictors  =      10
Model enumeration                            Groups             =       0
                                             Always             =      10
Priors:                                      No. of models      =       1
  Models: Beta-binomial(1, 1)                For CPMP >= .9     =       1
   Cons.: Noninformative                     Mean model size    =  10.000
   Coef.: Zellner's g
       g: Benchmark, g = 100                 Shrinkage, g/(1+g) =  0.9901
  sigma2: Noninformative                     Mean sigma2        =   1.192

y | Mean Std. dev. Group PIP |
---|---|
Always | |
x1 | .1294521 .105395 0 1 |
x2 | 1.166679 .1129949 0 1 |
x3 | -.1433074 .1271903 0 1 |
x4 | -.1032189 .1223152 0 1 |
x5 | -.0819008 .1261309 0 1 |
x6 | .0696633 .1057512 0 1 |
x7 | .0222949 .1215404 0 1 |
x8 | -.0252311 .1124352 0 1 |
x9 | .0587412 .1166921 0 1 |
x10 | 4.949992 .1276795 0 1 |
_cons | .6006153 .1127032 0 1 |

Note: Coefficient posterior means and std. dev. estimated from 1 model.
Note: Default priors are used for models and parameter g.
file bmareg_all.dta saved.

. estimates store bmareg_all
Predictive performance using LPS
For comparison of the considered BMA models, we refit our default BMA model using the first subsample and store the estimation results.
. qui bmaregress y x1-x10 if sample == 1, saving(bmareg, replace)

. estimates store bmareg
We can compare the out-of-sample predictive performance of our BMA models by using bmastats lps to compute the log predictive-score (LPS) for out-of-sample observations.
. bmastats lps bmareg bmareg_inf bmareg_all if sample == 2, compact

Log predictive-score (LPS)

Number of observations = 100
LPS | Mean Minimum Maximum | ||||
bmareg | 1.562967 1.039682 6.778834 | ||||
bmareg_inf | 1.562238 1.037576 6.883794 | ||||
bmareg_all | 1.576231 1.032793 6.084414 | ||||
Notes: Using analytical PMPs.
Result bmareg_inf has the smallest mean LPS.
The more informative bmareg_inf model has a slightly smaller mean LPS, but the LPS summaries for all models are very similar. See [BMA] bmastats lps for how to compare the BMA model performance by using cross-validation.
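For reference, the LPS of an out-of-sample observation is, in general terms, the negative log of its BMA posterior predictive density, so smaller values indicate better predictive performance:

```latex
\mathrm{LPS}_i \;=\; -\log p(y_i \mid x_i,\ \text{training data})
\;=\; -\log \sum_{j} p(y_i \mid x_i, M_j,\ \text{training data})\,
        P(M_j \mid \text{training data})
```

The mean LPS reported by bmastats lps averages this quantity over the 100 observations in the evaluation subsample.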
Cleanup
We generated several datasets during our analysis. We no longer need them, so we erase them at the end. But you may decide to keep them, especially if they correspond to a potentially time-consuming final analysis.
. erase bmareg.dta
. erase bmacoef.dta
. erase bmareg_inf.dta
. erase bmareg_all.dta