
ii "book" - 2019/5/4 - 19:32 - page 1 - #1ii i ii iCHAPTER 1

Scalable Bayesian Variable Selection Regression Models for Count Data

Yinsen Miao, Jeong Hwan Kook, Yadong Lu, Michele Guindani and Marina Vannucci

Rice University, Department of Statistics, 6100 Main St, Houston, TX 77005
University of California, Irvine, Department of Statistics, Bren Hall 2241, Irvine, CA 92697

Abstract

Variable selection, also known as feature selection in the machine learning literature, plays an indispensable role in scientific studies. In many research areas with massive data, finding a subset of representative features that best explain the outcome of interest has become a critical component in any researcher's workflow. In this chapter, we focus on Bayesian variable selection regression models for count data, and specifically on the negative binomial linear regression model and on the Dirichlet-multinomial regression model. We address the variable selection problem via spike-and-slab priors. For posterior inference, we review standard MCMC methods and also investigate computationally more efficient variational inference approaches that use data augmentation techniques and concrete relaxation methods. We investigate performance of the methods via simulation studies and benchmark datasets.

Keywords: Bayesian Variable Selection, Count Data, Data Augmentation, Dirichlet-Multinomial Regression, Negative Binomial Regression, Spike-and-Slab Priors, Tensorflow, Variational Inference

Chapter points

• We consider linear regression models for count data, specifically negative binomial regression models and Dirichlet-multinomial regression models.
• We address variable selection via the use of spike-and-slab priors on the regression coefficients.
• We develop efficient variational methods for scalability in the number of covariates that are based on augmentation techniques and concrete relaxation methods.
• We provide C/C++ code at https://github.com/marinavannucci/snbvbs for the negative binomial case, and Python code at https://github.com/mguindanigroup/vbmultdir for the Dirichlet-multinomial case.

© Elsevier Ltd. All rights reserved.

ii "book" - 2019/5/4 - 19:32 - page 2 - #2ii i ii i2Book Title

1. Introduction

Variable selection, also known as feature selection in the machine learning literature, plays an indispensable role in scientific studies: in cancer research, biomedical scientists seek to find connections between cancer phenotypes and a parsimonious set of genes; in finance, economists look for a small portfolio that can accurately track the performance of stock market indices such as the S&P 500. In many research areas with massive data, finding a subset of representative features that best explain the outcome of interest has become a critical component in any researcher's workflow. As evidenced by numerous research papers published in either theory or practice, variable selection for linear regression models has been an important topic in the statistical literature for the past several decades. Variable selection methods can be categorized into roughly three groups: criteria-based methods, including traditional approaches such as AIC/BIC [6, 43]; penalized regression methods [47, 12, 14, 58]; and Bayesian approaches [30, 16, 5]. In this chapter, we focus primarily on Bayesian approaches for variable selection that use spike-and-slab priors. An obvious advantage when using these priors is that, in addition to the sparse estimation of the regression coefficients, these methods produce posterior probabilities of inclusion (PPIs) for each covariate. Moreover, Bayesian approaches have the advantage of being able to aggregate multiple sub-models from a class of possible ones, based on their corresponding posterior probabilities. This approach is known as Bayesian model averaging (BMA) and can lead to improved prediction accuracy over single models [18].

Despite the great features offered by spike-and-slab priors, computational issues remain a challenge. The posterior distribution for a candidate model usually does not have a closed-form expression, and its inference may be computationally intractable even for a moderate number of predictors. To address the problem, approximate methods that use Markov Chain Monte Carlo (MCMC) stochastic searches have been extensively used [16, 5]. Recently, variational inference (VI) methods [7, 20, 34, 53, 41] have attracted attention as a faster and more scalable alternative. These methods have also been used for model selection in different applied modeling contexts, particularly in bioinformatics [19] and neuroimaging [32, 54].

In this chapter, we focus primarily on regression models for count data, and specifically on negative binomial linear regression models and on Dirichlet-multinomial regression models. In both settings, we formulate a Bayesian hierarchical model with variable selection using spike-and-slab priors. For posterior inference, we review standard MCMC methods and also investigate computationally more efficient variational inference approaches that use data augmentation techniques and concrete relaxation methods. We investigate performance of the methods via simulation studies and benchmark datasets.

2. Bayesian Variable Selection via Spike-and-Slab Priors

In ordinary linear regression, a response $y_i$ is modeled as

$$ y_i = \beta_0 + x_i^T \beta + \epsilon_i, \qquad \epsilon_i \sim \mathrm{Normal}(0, \sigma^2), \qquad (1.1) $$

for $i = 1, \ldots, n$, with $x_i \in \mathbb{R}^p$ a vector of $p$ known covariates, $\beta = [\beta_1, \ldots, \beta_p]^T$ a vector of regression coefficients and $\beta_0$ the baseline or intercept. A Bayesian approach to variable selection in linear regression models formulates the selection problem via hierarchical priors on the unknown coefficients $\beta_k$, $k = 1, \ldots, p$. In this chapter we examine one of the most widely used sparsity-inducing priors, known as the spike-and-slab prior [30]. This prior can be written as

$$ \beta_k \mid \gamma_k \sim \gamma_k \, \mathrm{Normal}(0, \sigma_\beta^2) + (1 - \gamma_k) \, \delta_0, \qquad k = 1, \ldots, p, \qquad (1.2) $$

with $\gamma_k$ a latent indicator variable of whether the $k$-th covariate has a nonzero effect on the outcome, $\delta_0$ a point mass distribution at 0, and $\sigma_\beta^2$ the variance of the prior effect size. Typically, independent Bernoulli priors are imposed on the $\gamma_k$'s, i.e. $\gamma_k \sim \mathrm{Bernoulli}(\pi)$. For reviews on the general topic of Bayesian variable selection for regression models with continuous responses we refer interested readers to [33, 13]. Alternatively, shrinkage priors, which do not impose a spike at zero, can be considered, such as the normal-gamma [17], the horseshoe [36], and the LASSO [35] priors.

Recently, non-local prior densities have been used in Bayesian hypothesis testing and variable selection, as an attempt to balance the rates of convergence of Bayes factors under the null and alternative hypotheses [23]. The large sample properties of Bayes factors obtained by local alternative priors imply that, as the sample size increases, evidence accumulates much more rapidly in favor of true alternative models than of true null models. Suppose the null hypothesis $H_0$ is $\theta \in \Theta_0$ and the alternative hypothesis $H_1$ is $\theta \in \Theta_1$. We define $p(\theta \mid H_1)$ to be a non-local density if $p(\theta \mid H_1) = 0$ for all $\theta \in \Theta_0$ and $p(\theta \mid H_1) > 0$ for all $\theta \in \Theta_1$. In the variable selection settings considered in this chapter, the hypotheses relate to the significance of the coefficients, i.e. $H_0: \beta_k = 0$ versus $H_1: \beta_k \neq 0$. Therefore, a non-local selection prior is defined as a mixture of a point mass at zero and a continuous non-local alternative distribution,

$$ \beta_k \mid \gamma_k \sim \gamma_k \, p(\beta_k; \tau, \sigma^2) + (1 - \gamma_k) \, \delta_0, \qquad k = 1, \ldots, p, \qquad (1.3) $$

where $p(\beta_k; \tau, \sigma^2)$ is a non-local density characterizing the prior distribution of $\beta_k$ under the alternative hypothesis. Similarly as in the traditional spike-and-slab prior formulation, a non-local selection prior models the sparsity explicitly by assigning a positive mass at the origin. However, unlike a flat Gaussian distribution, the density $p(\beta_k; \tau, \sigma^2)$ does not place a significant amount of probability mass near the null value zero, thus properly reflecting the prior belief that the parameter is away from zero under $H_1$. In this chapter, we use the product second moment (pMOM) prior [23, 44] and assume that the $\beta_k$'s are independent of each other and are drawn from

$$ p(\beta \mid \tau, \sigma^2) = \prod_{k=1}^{p} \frac{\beta_k^2}{\tau \sigma^2} \, \mathrm{Normal}(0, \tau \sigma^2). \qquad (1.4) $$
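As an informal illustration of the two priors above, the short Python sketch below draws coefficients from the spike-and-slab prior in Equation (1.2) and evaluates the pMOM density in Equation (1.4). The particular values of $\pi$, $\sigma_\beta^2$, $\tau$ and $\sigma^2$ are arbitrary choices made only for this example and are not settings used in the chapter.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def draw_spike_and_slab(p, pi=0.1, slab_var=1.0):
    """Draw (gamma, beta) from Eq. (1.2): gamma_k ~ Bernoulli(pi),
    beta_k | gamma_k ~ gamma_k * Normal(0, slab_var) + (1 - gamma_k) * delta_0."""
    gamma = rng.binomial(1, pi, size=p)                        # latent inclusion indicators
    beta = gamma * rng.normal(0.0, np.sqrt(slab_var), size=p)  # slab draws, zeroed when excluded
    return gamma, beta

def pmom_logpdf(beta, tau=0.35, sigma2=1.0):
    """Log pMOM density of Eq. (1.4): prod_k beta_k^2/(tau*sigma2) * Normal(beta_k; 0, tau*sigma2).
    The density vanishes at beta_k = 0, unlike a Gaussian slab."""
    v = tau * sigma2
    return np.sum(np.log(beta ** 2 / v) + norm.logpdf(beta, loc=0.0, scale=np.sqrt(v)))

gamma, beta = draw_spike_and_slab(p=20)
print("included covariates:", np.flatnonzero(gamma))
print("pMOM log density at beta = (0.5, -1.2):", pmom_logpdf(np.array([0.5, -1.2])))
```

Evaluating the pMOM log density at values close to zero returns increasingly small values (diverging to $-\infty$ at the origin), which is exactly the non-local behavior contrasted with the Gaussian slab in the text.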

3. Negative Binomial Regression Models

For $i = 1, \ldots, n$, let now $y_i$ indicate observed counts on an outcome variable. Count data can be modeled via a negative binomial distribution, obtaining the regression model

$$ y_i \mid r, \psi_i \sim \mathrm{NB}\!\left(r, \frac{\exp(\psi_i)}{1 + \exp(\psi_i)}\right), \qquad (1.5) $$

with $\psi_i = \beta_0 + x_i^T \beta$ and with $r$ the overdispersion parameter. Given the law of total expectation and variance, the expectation and variance of $y_i$ can be calculated as

$$ \mathrm{E}[y_i \mid x_i] = \exp\!\left(x_i^T \beta + \beta_0 + \log r\right), \qquad \mathrm{Var}[y_i \mid x_i] = \mathrm{E}[y_i \mid x_i] + \frac{1}{r} \, \mathrm{E}^2[y_i \mid x_i], \qquad (1.6) $$

showing that $\mathrm{Var}[y_i \mid x_i] > \mathrm{E}[y_i \mid x_i]$ and thus that the negative binomial model can account for overdispersion. Later on we will introduce auxiliary variables to facilitate the use of data augmentation techniques that allow conjugate inference on the parameters $\beta$ and $r$. We write the prior model as follows:

$$ \begin{aligned} \beta_k \mid \gamma_k &\sim \gamma_k \, \mathrm{Normal}(0, \sigma_\beta^2) + (1 - \gamma_k) \, \delta_0, \\ \gamma_k &\sim \mathrm{Bernoulli}(\pi), \\ \beta_0 &\sim \mathrm{Normal}(0, \sigma_{\beta_0}^2), \\ r &\sim \mathrm{Gamma}(a_r, b_r), \\ \sigma_\beta^2 &\sim \text{Scaled-Inv-}\chi^2(\nu_0, \sigma_0^2). \end{aligned} \qquad (1.7) $$

Typically, a flat normal prior is imposed on the intercept term $\beta_0$, since there is usually no reason to shrink it towards zero. Parameters $\sigma_\beta^2$ and $\pi$ control the sparsity of the model. Performance of variable selection can be sensitive to these parameter settings. Two popular prior choices for $\pi$ are the beta distribution $\mathrm{Beta}(a_\pi, b_\pi)$ and the uniform distribution on the log scale, $\log(\pi) \sim \mathrm{Uniform}(\pi_{\min}, \pi_{\max})$ [57]. When $\pi$ is marginalized, the obtained prior distributions on $\gamma$ are a beta-binomial distribution and a truncated beta distribution, respectively. We impose a convenient heavy-tailed conjugate prior, the scaled inverse chi-square distribution, on the slab variance parameter $\sigma_\beta^2$, where $\nu_0$ is the degrees of freedom for the scale parameter $\sigma_0^2$. For stability purposes, it is recommended to use a large $\nu_0$ for sparse models [7].

For posterior inference, with variable selection as the main focus, we are interested in recovering a small subset of covariates with significant association to the outcome. In the proposed Bayesian model, the relative importance of the $k$-th covariate can be assessed by computing its marginal posterior probability of inclusion (PPI) as

$$ \mathrm{PPI}(k) \equiv p(\gamma_k = 1 \mid y, X) = \frac{\sum_{\gamma:\, \gamma_k = 1} p(\gamma \mid y, X)}{\sum_{\gamma} p(\gamma \mid y, X)}, \qquad (1.8) $$

which involves a sum over $2^p$ possible models, marginalized over the other model parameters. Classical MCMC algorithms can be used to compute this analytically intractable term. Approaches that use data augmentation schemes have proven particularly efficient.
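The following Python sketch simulates data from the negative binomial regression model in Equation (1.5) with a sparse coefficient vector and numerically checks the mean-variance relation in Equation (1.6). All settings (sample size, number of covariates, $r$, the nonzero coefficients) are illustrative choices rather than values used in the chapter; the mapping onto NumPy's negative binomial parameterization is spelled out in the comments.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_nb_regression(n=500, p=10, r=2.0, beta0=0.5, n_nonzero=3):
    """Simulate (X, y) from Eq. (1.5): psi_i = beta0 + x_i^T beta,
    y_i ~ NB(r, exp(psi_i)/(1+exp(psi_i))), with a sparse true beta."""
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:n_nonzero] = rng.normal(0.0, 1.0, size=n_nonzero)   # only a few covariates matter
    psi = beta0 + X @ beta
    # NumPy's negative_binomial(n, p) counts failures before n successes with success
    # probability p; passing p = 1/(1+exp(psi)) yields mean r*exp(psi), matching Eq. (1.6).
    y = rng.negative_binomial(r, 1.0 / (1.0 + np.exp(psi)))
    return X, y, beta

X, y, beta = simulate_nb_regression()

# Monte Carlo check of Eq. (1.6) at a single linear predictor value psi
psi, r = 0.7, 2.0
draws = rng.negative_binomial(r, 1.0 / (1.0 + np.exp(psi)), size=200_000)
mu = r * np.exp(psi)                      # E[y|x] = exp(x^T beta + beta0 + log r)
print(draws.mean(), mu)                   # should be close
print(draws.var(), mu + mu ** 2 / r)      # Var[y|x] = E[y|x] + E^2[y|x]/r
```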

3.1. Data Augmentation

Here we employ the Pólya-Gamma augmentation approach of Polson et al. [37] to sample $\beta$, and an additional data augmentation scheme to obtain a closed-form, tractable update rule for the overdispersion parameter $r$, which we adapt from Zhou et al. [56]. A random variable $\omega$ following a Pólya-Gamma distribution with parameters $b \in \mathbb{R}_+$, $c \in \mathbb{R}$ is defined as

$$ \omega \overset{D}{=} \frac{1}{2\pi^2} \sum_{k=1}^{\infty} \frac{g_k}{(k - 1/2)^2 + c^2/(4\pi^2)}, \qquad (1.9) $$

where the $g_k \sim \mathrm{Gamma}(b, 1)$ are independent gamma random variables and $\overset{D}{=}$ indicates equality in distribution. The main result from Polson et al. [37] is that, given a random variable $\omega$ with density $\omega \sim \mathrm{PG}(b, 0)$, $b \in \mathbb{R}_+$, the following integral identity holds for all $a \in \mathbb{R}$:

$$ \frac{\left(\exp(\psi)\right)^a}{\left(1 + \exp(\psi)\right)^b} = 2^{-b} \exp(\kappa \psi) \, \mathrm{E}_\omega\!\left[\exp\!\left(-\omega \psi^2 / 2\right)\right], \qquad (1.10) $$

where $\kappa = a - b/2$. Additionally, the conditional distribution $p(\omega \mid \psi)$, arising from treating the above integrand as the unnormalized joint density of $(\omega, \psi)$, is

$$ p(\omega \mid \psi) = \frac{\exp\!\left(-\psi^2 \omega / 2\right)}{\mathrm{E}_\omega\!\left[\exp\!\left(-\psi^2 \omega / 2\right)\right]} \, p(\omega \mid b, 0), \qquad (1.11) $$

which is also in the Pólya-Gamma class, i.e., $\omega \mid \psi \sim \mathrm{PG}(b, \psi)$. For more details regarding the derivation of the result, we refer interested readers to Polson et al. [37].
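As a rough numerical illustration of Equations (1.9) and (1.10), the Python sketch below draws approximate $\mathrm{PG}(b, c)$ variates by truncating the infinite sum in Equation (1.9) and then checks the integral identity in Equation (1.10) by Monte Carlo. The truncation level and the values of $a$, $b$ and $\psi$ are arbitrary; exact Pólya-Gamma samplers (for instance the method of Polson et al., or the pypolyagamma package) should be preferred in real implementations.

```python
import numpy as np

rng = np.random.default_rng(2)

def pg_draw_truncated(b, c, size, K=200):
    """Approximate PG(b, c) draws via the truncated sum of Eq. (1.9):
    omega ~= (1/(2*pi^2)) * sum_{k=1}^K g_k / ((k - 1/2)^2 + c^2/(4*pi^2)), g_k ~ Gamma(b, 1)."""
    k = np.arange(1, K + 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2)
    g = rng.gamma(shape=b, scale=1.0, size=(size, K))
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

# Monte Carlo check of Eq. (1.10) with omega ~ PG(b, 0)
a, b, psi = 1.0, 3.0, 0.8
kappa = a - b / 2.0
omega = pg_draw_truncated(b, 0.0, size=50_000)
lhs = np.exp(psi) ** a / (1.0 + np.exp(psi)) ** b
rhs = 2.0 ** (-b) * np.exp(kappa * psi) * np.mean(np.exp(-omega * psi ** 2 / 2.0))
print(lhs, rhs)   # the two sides agree up to Monte Carlo and truncation error
```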

Comparing Equation (1.10) with the negative binomial regression likelihood given in Equation (1.5)
