Applied Bayesian Modelling

Name: Applied Bayesian Modelling
Brand: Wiley
Price: 65.99 EUR
Availability: OnlineOnly

Peter Congdon (Author)

Wiley (Publisher)

2nd Edition

Published on 25 June 2014

464 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-118-89506-1 (ISBN)

€65.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Preface

1 Bayesian methods and Bayesian estimation

1.1 Introduction

1.1.1 Summarising existing knowledge: Prior densities for parameters

1.1.2 Updating information: Prior, likelihood and posterior densities

1.1.3 Predictions and assessment

1.1.4 Sampling parameters

1.2 MCMC techniques: The Metropolis-Hastings algorithm

1.2.1 Gibbs sampling

1.2.2 Other MCMC algorithms

1.2.3 INLA approximations

1.3 Software for MCMC: BUGS, JAGS and R-INLA

1.4 Monitoring MCMC chains and assessing convergence

1.4.1 Convergence diagnostics

1.4.2 Model identifiability

1.5 Model assessment

1.5.1 Sensitivity to priors

1.5.2 Model checks

1.5.3 Model choice

References

2 Hierarchical models for related units

2.1 Introduction: Smoothing to the hyper population

2.2 Approaches to model assessment: Penalised fit criteria, marginal likelihood and predictive methods

2.2.1 Penalised fit criteria

2.2.2 Formal model selection using marginal likelihoods

2.2.3 Estimating model probabilities or marginal likelihoods in practice

2.2.4 Approximating the posterior density

2.2.5 Model averaging from MCMC samples

2.2.6 Predictive criteria for model checking and selection: Cross-validation

2.2.7 Predictive checks and model choice using complete data replicate sampling

2.3 Ensemble estimates: Poisson-gamma and Beta-binomial hierarchical models

2.3.1 Hierarchical mixtures for Poisson and binomial data

2.4 Hierarchical smoothing methods for continuous data

2.4.1 Priors on hyperparameters

2.4.2 Relaxing normality assumptions

2.4.3 Multivariate borrowing of strength

2.5 Discrete mixtures and Dirichlet processes

2.5.1 Finite mixture models

2.5.2 Dirichlet process priors

2.6 General additive and histogram smoothing priors

2.6.1 Smoothness priors

2.6.2 Histogram smoothing

Exercises

Notes

References

3 Regression techniques

3.1 Introduction: Bayesian regression

3.2 Normal linear regression

3.2.1 Linear regression model checking

3.3 Simple generalized linear models: Binomial, binary and Poisson regression

3.3.1 Binary and binomial regression

3.3.2 Poisson regression

3.4 Augmented data regression

3.5 Predictor subset choice

3.5.1 The g-prior approach

3.5.2 Hierarchical lasso prior methods

3.6 Multinomial, nested and ordinal regression

3.6.1 Nested logit specification

3.6.2 Ordinal outcomes

Exercises

Notes

References

4 More advanced regression techniques

4.1 Introduction

4.2 Departures from linear model assumptions and robust alternatives

4.3 Regression for overdispersed discrete outcomes

4.3.1 Excess zeroes

4.4 Link selection

4.5 Discrete mixture regressions for regression and outlier status

4.5.1 Outlier accommodation

4.6 Modelling non-linear regression effects

4.6.1 Smoothness priors for non-linear regression

4.6.2 Spline regression and other basis functions

4.6.3 Priors on basis coefficients

4.7 Quantile regression

Exercises

Notes

References

5 Meta-analysis and multilevel models

5.1 Introduction

5.2 Meta-analysis: Bayesian evidence synthesis

5.2.1 Common forms of meta-analysis

5.2.2 Priors for stage 2 variation in meta-analysis

5.2.3 Multivariate meta-analysis

5.3 Multilevel models: Univariate continuous outcomes

5.4 Multilevel discrete responses

5.5 Modelling heteroscedasticity

5.6 Multilevel data on multivariate indices

Exercises

Notes

References

6 Models for time series

6.1 Introduction

6.2 Autoregressive and moving average models

6.2.1 Dependent errors

6.2.2 Bayesian priors in ARMA models

6.2.3 Further types of time dependence

6.3 Discrete outcomes

6.3.1 INAR models for counts

6.3.2 Evolution in conjugate process parameters

6.4 Dynamic linear and general linear models

6.4.1 Further forms of dynamic models

6.5 Stochastic variances and stochastic volatility

6.5.1 ARCH and GARCH models

6.5.2 State space stochastic volatility models

6.6 Modelling structural shifts

6.6.1 Level, trend and variance shifts

6.6.2 Latent state models including historic dependence

6.6.3 Switching regressions and autoregressions

Exercises

Notes

References

7 Analysis of panel data

7.1 Introduction

7.2 Hierarchical longitudinal models for metric data

7.2.1 Autoregressive errors

7.2.2 Dynamic linear models

7.2.3 Extended time dependence

7.3 Normal linear panel models and normal linear growth curves

7.3.1 Growth curves

7.3.2 Subject level autoregressive parameters

7.4 Longitudinal discrete data: Binary, categorical and Poisson panel data

7.4.1 Binary panel data

7.4.2 Ordinal panel data

7.4.3 Panel data for counts

7.5 Random effects selection

7.6 Missing data in longitudinal studies

Exercises

Notes

References

8 Models for spatial outcomes and geographical association

8.1 Introduction

8.2 Spatial regressions and simultaneous dependence

8.2.1 Regression with localised dependence

8.2.2 Binary outcomes

8.3 Conditional prior models

8.3.1 Ecological analysis involving count data

8.4 Spatial covariation and interpolation in continuous space

8.4.1 Discrete convolution processes

8.5 Spatial heterogeneity and spatially varying coefficient priors

8.5.1 Spatial expansion and geographically weighted regression

8.5.2 Spatially varying coefficients via multivariate priors

8.6 Spatio-temporal models

8.6.1 Conditional prior representations

8.7 Clustering in relation to known centres

8.7.1 Areas or cases as data

8.7.2 Multiple sources

Exercises

Notes

References

9 Latent variable and structural equation models

9.1 Introduction

9.2 Normal linear structural equation models

9.2.1 Cross-sectional normal SEMs

9.2.2 Identifiability constraints

9.3 Dynamic factor models, panel data factor models and spatial factor models

9.3.1 Dynamic factor models

9.3.2 Linear SEMs for panel data

9.3.3 Spatial factor models

9.4 Latent trait and latent class analysis for discrete outcomes

9.4.1 Latent trait models

9.4.2 Latent class models

9.5 Latent trait models for multilevel data

9.6 Structural equation models for missing data

Exercises

Notes

References

10 Survival and event history models

10.1 Introduction

10.2 Continuous time functions for survival

10.2.1 Parametric hazard models

10.2.2 Semi-parametric hazards

10.3 Accelerated hazards

10.4 Discrete time approximations

10.4.1 Discrete time hazards regression

10.5 Accounting for frailty in event history and survival models

10.6 Further applications of frailty models

10.7 Competing risks

Exercises

References

Index

Chapter 1
Bayesian methods and Bayesian estimation

1.1 Introduction

Bayesian analysis of data in the health, social and physical sciences has been greatly facilitated in the last two decades by improved scope for estimation via iterative sampling methods. Recent overviews are provided by Brooks et al. (2011), Hamelryck et al. (2012), and Damien et al. (2013). Since the first edition of this book in 2003, the major changes in Bayesian technology relevant to practical data analysis have arguably been in distinct new approaches to estimation, such as the INLA method, and in a much extended range of computer packages, especially in R, for applying Bayesian techniques (e.g. Martin and Quinn, 2006; Albert, 2007; Statisticat LLC, 2013).

Among the benefits of the Bayesian approach and of sampling methods of Bayesian estimation (Gelfand and Smith, 1990; Geyer, 2011) are a more natural interpretation of parameter uncertainty (e.g. through credible intervals) (Lu et al., 2012), and the ease with which the full parameter density (possibly skew or multi-modal) may be estimated. By contrast, frequentist estimates may rely on normality approximations based on large sample asymptotics (Bayarri and Berger, 2004). Unlike classical techniques, the Bayesian method allows model comparison across non-nested alternatives, and recent sampling estimation developments have facilitated new methods of model choice (e.g. Barbieri and Berger, 2004; Chib and Jeliazkov, 2005). The flexibility of Bayesian sampling estimation extends to derived ‘structural’ parameters combining model parameters and possibly data, and with substantive meaning in application areas, which under classical methods might require the delta technique. For example, Parent and Rivot (2012) refer to ‘management parameters’ derived from hierarchical ecological models.

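New estimation methods also assist in the application of hierarchical models to represent latent process variables, which act to borrow strength in estimation across related units and outcomes (Wikle, 2003; Clark and Gelfand, 2006). Letting $[A, B]$ and $[A \mid B]$ denote joint and conditional densities respectively, the paradigm for a hierarchical model specifies

$$[\text{Observations}, \text{Process}, \text{Parameters}] = [\text{Observations} \mid \text{Process}, \text{Parameters}] \times [\text{Process} \mid \text{Parameters}] \times [\text{Parameters}] \tag{1.1}$$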

based on an assumption that observations are imperfect realisations of an underlying process and that units are exchangeable. Usually the observations are considered conditionally independent given the process and parameters.
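The three-stage factorisation in (1.1) can be read as a recipe for simulating data: draw parameters, then the latent process given the parameters, then observations given both. The following is a minimal sketch under assumed, purely illustrative choices (normal densities at each stage, and the hypothetical names `simulate_hierarchical`, `sd_between` and `sd_within`), not a model taken from the text.

```python
import random

def simulate_hierarchical(n_units=8, n_obs=5, seed=1):
    """Simulate from a three-stage hierarchy: parameters, process, observations."""
    rng = random.Random(seed)

    # [Parameters]: illustrative priors (assumed for this sketch)
    mu = rng.gauss(0.0, 1.0)                      # population mean
    sd_between = abs(rng.gauss(0.0, 1.0)) + 0.1   # between-unit sd, kept away from zero
    sd_within = 0.5                               # observation noise sd, fixed here

    # [Process | Parameters]: exchangeable unit-level latent values
    process = [rng.gauss(mu, sd_between) for _ in range(n_units)]

    # [Observations | Process, Parameters]: conditionally independent noisy readings
    obs = [[rng.gauss(z, sd_within) for _ in range(n_obs)] for z in process]
    return mu, process, obs

mu, process, obs = simulate_hierarchical()
```

Each unit's observations scatter around its own latent value, and the latent values in turn scatter around the common mean, which is the sense in which estimation can borrow strength across units.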

Such techniques play a major role in applications such as spatial disease patterns, small domain estimation for survey outcomes (Ghosh and Rao, 1994), meta-analysis across several studies (Sutton and Abrams, 2001), educational and psychological testing (Sahu, 2002; Shiffrin et al., 2008) and performance comparisons (e.g. Racz and Sedransk, 2010; Ding et al., 2013).

The Markov chain Monte Carlo (MCMC) methodology may also be used to augment the data, providing an analogue to the classical EM method. Examples of such data augmentation (with a missing data interpretation) are latent continuous data underlying binary outcomes (Albert and Chib, 1993; Rouder and Lu, 2005) and latent multinomial group membership indicators that underlie parametric mixtures. MCMC mixing may also be improved by introducing auxiliary variables (Gilks and Roberts, 1996).

1.1.1 Summarising existing knowledge: Prior densities for parameters

In classical inference the sample data $y$ are taken as random while population parameters $\theta$, of dimension $d$, are taken as fixed. In Bayesian analysis, parameters themselves follow a probability distribution, knowledge about which (before considering the data at hand) is summarised in a prior distribution $p(\theta)$. In many situations it might be beneficial to include in this prior density cumulative evidence about a parameter from previous scientific studies. This might be obtained by a formal or informal meta-analysis of existing studies. A range of other methods exist to determine or elicit subjective priors (Garthwaite et al., 2005; Gill and Walker, 2005). For example, the histogram method divides the range of $\theta$ into a set of intervals (or ‘bins’) and uses the subjective probability of $\theta$ lying in each interval; from this set of probabilities, $p(\theta)$ may be represented as a discrete prior or converted to a smooth density. Another technique uses prior estimates of moments, for instance in a normal $N(m, V)$ density with prior estimates $m$ and $V$ of the mean and variance, or prior estimates of summary statistics (median, range) which can be converted to estimates of $m$ and $V$ (Hozo et al., 2005).
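As a rough illustration of the histogram method, the sketch below turns a set of elicited bin probabilities into a discrete prior and a piecewise-constant density. The bin edges, the elicited probabilities and the function name `histogram_prior` are hypothetical values chosen for the example.

```python
def histogram_prior(edges, probs):
    """Build a discrete prior and a piecewise-constant density from bin probabilities."""
    if len(edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalise in case elicited values do not sum to 1
    lows, highs = edges[:-1], edges[1:]
    heights = [p / (hi - lo) for p, lo, hi in zip(probs, lows, highs)]

    def density(theta):
        # Density is constant within each bin and zero outside the elicited range
        for lo, hi, h in zip(lows, highs, heights):
            if lo <= theta < hi:
                return h
        return 0.0

    return probs, density

# Hypothetical elicitation for a proportion believed to lie below 0.5
edges = [0.0, 0.1, 0.2, 0.3, 0.5]
elicited = [0.1, 0.4, 0.3, 0.2]
probs, density = histogram_prior(edges, elicited)

# Prior mean approximated from bin midpoints
prior_mean = sum(p * (lo + hi) / 2 for p, lo, hi in zip(probs, edges[:-1], edges[1:]))
```

A smooth density could then be fitted to these bin probabilities, for instance by matching the implied mean and variance to a parametric family.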

Often, a prior amounts to a form of modelling assumption or hypothesis about the nature of parameters, for example, in random effects models. Thus small area death rate models may include spatially correlated random effects, exchangeable random effects with no spatial pattern, or both. A prior specifying the errors as spatially correlated is likely to be a working model assumption rather than a true cumulation of knowledge.

In many situations, existing knowledge may be difficult to summarise or elicit in the form of an informative prior, and to reflect such essentially prior ignorance, resort is made to non-informative priors. Examples are flat priors (e.g. that a parameter is uniformly distributed between $-\infty$ and $+\infty$) and Jeffreys prior

$$p(\theta) \propto \det\{I(\theta)\}^{0.5},$$

where $I(\theta)$ is the expected information matrix. It is possible that a prior is improper (does not integrate to 1 over its range). Such priors may add to identifiability problems (Gelfand and Sahu, 1999), especially in hierarchical models with random effects intermediate between hyperparameters and data. An alternative strategy is to adopt vague (minimally informative) priors which are ‘just proper’. This strategy is considered below in terms of possible prior densities to adopt for the variance or its inverse. An example for a parameter distributed over all real values might be a normal with mean zero and large variance. To adequately reflect prior ignorance while avoiding impropriety, Spiegelhalter et al. (1996) suggest a prior standard deviation at least an order of magnitude greater than the posterior standard deviation.
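To make the Jeffreys prior concrete, consider the standard case of a binomial proportion $\theta$ with $n$ trials, where the expected information is $n/\{\theta(1-\theta)\}$ and the Jeffreys prior is therefore proportional to $\theta^{-1/2}(1-\theta)^{-1/2}$, the kernel of a Beta(1/2, 1/2) density. The sketch below checks this numerically by computing the expected information as the expected squared score; the binomial setting and the function names are illustrative assumptions, not an example from the text.

```python
import math

def expected_information(theta, n):
    """Expected (Fisher) information for a binomial(n, theta) model, as E[score^2]."""
    info = 0.0
    for y in range(n + 1):
        prob = math.comb(n, y) * theta**y * (1 - theta) ** (n - y)
        score = y / theta - (n - y) / (1 - theta)  # derivative of the log-likelihood in theta
        info += prob * score**2
    return info

def jeffreys_unnormalised(theta, n=10):
    """Square root of the expected information: the un-normalised Jeffreys prior."""
    return math.sqrt(expected_information(theta, n))

# Ratio to the Beta(1/2, 1/2) kernel should be the same constant, sqrt(n), at every theta
ratios = [
    jeffreys_unnormalised(t) * math.sqrt(t * (1 - t))
    for t in (0.1, 0.3, 0.5, 0.8)
]
```

In this case the Jeffreys prior is proper, whereas a flat prior over the whole real line is not.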

1.1.2 Updating information: Prior, likelihood and posterior densities

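In classical approaches such as maximum likelihood, inference is based on the likelihood of the data alone. In Bayesian models, the likelihood of the observed data $y$, given a set of parameters $\theta = (\theta_1, \ldots, \theta_d)$, denoted $p(y \mid \theta)$ or equivalently $L(\theta \mid y)$, is used to modify the prior beliefs $p(\theta)$. Updated knowledge based on the observed data and the information contained in the prior densities is summarised in a posterior density, $p(\theta \mid y)$. The relationship between these densities follows from standard probability relations. Thus

$$p(y, \theta) = p(y \mid \theta)\, p(\theta) = p(\theta \mid y)\, p(y),$$

and therefore the posterior density can be written

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}.$$

The denominator $p(y)$ is known as the marginal likelihood of the data, and is found by integrating the likelihood over the joint prior density

$$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta.$$

This quantity plays a central role in formal approaches to Bayesian model choice, but for the present purpose can be seen as an unknown proportionality factor, so that

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta),$$

or equivalently

$$p(\theta \mid y) = k\, p(y \mid \theta)\, p(\theta). \tag{1.2}$$

The product $p(y \mid \theta)\, p(\theta)$ is sometimes called the un-normalised posterior density. From the Bayesian perspective, the likelihood is viewed as a function of $\theta$ given fixed data $y$, and so elements in the likelihood that are not functions of $\theta$ become part of the proportionality constant in (1.2). Similarly, for a hierarchical model as in (1.1), let $Z$ denote latent variables depending on hyperparameters $\theta$. Then one has

$$p(Z, \theta \mid y) \propto p(y \mid Z, \theta)\, p(Z \mid \theta)\, p(\theta),$$

or equivalently

$$p(Z, \theta \mid y) = k\, p(y \mid Z, \theta)\, p(Z \mid \theta)\, p(\theta). \tag{1.3}$$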

Equations (1.2) and (1.3) express mathematically the process whereby updated beliefs are a function of prior knowledge and the sample data evidence.
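For a single parameter, the updating in (1.2) can be carried out numerically on a grid: multiply likelihood by prior at each grid point, then divide by the sum, which stands in for the marginal likelihood. The sketch below assumes an illustrative binomial likelihood with a flat prior on (0, 1); the data values and the name `grid_posterior` are hypothetical.

```python
def grid_posterior(y, n, grid_size=1001):
    """Normalise likelihood x prior on a grid for a binomial proportion theta."""
    width = 1.0 / grid_size
    thetas = [(i + 0.5) * width for i in range(grid_size)]  # bin midpoints, avoiding 0 and 1
    prior = [1.0 for _ in thetas]                           # flat prior on (0, 1), assumed here
    # The binomial coefficient is omitted: it is not a function of theta,
    # so it is absorbed into the proportionality constant
    like = [t**y * (1 - t) ** (n - y) for t in thetas]
    unnorm = [l * p for l, p in zip(like, prior)]           # un-normalised posterior
    marginal = sum(u * width for u in unnorm)               # numerical normalising factor
    post = [u / marginal for u in unnorm]
    return thetas, post, width

# Hypothetical data: 7 events in 20 trials
thetas, post, width = grid_posterior(y=7, n=20)
post_mean = sum(t * p * width for t, p in zip(thetas, post))
```

With a flat prior the exact posterior here is Beta(8, 14), so the grid-based posterior mean can be compared against 8/22. Grid methods of this kind become impractical as the dimension of $\theta$ grows, which is one motivation for sampling-based estimation.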

It is worth introducing at this point the notion of the full conditional density for individual parameters (or parameter blocks) $\theta_h$, namely

$$p(\theta_h \mid \theta_{[h]}, y) = \frac{p(\theta \mid y)}{p(\theta_{[h]} \mid y)},$$

where $\theta_{[h]}$ denotes the parameter set excluding $\theta_h$. These densities are important in MCMC sampling, as discussed below. The full conditional density can be abstracted from the un-normalised posterior density by regarding all terms except those involving $\theta_h$ as constants.

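For example, consider a normal density for observations $y_i$ $(i = 1, \ldots, n)$ with mean $\mu$ and precision $\tau = 1/\sigma^2$, with likelihood

$$p(y \mid \mu, \tau) = \prod_{i=1}^{n} \left(\frac{\tau}{2\pi}\right)^{0.5} \exp\left[-\frac{\tau}{2}(y_i - \mu)^2\right].$$

Assume a gamma $Ga(a, b)$ prior on $\tau$, and a $N(m, V)$ prior on $\mu$. Then the joint posterior density, concatenating constant terms (including the inverse of the marginal likelihood) into the constant $k$, is

$$p(\mu, \tau \mid y) = k\, \tau^{n/2} \exp\left[-\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right] \tau^{a-1} e^{-b\tau} \exp\left[-\frac{(\mu - m)^2}{2V}\right]. \tag{1.4}$$

The full conditional density for $\mu$ is expressed analytically as

$$p(\mu \mid \tau, y) = \frac{p(\mu, \tau \mid y)}{p(\tau \mid y)},$$

and can be obtained from (1.4) by focusing only on terms that are functions of $\mu$. Thus

$$p(\mu \mid \tau, y) \propto \exp\left[-\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right] \exp\left[-\frac{(\mu - m)^2}{2V}\right].$$

By algebraic re-expression, and with $\bar{y} = \sum_{i} y_i / n$ and $V_\mu = (n\tau + 1/V)^{-1}$, one may show

$$\mu \mid \tau, y \sim N\left(V_\mu \left[n\tau\bar{y} + m/V\right],\; V_\mu\right).$$

Similarly

$$p(\tau \mid \mu, y) \propto \tau^{n/2} \exp\left[-\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right] \tau^{a-1} e^{-b\tau},$$

which can be re-expressed as

$$\tau \mid \mu, y \sim Ga\left(a + \frac{n}{2},\; b + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right),$$

where $Ga(a, b)$ denotes a gamma density with mean $a/b$ and variance $a/b^2$.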
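Sampling alternately from the two full conditionals gives a simple MCMC scheme of the Gibbs type. The sketch below assumes the parameterisation in which the normal likelihood has mean $\mu$ and precision $\tau$, the prior on $\mu$ is $N(m, V)$, and the prior on $\tau$ is gamma with shape $a$ and rate $b$; the default prior values, the simulated data and the function name `gibbs_normal` are illustrative assumptions.

```python
import math
import random

def gibbs_normal(y, m=0.0, V=1000.0, a=0.01, b=0.01,
                 n_iter=5000, burn_in=1000, seed=42):
    """Alternate draws from the full conditionals of mu and tau for normal data."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, tau = ybar, 1.0  # starting values
    draws = []
    for t in range(n_iter):
        # mu | tau, y is normal with variance 1 / (n*tau + 1/V)
        v_mu = 1.0 / (n * tau + 1.0 / V)
        mu = rng.gauss(v_mu * (n * tau * ybar + m / V), math.sqrt(v_mu))
        # tau | mu, y is gamma with shape a + n/2 and rate b + 0.5 * sum of squares
        rate = b + 0.5 * sum((yi - mu) ** 2 for yi in y)
        tau = rng.gammavariate(a + n / 2.0, 1.0 / rate)  # second argument is a scale, so invert the rate
        if t >= burn_in:
            draws.append((mu, tau))
    return draws

# Hypothetical data: 200 observations simulated with mean 5 and standard deviation 2
data_rng = random.Random(0)
y = [data_rng.gauss(5.0, 2.0) for _ in range(200)]
draws = gibbs_normal(y)
mean_mu = sum(d[0] for d in draws) / len(draws)
mean_tau = sum(d[1] for d in draws) / len(draws)
```

With these vague priors the posterior mean of $\mu$ should sit close to the sample mean, and that of $\tau$ close to the reciprocal of the sample variance.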

1.1.3 Predictions and assessment

The principle of updating extends to replicate values or predictions. Before the study a prediction would be based on random draws from the prior density of parameters and is likely to have little precision. Part of the goal of a new study is to use the data as a basis for making improved predictions or evaluation of future options. Thus in a meta-analysis of mortality odds ratios (e.g. for a new as against conventional therapy), it may be useful to assess the likely odds ratio $y_{\text{new}}$ in a hypothetical future study on the basis of findings $y$ from existing studies. Such a prediction is based on the likelihood of $y_{\text{new}}$ averaged over the posterior density based on $y$:

$$p(y_{\text{new}} \mid y) = \int p(y_{\text{new}} \mid \theta)\, p(\theta \mid y)\, d\theta,$$

where the likelihood of $y_{\text{new}}$, $p(y_{\text{new}} \mid \theta)$, usually takes the same...
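The averaging over the posterior can be approximated by simulation: draw a parameter value from the posterior, then draw a new observation from the likelihood given that value, and repeat. The sketch below assumes a simple illustrative setting rather than the odds-ratio example: a Beta(8, 14) posterior for a proportion and a future study of 20 binary outcomes. The function name `posterior_predictive_draws` and all numerical values are hypothetical.

```python
import random

def posterior_predictive_draws(alpha, beta, n_new, n_draws=20000, seed=7):
    """Sample new binomial counts by first sampling theta from a beta posterior."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        theta = rng.betavariate(alpha, beta)                     # draw from the posterior of theta
        y_new = sum(rng.random() < theta for _ in range(n_new))  # new count given that theta
        draws.append(y_new)
    return draws

draws = posterior_predictive_draws(alpha=8, beta=14, n_new=20)
pred_mean = sum(draws) / len(draws)
```

The spread of the simulated counts reflects both sampling variability in the future study and remaining posterior uncertainty about the proportion.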

Schweitzer Fachinformationen
