Foundations of Linear and Generalized Linear Models

Alan Agresti (Author)

Wiley (Publisher)

Published on 15 January 2015

472 pages

E-Book

ePUB with Adobe-DRM

978-1-118-73005-8 (ISBN)

€118.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

A valuable overview of the most important ideas and results in statistical modeling

Written by a highly experienced author, Foundations of Linear and Generalized Linear Models is a clear and comprehensive guide to the key concepts and results of linear statistical models. The book presents a broad, in-depth overview of the most commonly used statistical models by discussing the theory underlying the models, R software applications, and examples with crafted models to elucidate key ideas and promote practical model building.

The book begins by illustrating the fundamentals of linear models, such as how the model-fitting projects the data onto a model vector subspace and how orthogonal decompositions of the data yield information about the effects of explanatory variables. Subsequently, the book covers the most popular generalized linear models, which include binomial and multinomial logistic regression for categorical data, and Poisson and negative binomial loglinear models for count data. Focusing on the theoretical underpinnings of these models, Foundations of Linear and Generalized Linear Models also features:

* An introduction to quasi-likelihood methods that require weaker distributional assumptions, such as generalized estimating equation methods
* An overview of linear mixed models and generalized linear mixed models with random effects for clustered correlated data, Bayesian modeling, and extensions to handle problematic cases such as high dimensional problems
* Numerous examples that use R software for all text data analyses
* More than 400 exercises for readers to practice and extend the theory, methods, and data analysis
* A supplementary website with datasets for the examples and exercises

An invaluable textbook for upper-undergraduate and graduate-level students in statistics and biostatistics courses, Foundations of Linear and Generalized Linear Models is also an excellent reference for practicing statisticians and biostatisticians, as well as anyone who is interested in learning about the most important statistical models for analyzing data.

Reviews / Votes

"The book arose from a one-semester graduate level course taught by Alan Agresti at Harvard University. It has a clear didactic focus, which benefits greatly from Agresti's well-known clear writing style. Each of the 11 chapters is followed by around 40 exercises, which are diverse and interesting."

"...I am very happy with the foundational perspective of this book. I think that students who master this material will have a very thorough understanding of the most important aspects of GLMs, which is more valuable than a kaleidoscopic knowledge. This is certainly one of the books I will consider when next I need to teach a course in generalized linear models."

"...this is a great introduction to GLMs written in a clear and didactic style, and with a thoughtful choice and presentation of the material. Highly recommended." --Biometrics Journal, 2016

"This book is an essential reference for anyone working with or teaching GLMs." (Mathematical Association of America, 2016)

Content

Preface

1 Introduction to Linear and Generalized Linear Models

1.1 Components of a Generalized Linear Model

1.2 Quantitative/Qualitative Explanatory Variables and Interpreting Effects

1.3 Model Matrices and Model Vector Spaces

1.4 Identifiability and Estimability

1.5 Example: Using Software to Fit a GLM

Chapter Notes

Exercises

2 Linear Models: Least Squares Theory

2.1 Least Squares Model Fitting

2.2 Projections of Data Onto Model Spaces

2.3 Linear Model Examples: Projections and SS Decompositions

2.4 Summarizing Variability in a Linear Model

2.5 Residuals, Leverage, and Influence

2.6 Example: Summarizing the Fit of a Linear Model

2.7 Optimality of Least Squares and Generalized Least Squares

Chapter Notes

Exercises

3 Normal Linear Models: Statistical Inference

3.1 Distribution Theory for Normal Variates

3.2 Significance Tests for Normal Linear Models

3.3 Confidence Intervals and Prediction Intervals for Normal Linear Models

3.4 Example: Normal Linear Model Inference

3.5 Multiple Comparisons: Bonferroni, Tukey, and FDR Methods

Chapter Notes

Exercises

4 Generalized Linear Models: Model Fitting and Inference

4.1 Exponential Dispersion Family Distributions for a GLM

4.2 Likelihood and Asymptotic Distributions for GLMs

4.3 Likelihood-Ratio/Wald/Score Methods of Inference for GLM Parameters

4.4 Deviance of a GLM, Model Comparison, and Model Checking

4.5 Fitting Generalized Linear Models

4.6 Selecting Explanatory Variables for a GLM

4.7 Example: Building a GLM

Appendix: GLM Analogs of Orthogonality Results for Linear Models

Chapter Notes

Exercises

5 Models for Binary Data

5.1 Link Functions for Binary Data

5.2 Logistic Regression: Properties and Interpretations

5.3 Inference About Parameters of Logistic Regression Models

5.4 Logistic Regression Model Fitting

5.5 Deviance and Goodness of Fit for Binary GLMs

5.6 Probit and Complementary Log-Log Models

5.7 Examples: Binary Data Modeling

Chapter Notes

Exercises

6 Multinomial Response Models

6.1 Nominal Responses: Baseline-Category Logit Models

6.2 Ordinal Responses: Cumulative Logit and Probit Models

6.3 Examples: Nominal and Ordinal Responses

Chapter Notes

Exercises

7 Models for Count Data

7.1 Poisson GLMs for Counts and Rates

7.2 Poisson/Multinomial Models for Contingency Tables

7.3 Negative Binomial GLMs

7.4 Models for Zero-Inflated Data

7.5 Example: Modeling Count Data

Chapter Notes

Exercises

8 Quasi-Likelihood Methods

8.1 Variance Inflation for Overdispersed Poisson and Binomial GLMs

8.2 Beta-Binomial Models and Quasi-Likelihood Alternatives

8.3 Quasi-Likelihood and Model Misspecification

Chapter Notes

Exercises

9 Modeling Correlated Responses

9.1 Marginal Models and Models with Random Effects

9.2 Normal Linear Mixed Models

9.3 Fitting and Prediction for Normal Linear Mixed Models

9.4 Binomial and Poisson GLMMs

9.5 GLMM Fitting, Inference, and Prediction

9.6 Marginal Modeling and Generalized Estimating Equations

9.7 Example: Modeling Correlated Survey Responses

Chapter Notes

Exercises

10 Bayesian Linear and Generalized Linear Modeling

10.1 The Bayesian Approach to Statistical Inference

10.2 Bayesian Linear Models

10.3 Bayesian Generalized Linear Models

10.4 Empirical Bayes and Hierarchical Bayes Modeling

Chapter Notes

Exercises

11 Extensions of Generalized Linear Models

11.1 Robust Regression and Regularization Methods for Fitting Models

11.2 Modeling With Large p

11.3 Smoothing, Generalized Additive Models, and Other GLM Extensions

Chapter Notes

Exercises

Appendix A Supplemental Data Analysis Exercises

Appendix B Solution Outlines for Selected Exercises

References

Author Index

Example Index

Subject Index

CHAPTER 1
Introduction to Linear and Generalized Linear Models

This is a book about linear models and generalized linear models. As the names suggest, the linear model is a special case of the generalized linear model. In this first chapter, we define generalized linear models, and in doing so we also introduce the linear model.

Chapters 2 and 3 focus on the linear model. Chapter 2 introduces the least squares method for fitting the model, and Chapter 3 presents statistical inference under the assumption of a normal distribution for the response variable. Chapter 4 presents analogous model-fitting and inferential results for the generalized linear model. This generalization enables us to model non-normal responses, such as categorical data and count data.

The remainder of the book presents the most important generalized linear models. Chapter 5 focuses on models that assume a binomial distribution for the response variable. These apply to binary data, such as "success" and "failure" for possible outcomes in a medical trial or "favor" and "oppose" for possible responses in a sample survey. Chapter 6 extends the models to multicategory responses, assuming a multinomial distribution. Chapter 7 introduces models that assume a Poisson or negative binomial distribution for the response variable. These apply to count data, such as observations in a health survey on the number of respondent visits in the past year to a doctor. Chapter 8 presents ways of weakening distributional assumptions in generalized linear models, introducing quasi-likelihood methods that merely focus on the mean and variance of the response distribution. Chapters 1-8 assume independent observations. Chapter 9 generalizes the models further to permit correlated observations, such as in handling multivariate responses. Chapters 1-9 use the traditional frequentist approach to statistical inference, assuming probability distributions for the response variables but treating model parameters as fixed, unknown values. Chapter 10 presents the Bayesian approach for linear models and generalized linear models, which treats the model parameters as random variables having their own distributions. The final chapter introduces extensions of the models that handle more complex situations, such as high-dimensional settings in which models have enormous numbers of parameters.

1.1 COMPONENTS OF A GENERALIZED LINEAR MODEL

The ordinary linear regression model uses linearity to describe the relationship between the mean of the response variable and a set of explanatory variables, with inference assuming that the response distribution is normal. Generalized linear models (GLMs) extend standard linear regression models to encompass non-normal response distributions and possibly nonlinear functions of the mean. They have three components.

Random component: This specifies the response variable y and its probability distribution. The observations $\mathbf{y} = (y_1, \ldots, y_n)^T$ on that distribution are treated as independent.
Linear predictor: For a parameter vector $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^T$ and an $n \times p$ model matrix $\mathbf{X}$ that contains values of p explanatory variables for the n observations, the linear predictor is $\mathbf{X}\boldsymbol{\beta}$.
Link function: This is a function g applied to each component of $E(\mathbf{y})$ that relates it to the linear predictor, $g[E(\mathbf{y})] = \mathbf{X}\boldsymbol{\beta}$.

Next we present more detail about each component of a GLM.
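Before the details, a minimal Python sketch may help show how the three components fit together for a single observation. It is only an illustration under stated assumptions: the log link, the numerical values, and the helper names `linear_predictor` and `mean_from_predictor` are hypothetical choices, not taken from the text.

```python
import math

def linear_predictor(x_row, beta):
    """Return eta_i = sum_j beta_j * x_ij for one row of the model matrix."""
    return sum(b * x for b, x in zip(beta, x_row))

def mean_from_predictor(eta, inverse_link):
    """Return mu_i = g^{-1}(eta_i), the mean implied by the link function g."""
    return inverse_link(eta)

# Hypothetical values: an intercept column plus one explanatory variable.
x_row = [1.0, 2.0]
beta = [0.5, 0.25]

eta = linear_predictor(x_row, beta)      # linear predictor
mu = mean_from_predictor(eta, math.exp)  # log link: g(mu) = log(mu), so mu = exp(eta)
print(eta, mu)
```

The random component is not simulated here; it would specify the distribution of the response around the mean `mu`, for example a Poisson distribution when the response is a count.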

1.1.1 Random Component of a GLM

The random component of a GLM consists of a response variable y with independent observations $(y_1, \ldots, y_n)$ having probability density or mass function for a distribution in the exponential family. In Chapter 4 we review this family of distributions, which has several appealing properties. For example, $\sum_i y_i$ is a sufficient statistic for its parameter, and regularity conditions (such as differentiation passing under an integral sign) are satisfied for derivations of properties such as optimal large-sample performance of maximum likelihood (ML) estimators.

By restricting GLMs to exponential family distributions, we obtain general expressions for the model likelihood equations, the asymptotic distributions of estimators for model parameters, and an algorithm for fitting the models. For now, it suffices to say that the distributions most commonly used in Statistics, such as the normal, binomial, and Poisson, are exponential family distributions.
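As a worked illustration of the sufficiency remark, consider the Poisson case. This is a standard calculation, written here under the simplifying assumption (made only for this example) that all n observations share a common mean $\mu$:

```latex
\begin{aligned}
f(y_i; \mu) &= \frac{e^{-\mu}\mu^{y_i}}{y_i!}
             = \exp\{\, y_i \log \mu - \mu - \log y_i! \,\}, \\
\prod_{i=1}^{n} f(y_i; \mu)
            &= \exp\Big\{ \Big(\sum_{i=1}^{n} y_i\Big) \log \mu - n\mu - \sum_{i=1}^{n} \log y_i! \Big\}.
\end{aligned}
```

The part of the joint mass function that depends on $\mu$ involves the data only through $\sum_i y_i$, which is why that sum is sufficient. The quantity multiplying it, $\log \mu$, is the natural parameter of the Poisson distribution.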

1.1.2 Linear Predictor of a GLM

For observation i, $i = 1, \ldots, n$, let $x_{ij}$ denote the value of explanatory variable $x_j$, $j = 1, \ldots, p$. Let $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})$. Usually, we set $x_{i1} = 1$ or let the first variable have index 0 with $x_{i0} = 1$, so it serves as the coefficient of an intercept term in the model. The linear predictor of a GLM relates parameters $\{\eta_i\}$ pertaining to $\{E(y_i)\}$ to the explanatory variables $x_1, \ldots, x_p$ using a linear combination of them,

$$\eta_i = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, \ldots, n.$$

The labeling of $\sum_{j=1}^{p} \beta_j x_{ij}$ as a linear predictor reflects that this expression is linear in the parameters. The explanatory variables themselves can be nonlinear functions of underlying variables, such as an interaction term (e.g., $x_{i3} = x_{i1} x_{i2}$) or a quadratic term (e.g., $x_{i2} = x_{i1}^2$).

In matrix form, we express the linear predictor as

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta},$$

where $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_n)^T$, $\boldsymbol{\beta}$ is the $p \times 1$ column vector of model parameters, and $\mathbf{X}$ is the $n \times p$ matrix of explanatory variable values $\{x_{ij}\}$. The matrix $\mathbf{X}$ is called the model matrix. In experimental studies, it is also often called the design matrix. It has n rows, one for each observation, and p columns, one for each parameter in $\boldsymbol{\beta}$. In practice, usually $p \le n$, the goal of model parsimony being to summarize the data using a considerably smaller number of parameters.
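The construction of a model matrix and the resulting linear predictor can be sketched in a few lines of Python. The data, the coefficient values, and the function names `build_model_matrix` and `matvec` are hypothetical and chosen only for illustration. The columns include an intercept and an interaction term, so the predictor is nonlinear in the underlying variables yet still linear in the parameters.

```python
def build_model_matrix(x1, x2):
    """Build an n x 4 model matrix with columns: intercept, x1, x2, x1*x2."""
    return [[1.0, a, b, a * b] for a, b in zip(x1, x2)]

def matvec(X, beta):
    """Compute eta = X beta, one linear predictor value per row of X."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# Hypothetical data: n = 5 observations on two underlying variables.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 3.0, 0.0, 2.0]
beta = [1.0, 2.0, -1.0, 0.5]   # p = 4 parameters, one per column of X

X = build_model_matrix(x1, x2)
eta = matvec(X, beta)
print(len(X), len(X[0]))  # n rows, p columns
print(eta)
```

Each row of `X` corresponds to one observation and each column to one parameter, matching the description of the model matrix.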

GLMs treat $y_i$ as random and $\mathbf{x}_i$ as fixed. Because of this, the linear predictor is sometimes called the systematic component. In practice $\mathbf{x}_i$ is itself often random, such as in sample surveys and other observational studies. In this book, we condition on its observed values in conducting statistical inference about effects of the explanatory variables.

1.1.3 Link Function of a GLM

The third component of a GLM, the link function, connects the random component with the linear predictor. Let $\mu_i = E(y_i)$, $i = 1, \ldots, n$. The GLM links $\eta_i$ to $\mu_i$ by $\eta_i = g(\mu_i)$, where the link function $g(\cdot)$ is a monotonic, differentiable function. Thus, g links $\mu_i$ to explanatory variables through the formula:

$$g(\mu_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, \ldots, n. \tag{1.1}$$

In the exponential family representation of a distribution, a certain parameter serves as its natural parameter. This parameter is the mean for a normal distribution, the log of the odds for a binomial distribution, and the log of the mean for a Poisson distribution. The link function g that transforms µi to the natural parameter is called the canonical link. This link function, which equates the natural parameter with the linear predictor, generates the most commonly used GLMs. Certain simplifications result when the GLM uses the canonical link function. For example, the model has a concave log-likelihood function and simple sufficient statistics and likelihood equations.
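The three canonical links named above can be written out directly. The following Python sketch is illustrative only; the function names are hypothetical, and the binomial link is written for a mean expressed as a probability strictly between 0 and 1.

```python
import math

def identity(mu):
    """Canonical link for the normal distribution: the mean itself."""
    return mu

def logit(mu):
    """Canonical link for the binomial distribution: the log of the odds."""
    return math.log(mu / (1.0 - mu))

def log_link(mu):
    """Canonical link for the Poisson distribution: the log of the mean."""
    return math.log(mu)

def inverse_logit(eta):
    """Map a linear predictor value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Each link is monotonic, so it can be inverted to recover the mean.
p = 0.8
eta = logit(p)                  # log(0.8 / 0.2), the log of odds equal to 4
print(eta, inverse_logit(eta))
print(math.exp(log_link(3.0)))  # exp inverts the log link
```

Because each link is monotonic, a fitted value of the linear predictor always maps back to exactly one value of the mean.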

1.1.4 A GLM with Identity Link Function is a "Linear Model"

The link function $g(\mu_i) = \mu_i$ is called the identity link function. It has $\eta_i = \mu_i$. A GLM that uses the identity link function is called a linear model. It equates the linear predictor to the mean itself. This GLM has

$$\mu_i = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, \ldots, n.$$

The standard version of this, which we refer to as the ordinary linear model, assumes that the observations have constant variance, called homoscedasticity. An alternative way to express the ordinary linear model is

$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i,$$

where the "error term" $\epsilon_i$ has $E(\epsilon_i) = 0$ and $\mathrm{var}(\epsilon_i) = \sigma^2$, $i = 1, \ldots, n$. This is natural for the identity link and normal responses but not for most GLMs.
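The error-term form of the model suggests a simple way to generate data from it. The Python sketch below does so under an added assumption of normally distributed errors, which the ordinary linear model itself does not require. The function name `simulate_ordinary_linear_model` and all numerical values are hypothetical.

```python
import random

def simulate_ordinary_linear_model(X, beta, sigma, rng):
    """Draw y_i = sum_j beta_j x_ij + eps_i with independent normal errors."""
    y = []
    for row in X:
        mu_i = sum(b * x for b, x in zip(beta, row))  # identity link: mu_i = eta_i
        eps_i = rng.gauss(0.0, sigma)                  # mean 0, standard deviation sigma
        y.append(mu_i + eps_i)
    return y

rng = random.Random(0)                     # seeded for reproducibility
X = [[1.0, float(i)] for i in range(5)]    # intercept plus one explanatory variable
beta = [2.0, 0.5]
y = simulate_ordinary_linear_model(X, beta, sigma=1.0, rng=rng)
print(y)
```

Setting `sigma` to zero returns the means themselves, which makes the separation between the systematic part and the error term easy to see.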

In summary, ordinary linear models equate the linear predictor directly to the mean of a response variable y and assume constant variance for that response. The normal linear model also assumes normality. By contrast, a GLM is an extension that equates the linear predictor to a link-function-transformed mean of y, and assumes a distribution for y that need not be normal but is in the exponential family. We next illustrate the three components of a GLM by introducing three of the most important GLMs.

1.1.5 GLMs for Normal, Binomial, and Poisson Responses

The class of GLMs includes models for continuous response variables....
