
Foundations of Linear and Generalized Linear Models
Reviews / Votes
"The book arose from a one-semester graduate level course taught by Alan Agresti at Harvard University. It has a clear didactic focus, which benefits greatly from Agresti's well-known clear writing style. Each of the 11 chapters is followed by around 40 exercises, which are diverse and interesting."
"...I am very happy with the foundational perspective of this book. I think that students who master this material will have a very thorough understanding of the most important aspects of GLMs, which is more valuable than a kaleidoscopic knowledge. This is certainly one of the books I will consider when next I need to teach a course in generalized linear models."
"...this is a great introduction to GLMs written in a clear and didactic style, and with a thoughtful choice and presentation of the material. Highly recommended." --Biometrics Journal, 2016
"This book is an essential reference for anyone working with or teaching GLMs." (Mathematical Association of America, 2016)
Person
ALAN AGRESTI, PhD, is Distinguished Professor Emeritus in the Department of Statistics at the University of Florida. He has presented short courses on generalized linear models and categorical data methods in more than 30 countries. The author of over 200 journal articles, Dr. Agresti is also the author of Categorical Data Analysis, Third Edition, Analysis of Ordinal Categorical Data, Second Edition, and An Introduction to Categorical Data Analysis, Second Edition, all published by Wiley.
Content
Preface
1 Introduction to Linear and Generalized Linear Models
1.1 Components of a Generalized Linear Model
1.2 Quantitative/Qualitative Explanatory Variables and Interpreting Effects
1.3 Model Matrices and Model Vector Spaces
1.4 Identifiability and Estimability
1.5 Example: Using Software to Fit a GLM
Chapter Notes
Exercises
2 Linear Models: Least Squares Theory
2.1 Least Squares Model Fitting
2.2 Projections of Data Onto Model Spaces
2.3 Linear Model Examples: Projections and SS Decompositions
2.4 Summarizing Variability in a Linear Model
2.5 Residuals, Leverage, and Influence
2.6 Example: Summarizing the Fit of a Linear Model
2.7 Optimality of Least Squares and Generalized Least Squares
Chapter Notes
Exercises
3 Normal Linear Models: Statistical Inference
3.1 Distribution Theory for Normal Variates
3.2 Significance Tests for Normal Linear Models
3.3 Confidence Intervals and Prediction Intervals for Normal Linear Models
3.4 Example: Normal Linear Model Inference
3.5 Multiple Comparisons: Bonferroni, Tukey, and FDR Methods
Chapter Notes
Exercises
4 Generalized Linear Models: Model Fitting and Inference
4.1 Exponential Dispersion Family Distributions for a GLM
4.2 Likelihood and Asymptotic Distributions for GLMs
4.3 Likelihood-Ratio/Wald/Score Methods of Inference for GLM Parameters
4.4 Deviance of a GLM, Model Comparison, and Model Checking
4.5 Fitting Generalized Linear Models
4.6 Selecting Explanatory Variables for a GLM
4.7 Example: Building a GLM
Appendix: GLM Analogs of Orthogonality Results for Linear Models
Chapter Notes
Exercises
5 Models for Binary Data
5.1 Link Functions for Binary Data
5.2 Logistic Regression: Properties and Interpretations
5.3 Inference About Parameters of Logistic Regression Models
5.4 Logistic Regression Model Fitting
5.5 Deviance and Goodness of Fit for Binary GLMs
5.6 Probit and Complementary Log-Log Models
5.7 Examples: Binary Data Modeling
Chapter Notes
Exercises
6 Multinomial Response Models
6.1 Nominal Responses: Baseline-Category Logit Models
6.2 Ordinal Responses: Cumulative Logit and Probit Models
6.3 Examples: Nominal and Ordinal Responses
Chapter Notes
Exercises
7 Models for Count Data
7.1 Poisson GLMs for Counts and Rates
7.2 Poisson/Multinomial Models for Contingency Tables
7.3 Negative Binomial GLMs
7.4 Models for Zero-Inflated Data
7.5 Example: Modeling Count Data
Chapter Notes
Exercises
8 Quasi-Likelihood Methods
8.1 Variance Inflation for Overdispersed Poisson and Binomial GLMs
8.2 Beta-Binomial Models and Quasi-Likelihood Alternatives
8.3 Quasi-Likelihood and Model Misspecification
Chapter Notes
Exercises
9 Modeling Correlated Responses
9.1 Marginal Models and Models with Random Effects
9.2 Normal Linear Mixed Models
9.3 Fitting and Prediction for Normal Linear Mixed Models
9.4 Binomial and Poisson GLMMs
9.5 GLMM Fitting, Inference, and Prediction
9.6 Marginal Modeling and Generalized Estimating Equations
9.7 Example: Modeling Correlated Survey Responses
Chapter Notes
Exercises
10 Bayesian Linear and Generalized Linear Modeling
10.1 The Bayesian Approach to Statistical Inference
10.2 Bayesian Linear Models
10.3 Bayesian Generalized Linear Models
10.4 Empirical Bayes and Hierarchical Bayes Modeling
Chapter Notes
Exercises
11 Extensions of Generalized Linear Models
11.1 Robust Regression and Regularization Methods for Fitting Models
11.2 Modeling With Large p
11.3 Smoothing, Generalized Additive Models, and Other GLM Extensions
Chapter Notes
Exercises
Appendix A Supplemental Data Analysis Exercises
Appendix B Solution Outlines for Selected Exercises
References
Author Index
Example Index
Subject Index
CHAPTER 1
Introduction to Linear and Generalized Linear Models
This is a book about linear models and generalized linear models. As the names suggest, the linear model is a special case of the generalized linear model. In this first chapter, we define generalized linear models, and in doing so we also introduce the linear model.
Chapters 2 and 3 focus on the linear model. Chapter 2 introduces the least squares method for fitting the model, and Chapter 3 presents statistical inference under the assumption of a normal distribution for the response variable. Chapter 4 presents analogous model-fitting and inferential results for the generalized linear model. This generalization enables us to model non-normal responses, such as categorical data and count data.
The remainder of the book presents the most important generalized linear models. Chapter 5 focuses on models that assume a binomial distribution for the response variable. These apply to binary data, such as "success" and "failure" for possible outcomes in a medical trial or "favor" and "oppose" for possible responses in a sample survey. Chapter 6 extends the models to multicategory responses, assuming a multinomial distribution. Chapter 7 introduces models that assume a Poisson or negative binomial distribution for the response variable. These apply to count data, such as observations in a health survey on the number of respondent visits in the past year to a doctor. Chapter 8 presents ways of weakening distributional assumptions in generalized linear models, introducing quasi-likelihood methods that merely focus on the mean and variance of the response distribution. Chapters 1-8 assume independent observations. Chapter 9 generalizes the models further to permit correlated observations, such as in handling multivariate responses. Chapters 1-9 use the traditional frequentist approach to statistical inference, assuming probability distributions for the response variables but treating model parameters as fixed, unknown values. Chapter 10 presents the Bayesian approach for linear models and generalized linear models, which treats the model parameters as random variables having their own distributions. The final chapter introduces extensions of the models that handle more complex situations, such as high-dimensional settings in which models have enormous numbers of parameters.
1.1 COMPONENTS OF A GENERALIZED LINEAR MODEL
The ordinary linear regression model uses linearity to describe the relationship between the mean of the response variable and a set of explanatory variables, with inference assuming that the response distribution is normal. Generalized linear models (GLMs) extend standard linear regression models to encompass non-normal response distributions and possibly nonlinear functions of the mean. They have three components.
- Random component: This specifies the response variable y and its probability distribution. The observations $\mathbf{y} = (y_1, \ldots, y_n)^T$ on that distribution are treated as independent.
- Linear predictor: For a parameter vector $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)^T$ and an $n \times p$ model matrix $\mathbf{X}$ that contains values of p explanatory variables for the n observations, the linear predictor is $\mathbf{X}\boldsymbol{\beta}$.
- Link function: This is a function g applied to each component of $E(\mathbf{y})$ that relates it to the linear predictor, $g[E(\mathbf{y})] = \mathbf{X}\boldsymbol{\beta}$.
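As a rough illustration of how the three components fit together, the following Python sketch computes a linear predictor and maps it to means through an inverse link. The toy data, the parameter values, the choice of a log link, and the function names are all assumptions made for this sketch; they do not come from the book.

```python
import math

# Hypothetical toy data: n = 3 observations, p = 2 columns
# (a column of 1s for the intercept and one explanatory variable).
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]
beta = [0.5, 0.3]  # illustrative parameter values, not estimates


def linear_predictor(X, beta):
    """Return the vector X beta, with one entry per observation."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


def log_link(mu):
    """Link function g(mu) = log(mu), chosen here only as an example."""
    return math.log(mu)


def inverse_log_link(eta):
    """Inverse of the log link: recovers the mean from the linear predictor."""
    return math.exp(eta)


# Linear predictor: one value per observation.
eta = linear_predictor(X, beta)

# Link function: g(mu_i) = eta_i, so mu_i = g^{-1}(eta_i).
mu = [inverse_log_link(e) for e in eta]

# Random component: each y_i would then be modeled as an independent draw
# from a distribution with mean mu_i (for instance, a Poisson distribution).
print(eta, mu)
```

The sketch only evaluates the model for given parameter values; estimating the parameters from data is a separate step.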
Next we present more detail about each component of a GLM.
1.1.1 Random Component of a GLM
The random component of a GLM consists of a response variable y with independent observations $(y_1, \ldots, y_n)$ having probability density or mass function for a distribution in the exponential family. In Chapter 4 we review this family of distributions, which has several appealing properties. For example, $\sum_i y_i$ is a sufficient statistic for its parameter, and regularity conditions (such as differentiation passing under an integral sign) are satisfied for derivations of properties such as optimal large-sample performance of maximum likelihood (ML) estimators.
By restricting GLMs to exponential family distributions, we obtain general expressions for the model likelihood equations, the asymptotic distributions of estimators for model parameters, and an algorithm for fitting the models. For now, it suffices to say that the distributions most commonly used in Statistics, such as the normal, binomial, and Poisson, are exponential family distributions.
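The sufficiency property mentioned above can be checked numerically in a simple case. The sketch below assumes independent Poisson observations with a common mean (our choice of example, not one made at this point in the text) and uses two made-up samples with the same size and the same total. The function names are hypothetical.

```python
import math


def poisson_loglik(mu, ys):
    """Log-likelihood of independent Poisson(mu) observations ys."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)


def loglik_difference(ys, mu1, mu2):
    """Log-likelihood ratio comparing two candidate values of the mean."""
    return poisson_loglik(mu1, ys) - poisson_loglik(mu2, ys)


def ml_estimate(ys):
    """ML estimate of the Poisson mean: the sample mean, sum(ys) / n."""
    return sum(ys) / len(ys)


# Two made-up samples: different values, same n and same sum.
sample_a = [0, 2, 4, 6]
sample_b = [3, 3, 3, 3]

# The terms involving log(y!) cancel in the difference, so the comparison
# of any two parameter values depends on the data only through sum(ys).
diff_a = loglik_difference(sample_a, 2.0, 3.5)
diff_b = loglik_difference(sample_b, 2.0, 3.5)
print(diff_a, diff_b, ml_estimate(sample_a), ml_estimate(sample_b))
```

Both samples lead to the same likelihood comparisons and the same ML estimate, which is what sufficiency of the sum implies in this setting.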
1.1.2 Linear Predictor of a GLM
For observation i, $i = 1, \ldots, n$, let $x_{ij}$ denote the value of explanatory variable $x_j$, $j = 1, \ldots, p$. Let $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})$. Usually, we set $x_{i1} = 1$ or let the first variable have index 0 with $x_{i0} = 1$, so it serves as the coefficient of an intercept term in the model. The linear predictor of a GLM relates parameters $\{\eta_i\}$ pertaining to $\{E(y_i)\}$ to the explanatory variables $x_1, \ldots, x_p$ using a linear combination of them,
$$\eta_i = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, \ldots, n.$$
The labeling of $\sum_{j=1}^{p} \beta_j x_{ij}$ as a linear predictor reflects that this expression is linear in the parameters. The explanatory variables themselves can be nonlinear functions of underlying variables, such as an interaction term (e.g., $x_{i3} = x_{i1} x_{i2}$) or a quadratic term (e.g., $x_{i2} = x_{i1}^2$).
In matrix form, we express the linear predictor as
$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta},$$
where $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_n)^T$, $\boldsymbol{\beta}$ is the $p \times 1$ column vector of model parameters, and $\mathbf{X}$ is the $n \times p$ matrix of explanatory variable values $\{x_{ij}\}$. The matrix $\mathbf{X}$ is called the model matrix. In experimental studies, it is also often called the design matrix. It has n rows, one for each observation, and p columns, one for each parameter in $\boldsymbol{\beta}$. In practice, usually $p \le n$, the goal of model parsimony being to summarize the data using a considerably smaller number of parameters.
GLMs treat $y_i$ as random and $\mathbf{x}_i$ as fixed. Because of this, the linear predictor is sometimes called the systematic component. In practice $\mathbf{x}_i$ is itself often random, such as in sample surveys and other observational studies. In this book, we condition on its observed values in conducting statistical inference about effects of the explanatory variables.
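To make the model matrix concrete, here is a small Python sketch that builds a model matrix containing an intercept column, two underlying variables, their interaction, and a quadratic term, and then evaluates the linear predictor row by row. The data values, the parameter values, and the names `u`, `v`, and `model_matrix` are hypothetical and serve only as an illustration.

```python
# Hypothetical values of two underlying variables for n = 6 observations.
u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
v = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]


def model_matrix(u, v):
    """Build rows (1, u_i, v_i, u_i * v_i, u_i ** 2).

    The columns are nonlinear functions of the underlying variables,
    but the linear predictor remains linear in the parameters.
    """
    return [[1.0, ui, vi, ui * vi, ui ** 2] for ui, vi in zip(u, v)]


X = model_matrix(u, v)
n, p = len(X), len(X[0])  # n rows (observations), p columns (parameters)

# Illustrative parameter vector with one entry per column of X.
beta = [1.0, 0.5, -2.0, 0.25, 0.1]

# Linear predictor in matrix form: eta = X beta.
eta = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]
print(n, p, eta)
```

Here p = 5 is smaller than n = 6, in line with the usual situation in which the model has fewer parameters than observations.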
1.1.3 Link Function of a GLM
The third component of a GLM, the link function, connects the random component with the linear predictor. Let $\mu_i = E(y_i)$, $i = 1, \ldots, n$. The GLM links $\eta_i$ to $\mu_i$ by $\eta_i = g(\mu_i)$, where the link function $g(\cdot)$ is a monotonic, differentiable function. Thus, g links $\mu_i$ to explanatory variables through the formula
$$g(\mu_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, \ldots, n. \tag{1.1}$$
In the exponential family representation of a distribution, a certain parameter serves as its natural parameter. This parameter is the mean for a normal distribution, the log of the odds for a binomial distribution, and the log of the mean for a Poisson distribution. The link function g that transforms $\mu_i$ to the natural parameter is called the canonical link. This link function, which equates the natural parameter with the linear predictor, generates the most commonly used GLMs. Certain simplifications result when the GLM uses the canonical link function. For example, the model has a concave log-likelihood function and simple sufficient statistics and likelihood equations.
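The three canonical links named above can be written down directly. The following Python sketch pairs each link with its inverse and checks that the two undo each other. The dictionary layout and the helper name `round_trip` are our own illustrative choices, and for the binomial case the mean is taken to be a success probability strictly between 0 and 1.

```python
import math

# Each entry maps a distribution to (link g, inverse link g^{-1}).
CANONICAL_LINKS = {
    # Normal: natural parameter is the mean, so g is the identity.
    "normal": (lambda mu: mu, lambda eta: eta),
    # Binomial: natural parameter is the log odds (logit) of the probability.
    "binomial": (lambda mu: math.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    # Poisson: natural parameter is the log of the mean.
    "poisson": (math.log, math.exp),
}


def round_trip(name, mu):
    """Apply the canonical link and then its inverse to a mean value mu."""
    g, g_inverse = CANONICAL_LINKS[name]
    return g_inverse(g(mu))


# A probability of 0.5 corresponds to log odds of 0.
logit_at_half = CANONICAL_LINKS["binomial"][0](0.5)
print(logit_at_half, round_trip("binomial", 0.3), round_trip("poisson", 4.0))
```

Because each link is monotonic, the inverse exists and any value of the linear predictor maps back to a unique mean.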
1.1.4 A GLM with Identity Link Function is a "Linear Model"
The link function $g(\mu_i) = \mu_i$ is called the identity link function. It has $\eta_i = \mu_i$. A GLM that uses the identity link function is called a linear model. It equates the linear predictor to the mean itself. This GLM has
$$\mu_i = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, \ldots, n.$$
The standard version of this, which we refer to as the ordinary linear model, assumes that the observations have constant variance, called homoscedasticity. An alternative way to express the ordinary linear model is
$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i,$$
where the "error term" $\epsilon_i$ has $E(\epsilon_i) = 0$ and $\mathrm{var}(\epsilon_i) = \sigma^2$, $i = 1, \ldots, n$. This is natural for the identity link and normal responses but not for most GLMs.
In summary, ordinary linear models equate the linear predictor directly to the mean of a response variable y and assume constant variance for that response. The normal linear model also assumes normality. By contrast, a GLM is an extension that equates the linear predictor to a link-function-transformed mean of y, and assumes a distribution for y that need not be normal but is in the exponential family. We next illustrate the three components of a GLM by introducing three of the most important GLMs.
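The error-term form of the linear model can be illustrated by simulation. The sketch below generates responses as mean plus error with an identity link, then checks that the simulated errors have mean near 0 and variance near the chosen constant. Everything here is assumed for illustration: the parameter values, the sample size, the fixed seed, and the use of normal errors (which corresponds to the normal linear model rather than the more general ordinary linear model).

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

n = 5000
beta = [2.0, 0.5]  # illustrative intercept and slope
sigma = 1.0        # assumed constant error standard deviation

# Model matrix with an intercept column and one explanatory variable.
X = [[1.0, i / n] for i in range(n)]

# Identity link: the mean equals the linear predictor.
mu = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# Error-term form: y_i = mu_i + epsilon_i, with normal errors assumed here.
y = [m + rng.gauss(0.0, sigma) for m in mu]

# Recover the simulated errors and summarize them.
errors = [yi - mi for yi, mi in zip(y, mu)]
mean_error = sum(errors) / n
var_error = sum((e - mean_error) ** 2 for e in errors) / (n - 1)
print(mean_error, var_error)
```

The same additive decomposition is not natural for, say, a binary response, where the variance of the response depends on its mean.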
1.1.5 GLMs for Normal, Binomial, and Poisson Responses
The class of GLMs includes models for continuous response variables....
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.