Since they were first formulated in 1972, generalized linear models have enjoyed a veritable boom, with numerous applications in insurance, economics and biostatistics. Today, they are still the subject of a great deal of research.
This book provides an overview of the theory of generalized linear models. Particular attention is paid to the problems of censoring, missing data and excess zeros. Didactic and accessible, Generalized Linear Models is illustrated with exercises and numerous examples of R code.
With all the necessary prerequisites introduced in a step-by-step fashion, this book is aimed at students (at master's or engineering school level), as well as teachers and practitioners of mathematics and statistical modeling.
Jean-François Dupuy is Professor of Applied Mathematics at the University of Rennes and is a member of the Institut de recherche mathématique de Rennes, France. His research focuses on statistical modeling, generalized linear models and duration models.
Preface ix
Notation and Acronyms xi
Chapter 1. Exponential Families 1
1.1. Definition 1
1.2. Mean, variance, and variance function 3
1.3. Examples of exponential families 4
1.4. Maximum likelihood estimation 9
1.5. Technical appendix 18
1.5.1. Some useful results from probability 18
1.5.2. Negative binomial distribution and Poisson-gamma mixtures 19
1.6. Exercises 20
Chapter 2. From Linear Models to GLMs 25
2.1. Reminders on the linear model 27
2.1.1. Matrix form of the linear model 28
2.1.2. Some examples of linear models 29
2.1.3. Least-squares approximation 31
2.1.4. Asymptotics of the LSE 34
2.1.5. Linear Gaussian model 37
2.2. Three components of a generalized linear model 43
2.2.1. The random component 43
2.2.2. The linear predictor 44
2.2.3. The link function 45
2.3. Estimation in generalized linear models 46
2.3.1. Maximum likelihood 46
2.3.2. Asymptotic properties and inference 48
2.3.3. Estimating the dispersion parameter 50
2.4. Some examples 52
2.4.1. Logistic regression model 52
2.4.2. Poisson regression model 59
2.4.3. Gamma regression model 60
2.5. Generalized linear models in R: Poisson regression example 61
2.5.1. Confidence intervals and hypothesis tests 65
2.5.2. AIC and BIC, variable selection 68
2.5.3. Prediction, confidence intervals for a prediction 69
2.6. Technical appendix 71
2.6.1. Some probability distributions 71
2.6.2. Cochran's theorem 72
2.7. Exercises 72
Chapter 3. Censored and Missing Data in GLMs 83
3.1. Censored data 83
3.1.1. Introduction 83
3.1.2. Poisson regression with a right-censored response 85
3.1.3. Gamma regression with a right-censored response 93
3.2. Missing data problems 101
3.2.1. Introduction 101
3.2.2. Missing data typology 102
3.2.3. Methods for treating missing data 103
3.2.4. A missing data problem in the Poisson model 113
3.2.5. A missing data problem in the gamma regression model 127
3.3. Technical appendix 135
3.3.1. Two lemmas 135
3.3.2. Proof of theorem 3.3 140
3.3.3. Proof of theorem 3.4 143
3.3.4. Proof of theorem 3.5 145
3.3.5. Proof of theorem 3.6 149
3.3.6. Elements of empirical processes 150
3.4. Exercises 154
Chapter 4. Zero-Inflated Models 159
4.1. Introduction 159
4.1.1. Overdispersion 159
4.1.2. Excess of zeros 163
4.2. Zero-inflated Poisson models and extensions 166
4.2.1. The zero-inflated Poisson model 166
4.2.2. Semi-parametric ZIP models 170
4.2.3. Zero-inflated generalized Poisson model 174
4.2.4. A zero-inflation test 177
4.3. Zero-inflated negative binomial model 183
4.3.1. Negative binomial model 183
4.3.2. ZINB model 184
4.3.3. The ZIP model versus the ZINB model 186
4.4. Zero-inflated binomial model 187
4.5. Censored and missing data: examples of problems 190
4.5.1. Censored ZIP model 190
4.5.2. Missing covariables in the ZIB model 192
4.5.3. Missing covariables in the ZIP model 195
4.6. Marginal zero-inflated models 200
4.6.1. Introduction 200
4.6.2. MZIP and MZINB models 203
4.7. Exercises 204
References 209
Index 217
Exponential families play a central role in the construction of generalized linear models (Chapter 2). This first chapter is dedicated to them. We will limit ourselves to the results that are strictly necessary to understand the rest of the book. The interested reader will find a more detailed exposition in Sundberg (2019).
Recall that a statistical model is a pair (𝒴, 𝒫), where 𝒴 is a set (called the space of observations) and 𝒫 is a family of probability distributions on 𝒴. A statistical model is said to be parametric if the family 𝒫 can be described by a parameter θ that lives in a finite-dimensional vector space, typically ℝ^p where p ∈ ℕ* (or in a subset Θ of this vector space). Otherwise, the model is said to be non-parametric. A parametric model will be denoted in the following way:

(𝒴, {P_θ : θ ∈ Θ})

or, more simply, if we omit 𝒴:

{P_θ : θ ∈ Θ}.
The space Θ is called the parameter space. For example, the family of normal distributions with mean µ and variance σ² constitutes a parametric model for a real-valued observation. The family of Poisson distributions with parameter µ is a parametric model for an integer-valued observation (called a count or enumeration).
A parametric statistical model {P_{θ,φ} : θ ∈ Θ, φ > 0} (where Θ ⊆ ℝ) is said to be an exponential model if the distribution P_{θ,φ} admits a density (with respect to a suitable dominating measure µ: the Lebesgue measure on ℝ or on a subinterval of ℝ, or the counting measure on a countable set) of the form:

f(y; θ, φ) = exp( (yθ - a(θ))/φ + b(y, φ) ),   [1.1]

where a(·) and b(·, ·) are functions which determine which particular model is being considered (Poisson model, Gaussian model, etc.), as we will see in section 1.3.
The parameter θ is called the canonical parameter of the model (it is also sometimes called the natural parameter, but this last term is not entirely appropriate, since the canonical parameter does not necessarily correspond to the most natural parameterization of the model; see section 1.3).

The parameter φ, called the dispersion parameter, is often considered to be a nuisance parameter, θ being the parameter of interest in the model. The family of densities [1.1] is called an exponential family.
REMARK.- In the literature, one encounters definitions of the exponential model that use slightly different forms of the density [1.1]. For example, in the denominator of [1.1], φ is sometimes replaced by a function c(φ), which is itself often expressed in the form c(φ) = φ/ω, where ω is a known weight. In order to keep the notation simple, we adopt the parameterization [1.1] in this book, since it encompasses the most common examples of exponential families (an example in which a weight ω occurs is described in Chapter 2).
If φ is known (e.g. when φ = 1 in the binomial and Poisson distributions), we can set:

C(θ) = exp(-a(θ)/φ)  and  h(y) = exp(b(y, φ)),

so that [1.1] becomes:

f(y; θ) = C(θ) h(y) exp(yθ/φ),

which is an expression for the density that is commonly used to define the exponential model. The quantity C(θ) here plays the role of a normalization constant, making the function f(y; θ) into a probability density (we have ∫ f(y; θ) dµ(y) = 1).
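As a quick numerical sanity check of this reduced form, the following minimal R sketch (an illustration only; the choices a(θ) = e^θ and h(y) = 1/y! anticipate the Poisson case of Example 1.2, and all object names are ours) verifies that C(θ) = exp(-a(θ)) indeed normalizes the density when φ = 1:

# Poisson density written in the reduced form f(y; theta) = C(theta) h(y) exp(y * theta)
theta <- log(2.5)                 # canonical parameter: theta = ln(mu), with mu = 2.5
a     <- function(t) exp(t)       # Poisson cumulant function a(theta) = e^theta
C     <- exp(-a(theta))           # normalization constant C(theta) = exp(-a(theta)), phi = 1
y     <- 0:100                    # truncated support (the neglected tail is negligible here)
h     <- 1 / factorial(y)         # h(y) = exp(b(y, 1)) = 1/y!
sum(C * h * exp(y * theta))       # numerically equal to 1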
Let Y be a random variable with density [1.1]. Additionally, set ȧ(θ) = da(θ)/dθ and ä(θ) = d²a(θ)/dθ² (we will assume that a is infinitely differentiable; see Sundberg (2019)). We will now show the following result:

PROPOSITION 1.1.- In the model [1.1], we have:

E(Y) = ȧ(θ)  and  var(Y) = φ ä(θ).
REMARK.- Since the function ȧ is continuous and strictly increasing on Θ (since ä(θ) = var(Y)/φ > 0), it admits a continuous inverse, denoted ȧ⁻¹.
PROOF OF PROPOSITION 1.1.- Let us fix (θ, φ), and assume that we can interchange the integral and differential operators. To simplify the notation below, we will write ∫ in the place of ∫_𝒴 and dy in the place of dµ(y).

Using this notation:

∂/∂θ ∫ f(y; θ, φ) dy = ∫ ∂f(y; θ, φ)/∂θ dy = 0

and:

∂²/∂θ² ∫ f(y; θ, φ) dy = ∫ ∂²f(y; θ, φ)/∂θ² dy = 0,

since ∫ f(y; θ, φ) dy = 1. Now, we differentiate [1.1] twice with respect to θ to obtain successively:

∂f(y; θ, φ)/∂θ = ((y - ȧ(θ))/φ) f(y; θ, φ)

and:

∂²f(y; θ, φ)/∂θ² = ( ((y - ȧ(θ))/φ)² - ä(θ)/φ ) f(y; θ, φ).

Now, integrate these two expressions with respect to y. We obtain:

(E(Y) - ȧ(θ))/φ = 0  and  E[(Y - ȧ(θ))²]/φ² - ä(θ)/φ = 0.

From the first equation, we immediately deduce that E(Y) = ȧ(θ) and, by observing that E[(Y - ȧ(θ))²] = E[(Y - E(Y))²] = var(Y), we easily deduce from the second equation that var(Y) = φ ä(θ).
In the following, we will set µ = E(Y) = ȧ(θ), so that with this notation:

var(Y) = φ v(µ),

where v(µ) = ä(ȧ⁻¹(µ)). The function v(µ) is called the variance function. In the model [1.1], it describes the way in which the variance of Y varies as a function of its mean (this is called the mean-variance relation).
In the Gaussian model, we will see in section 1.3 that v(µ) = 1. Therefore, v(µ) does not depend on µ: the variance and the mean vary independently from one another. For the Poisson distribution, v(µ) = µ: the variance varies like the expected value. For the gamma distribution, v(µ) = µ2: the variance varies like the square of the expected value (the standard deviation varies like the expected value).
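These three mean-variance relations are easy to visualize by simulation. The short R sketch below is only an illustration (the sample size, dispersion values and seed are arbitrary choices of ours): for several values of µ it computes empirical variances, which are approximately constant in the Gaussian case, approximately equal to µ in the Poisson case, and approximately proportional to µ² in the gamma case.

set.seed(1)
mus <- c(1, 2, 5, 10)   # a few values of the mean
n   <- 1e5
# Gaussian: var(Y) = sigma^2, independent of the mean (v(mu) = 1)
sapply(mus, function(m) var(rnorm(n, mean = m, sd = 2)))
# Poisson: var(Y) = mu (v(mu) = mu)
sapply(mus, function(m) var(rpois(n, lambda = m)))
# Gamma with shape 2: var(Y) = mu^2 / 2 (v(mu) = mu^2, dispersion 1/2)
sapply(mus, function(m) var(rgamma(n, shape = 2, scale = m / 2)))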
Exponential families include many of the classical probability distributions. We describe some examples below, though many more exist (Sundberg 2019).
EXAMPLE 1.1 (BINOMIAL DISTRIBUTION).- Let k ∈ ℕ* be fixed. The family of binomial distributions with parameter p is an exponential family. Indeed, with respect to the counting measure on {0, 1, ..., k}, the B(k, p) distribution has the density:

f(y; p) = (k!/(y!(k - y)!)) p^y (1 - p)^(k-y) = exp( y ln(p/(1 - p)) + k ln(1 - p) + ln(k!/(y!(k - y)!)) ),

which can be identified with [1.1] if we set θ = ln(p/(1 - p)), φ = 1, a(θ) = k ln(1 + e^θ) and b(y, φ) = ln(k!/(y!(k - y)!)).

We see that:

E(Y) = ȧ(θ) = k e^θ/(1 + e^θ) = kp  and  var(Y) = φ ä(θ) = k e^θ/(1 + e^θ)² = kp(1 - p).

Denoting µ = E(Y) = kp, the variance function is equal to: v(µ) = µ(k - µ)/k.
REMARK.- The particular case when k = 1 corresponds to the Bernoulli distribution. We will denote it by B(p).
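To check Example 1.1 numerically, one can differentiate the cumulant function a(θ) = k ln(1 + e^θ) and compare the results with kp and kp(1 - p). The R sketch below is a simple illustration (the values of k and p and the finite-difference step are arbitrary choices of ours):

k <- 10; p <- 0.3
theta <- log(p / (1 - p))                      # canonical parameter theta = ln(p/(1-p))
a     <- function(t) k * log(1 + exp(t))       # binomial cumulant function
eps   <- 1e-4
a1 <- (a(theta + eps) - a(theta - eps)) / (2 * eps)              # numerical a-dot(theta)
a2 <- (a(theta + eps) - 2 * a(theta) + a(theta - eps)) / eps^2   # numerical a-double-dot(theta)
c(a1, k * p)             # both close to E(Y) = kp = 3
c(a2, k * p * (1 - p))   # both close to var(Y) = kp(1 - p) = 2.1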
EXAMPLE 1.2 (POISSON DISTRIBUTION).- The family of Poisson distributions with parameter µ is an exponential family. The density on ℕ of the P(µ) distribution is written as:

f(y; µ) = e^(-µ) µ^y / y! = exp( y ln µ - µ - ln y! ).

We identify it with [1.1] by setting θ = ln µ, φ = 1, a(θ) = µ = e^θ and b(y, φ) = -ln y!. We see that:

E(Y) = ȧ(θ) = e^θ = µ  and  var(Y) = φ ä(θ) = e^θ = µ.

We recover the equality of the mean and the variance of the Poisson distribution (a property called equidispersion). Finally, the variance function is simply v(µ) = µ.
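Equidispersion is immediately visible by simulation; the following two lines of R (an illustration, with an arbitrary value of µ) return an empirical mean and an empirical variance that are both close to µ:

set.seed(2)
y <- rpois(1e5, lambda = 4)   # Poisson sample with mu = 4
c(mean(y), var(y))            # both approximately 4: equidispersion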
EXAMPLE 1.3 (NEGATIVE BINOMIAL DISTRIBUTION).- Let us consider a sequence of independent trials in which a "success" occurs with constant probability p (so that a "failure" occurs with the complementary probability 1 - p). We repeat the trials until a given number k of successes (with k ∈ {1, 2, ...}) has been obtained. The negative binomial distribution is the probability distribution of the random variable Y that counts the number of failures that occur before the k-th success is obtained.
Its probability density (with respect to the counting measure on ℕ) is written as:

f(y; k, p) = ((y + k - 1)!/(y!(k - 1)!)) p^k (1 - p)^y,  for y ∈ ℕ.
REMARK.- The particular case when k = 1 corresponds to the geometric distribution.
For each integer n, recall that Γ(n + 1) = n!. Set α = 1/k and µ = E(Y) = k(1 - p)/p. We can then rewrite f as follows:

f(y; µ, α) = (Γ(y + 1/α)/(Γ(1/α) y!)) (1/(1 + αµ))^(1/α) (αµ/(1 + αµ))^y,

and then again, after some simple calculations:

f(y; µ, α) = exp( y ln(αµ/(1 + αµ)) - (1/α) ln(1 + αµ) + ln(Γ(y + 1/α)/(Γ(1/α) y!)) ).

We identify f with [1.1] by setting θ = ln(αµ/(1 + αµ)), φ = 1, a(θ) = (1/α) ln(1 + αµ) = -(1/α) ln(1 - e^θ) and b(y, φ) = ln(Γ(y + 1/α)/(Γ(1/α) y!)). The negative binomial distributions therefore form an exponential family. We see that:

E(Y) = ȧ(θ) = (1/α) e^θ/(1 - e^θ) = µ  and  var(Y) = ä(θ) = (1/α) e^θ/(1 - e^θ)² = µ(1 + αµ).

The variance function is equal to:

v(µ) = µ(1 + αµ) = µ + αµ².
The mean-variance relation is thus quadratic, hence the name "NB2 distribution" sometimes given to this distribution (the "2" refers to the power of µ in the variance function) (Cameron and Trivedi 1998; Hilbe 2011).
There exists another parameterization of the negative binomial distribution, in which the variance is a linear function of the mean. The corresponding distribution is hence called the "NB1 distribution".
The NB2 distribution can be obtained as a Poisson-gamma mixture (we show this in the technical appendix to this chapter; see section 1.5). This interpretation, along with the fact that var(Y) = µ(1 + αµ) > µ = E(Y), makes the negative binomial distribution a very useful model for overdispersed count data.
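The Poisson-gamma construction can be reproduced by simulation. In the R sketch below (an illustration only; the values of µ and α are arbitrary, and λ is drawn from a gamma distribution with shape 1/α and scale αµ so that E(λ) = µ), the mixed counts have mean close to µ and variance close to µ(1 + αµ), and behave like direct draws from rnbinom:

set.seed(3)
n <- 1e5; mu <- 3; alpha <- 0.5
lambda <- rgamma(n, shape = 1 / alpha, scale = alpha * mu)   # E(lambda) = mu, var(lambda) = alpha * mu^2
y      <- rpois(n, lambda)                                   # Poisson-gamma mixture
c(mean(y), mu)                        # empirical mean close to mu = 3
c(var(y), mu * (1 + alpha * mu))      # empirical variance close to mu(1 + alpha*mu) = 7.5
z <- rnbinom(n, size = 1 / alpha, mu = mu)   # direct NB2 simulation for comparison
c(mean(z), var(z))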
EXAMPLE 1.4 (NORMAL DISTRIBUTION).- The family of normal distributions with mean µ and variance σ² forms an exponential family. The N(µ, σ²) distribution has the following density on ℝ:

f(y; µ, σ²) = (1/√(2πσ²)) exp( -(y - µ)²/(2σ²) ) = exp( (yµ - µ²/2)/σ² - y²/(2σ²) - (1/2) ln(2πσ²) ),

which can be identified with [1.1] if we set θ = µ, φ = σ², a(θ) = θ²/2 and b(y, φ) = -y²/(2φ) - (1/2) ln(2πφ). We see that:

E(Y) = ȧ(θ) = θ = µ  and  var(Y) = φ ä(θ) = σ².
The variance function is equal to v(µ) = 1.
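A quick way to check the rewriting of the normal density given in Example 1.4 is to compare it with dnorm. The R lines below are only a numerical verification (the values of µ, σ² and the grid of y are arbitrary choices of ours):

mu <- 1.2; sigma2 <- 2
y  <- seq(-3, 5, by = 0.5)
f  <- exp((y * mu - mu^2 / 2) / sigma2 - y^2 / (2 * sigma2) - 0.5 * log(2 * pi * sigma2))
max(abs(f - dnorm(y, mean = mu, sd = sqrt(sigma2))))   # numerically zero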
EXAMPLE 1.5 (GAMMA DISTRIBUTION).- The family of gamma distributions form...