Since they were first formulated in 1972, generalized linear models have enjoyed a veritable boom, with numerous applications in insurance, economics and biostatistics. Today, they are still the subject of a great deal of research.
This book provides an overview of the theory of generalized linear models. Particular attention is paid to the problems of censoring, missing data and excess zeros. Didactic and accessible, Generalized Linear Models is illustrated with exercises and numerous examples of R code.
With all the necessary prerequisites introduced in a step-by-step fashion, this book is aimed at students (at master's or engineering school level), as well as teachers and practitioners of mathematics and statistical modeling.
Jean-François Dupuy is Professor of Applied Mathematics at the University of Rennes and is a member of the Institut de recherche mathématique de Rennes, France. His research focuses on statistical modeling, generalized linear models and duration models.
Preface ix
Notation and Acronyms xi
Chapter 1. Exponential Families 1
1.1. Definition 1
1.2. Mean, variance, and variance function 3
1.3. Examples of exponential families 4
1.4. Maximum likelihood estimation 9
1.5. Technical appendix 18
1.5.1. Some useful results from probability 18
1.5.2. Negative binomial distribution and Poisson-gamma mixtures 19
1.6. Exercises 20
Chapter 2. From Linear Models to GLMs 25
2.1. Reminders on the linear model 27
2.1.1. Matrix form of the linear model 28
2.1.2. Some examples of linear models 29
2.1.3. Least-squares approximation 31
2.1.4. Asymptotics of the LSE 34
2.1.5. Linear Gaussian model 37
2.2. Three components of a generalized linear model 43
2.2.1. The random component 43
2.2.2. The linear predictor 44
2.2.3. The link function 45
2.3. Estimation in generalized linear models 46
2.3.1. Maximum likelihood 46
2.3.2. Asymptotic properties and inference 48
2.3.3. Estimating the dispersion parameter 50
2.4. Some examples 52
2.4.1. Logistic regression model 52
2.4.2. Poisson regression model 59
2.4.3. Gamma regression model 60
2.5. Generalized linear models in R: Poisson regression example 61
2.5.1. Confidence intervals and hypothesis tests 65
2.5.2. AIC and BIC, variable selection 68
2.5.3. Prediction, confidence intervals for a prediction 69
2.6. Technical appendix 71
2.6.1. Some probability distributions 71
2.6.2. Cochran's theorem 72
2.7. Exercises 72
Chapter 3. Censored and Missing Data in GLMs 83
3.1. Censored data 83
3.1.1. Introduction 83
3.1.2. Poisson regression with a right-censored response 85
3.1.3. Gamma regression with a right-censored response 93
3.2. Missing data problems 101
3.2.1. Introduction 101
3.2.2. Missing data typology 102
3.2.3. Methods for treating missing data 103
3.2.4. A missing data problem in the Poisson model 113
3.2.5. A missing data problem in the gamma regression model 127
3.3. Technical appendix 135
3.3.1. Two lemmas 135
3.3.2. Proof of theorem 3.3 140
3.3.3. Proof of theorem 3.4 143
3.3.4. Proof of theorem 3.5 145
3.3.5. Proof of theorem 3.6 149
3.3.6. Elements of empirical processes 150
3.4. Exercises 154
Chapter 4. Zero-Inflated Models 159
4.1. Introduction 159
4.1.1. Overdispersion 159
4.1.2. Excess of zeros 163
4.2. Zero-inflated Poisson models and extensions 166
4.2.1. The zero-inflated Poisson model 166
4.2.2. Semi-parametric ZIP models 170
4.2.3. Zero-inflated generalized Poisson model 174
4.2.4. A zero-inflation test 177
4.3. Zero-inflated negative binomial model 183
4.3.1. Negative binomial model 183
4.3.2. ZINB model 184
4.3.3. The ZIP model versus the ZINB model 186
4.4. Zero-inflated binomial model 187
4.5. Censored and missing data: examples of problems 190
4.5.1. Censored ZIP model 190
4.5.2. Missing covariables in the ZIB model 192
4.5.3. Missing covariables in the ZIP model 195
4.6. Marginal zero-inflated models 200
4.6.1. Introduction 200
4.6.2. MZIP and MZINB models 203
4.7. Exercises 204
References 209
Index 217
Exponential families play a central role in the construction of generalized linear models (Chapter 2). This first chapter is dedicated to them. We will limit ourselves to the results that are strictly necessary to understand the rest of the book. The interested reader will find a more detailed exposition in Sundberg (2019).
Recall that a statistical model is a pair (𝒴, 𝒫), where 𝒴 is a set (called the space of observations) and 𝒫 is a family of probability distributions on 𝒴. A statistical model is said to be parametric if the family 𝒫 can be described by a parameter θ that lives in a finite-dimensional vector space, typically ℝ^p where p ∈ ℕ* (or in a subset Θ of this vector space). Otherwise, the model is said to be non-parametric. A parametric model will be denoted in the following way:

(𝒴, {P_θ : θ ∈ Θ})

or, more simply, if we omit 𝒴:

{P_θ : θ ∈ Θ}.
The space Θ is called the parameter space. For example, the family of normal distributions with mean µ and variance σ² constitutes a parametric model for a real-valued observation. The family of Poisson distributions with parameter µ is a parametric model for an integer-valued observation (called a count or enumeration).
A parametric statistical model {P_{θ,φ} : θ ∈ Θ, φ > 0} (where Θ ⊆ ℝ) is said to be an exponential model if the distribution P_{θ,φ} admits a density (with respect to a suitable dominating measure µ: the Lebesgue measure on ℝ or on a subinterval of ℝ, or the counting measure on a countable set) of the form:

f(y; θ, φ) = exp( (yθ - a(θ))/φ + b(y, φ) ),   [1.1]

where a(·) and b(·, ·) are functions which determine which particular model is being considered (Poisson model, Gaussian model, etc.), as we will see in section 1.3.
The parameter θ is called the canonical parameter of the model (it is also sometimes called the natural parameter, but this last term is not entirely appropriate, since the canonical parameter does not necessarily correspond to the most natural parameterization of the model; see section 1.3).

The parameter φ, called the dispersion parameter, is often considered to be a nuisance parameter, θ being the parameter of interest in the model. The family of densities [1.1] is called an exponential family.
REMARK.- In the literature, one encounters definitions of the exponential model that use slightly different forms of the density [1.1]. For example, in the denominator of [1.1], φ is sometimes replaced by a function c(φ), which is itself often expressed in the form c(φ) = φ/ω, where ω is a known weight. In order to keep the notation simple, we adopt the parameterization [1.1] in this book, since it encompasses the most common examples of exponential families (an example in which a weight ω occurs is described in Chapter 2).
If φ is known (e.g. when φ = 1 in the binomial and Poisson distributions), we can set:

C(θ) = exp(-a(θ)/φ)  and  h(y) = exp(b(y, φ)),

so that [1.1] becomes:

f(y; θ) = C(θ) h(y) exp(yθ/φ),

which is an expression for the density that is commonly used to define the exponential model. The quantity C(θ) here plays the role of a normalization constant, making the function f(y; θ) into a probability density (we have ∫ f(y; θ) dµ(y) = 1).
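As a quick numerical sanity check of this reduced form, the following minimal R sketch (an illustration only; the choices a(θ) = e^θ and h(y) = 1/y! anticipate the Poisson case of Example 1.2, and all object names are ours) verifies that C(θ) = exp(-a(θ)) indeed normalizes the density when φ = 1:

# Poisson density written in the reduced form f(y; theta) = C(theta) h(y) exp(y * theta)
theta <- log(2.5)                 # canonical parameter: theta = ln(mu), with mu = 2.5
a     <- function(t) exp(t)       # Poisson cumulant function a(theta) = e^theta
C     <- exp(-a(theta))           # normalization constant C(theta) = exp(-a(theta)), phi = 1
y     <- 0:100                    # truncated support (the neglected tail is negligible here)
h     <- 1 / factorial(y)         # h(y) = exp(b(y, 1)) = 1/y!
sum(C * h * exp(y * theta))       # numerically equal to 1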
Let Y be a random variable with density [1.1]. Additionally, set ȧ(θ) = da(θ)/dθ and ä(θ) = d²a(θ)/dθ² (we will assume that a is infinitely differentiable; see Sundberg (2019)). We will now show the following result:

PROPOSITION 1.1.- In the model [1.1], we have:

E(Y) = ȧ(θ)  and  var(Y) = φ ä(θ).
REMARK.- Since the function ȧ is continuous and strictly increasing on Θ (since ä(θ) = var(Y)/φ > 0), it admits a continuous inverse, denoted ȧ⁻¹.
PROOF OF PROPOSITION 1.1.- Let us fix (θ, φ), and assume that we can interchange the integral and differential operators. To simplify the notation below, we will write ∫ in the place of ∫_𝒴 and dy in the place of dµ(y).

Using this notation:

∂/∂θ ∫ f(y; θ, φ) dy = ∫ ∂f(y; θ, φ)/∂θ dy = 0

and:

∂²/∂θ² ∫ f(y; θ, φ) dy = ∫ ∂²f(y; θ, φ)/∂θ² dy = 0,

since ∫ f(y; θ, φ) dy = 1. Now, we differentiate [1.1] twice with respect to θ to obtain successively:

∂f(y; θ, φ)/∂θ = ((y - ȧ(θ))/φ) f(y; θ, φ)

and:

∂²f(y; θ, φ)/∂θ² = ( ((y - ȧ(θ))/φ)² - ä(θ)/φ ) f(y; θ, φ).

Now, integrate these two expressions with respect to y. We obtain:

(E(Y) - ȧ(θ))/φ = 0  and  E[(Y - ȧ(θ))²]/φ² - ä(θ)/φ = 0.

From the first equation, we immediately deduce that E(Y) = ȧ(θ) and, by observing that E[(Y - ȧ(θ))²] = E[(Y - E(Y))²] = var(Y), we easily deduce from the second equation that var(Y) = φ ä(θ).
In the following, we will set µ = E(Y) = ȧ(θ), so that with this notation:

var(Y) = φ v(µ),

where v(µ) = ä(ȧ⁻¹(µ)). The function v(µ) is called the variance function. In the model [1.1], it describes the way in which the variance of Y varies as a function of its mean (this is called the mean-variance relation).
In the Gaussian model, we will see in section 1.3 that v(µ) = 1. Therefore, v(µ) does not depend on µ: the variance and the mean vary independently from one another. For the Poisson distribution, v(µ) = µ: the variance varies like the expected value. For the gamma distribution, v(µ) = µ2: the variance varies like the square of the expected value (the standard deviation varies like the expected value).
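These three mean-variance relations are easy to visualize by simulation. The short R sketch below is only an illustration (the sample size, dispersion values and seed are arbitrary choices of ours): for several values of µ it computes empirical variances, which are approximately constant in the Gaussian case, approximately equal to µ in the Poisson case, and approximately proportional to µ² in the gamma case.

set.seed(1)
mus <- c(1, 2, 5, 10)   # a few values of the mean
n   <- 1e5
# Gaussian: var(Y) = sigma^2, independent of the mean (v(mu) = 1)
sapply(mus, function(m) var(rnorm(n, mean = m, sd = 2)))
# Poisson: var(Y) = mu (v(mu) = mu)
sapply(mus, function(m) var(rpois(n, lambda = m)))
# Gamma with shape 2: var(Y) = mu^2 / 2 (v(mu) = mu^2, dispersion 1/2)
sapply(mus, function(m) var(rgamma(n, shape = 2, scale = m / 2)))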
Exponential families include many of the classical probability distributions. We describe some examples below, though many more exist (Sundberg 2019).
EXAMPLE 1.1 (BINOMIAL DISTRIBUTION).- Let k ∈ ℕ* be fixed. The family of binomial distributions with parameter p is an exponential family. Indeed, with respect to the counting measure on {0, 1, ..., k}, the B(k, p) distribution has the density:

f(y; p) = (k!/(y!(k - y)!)) p^y (1 - p)^(k-y) = exp( y ln(p/(1 - p)) + k ln(1 - p) + ln(k!/(y!(k - y)!)) ),

which can be identified with [1.1] if we set θ = ln(p/(1 - p)), φ = 1, a(θ) = k ln(1 + e^θ) and b(y, φ) = ln(k!/(y!(k - y)!)).

We see that:

E(Y) = ȧ(θ) = k e^θ/(1 + e^θ) = kp  and  var(Y) = φ ä(θ) = k e^θ/(1 + e^θ)² = kp(1 - p).

Denoting µ = E(Y) = kp, the variance function is equal to: v(µ) = µ(k - µ)/k.
REMARK.- The particular case when k = 1 corresponds to the Bernoulli distribution. We will denote it by B(p).
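To check Example 1.1 numerically, one can differentiate the cumulant function a(θ) = k ln(1 + e^θ) and compare the results with kp and kp(1 - p). The R sketch below is a simple illustration (the values of k and p and the finite-difference step are arbitrary choices of ours):

k <- 10; p <- 0.3
theta <- log(p / (1 - p))                      # canonical parameter theta = ln(p/(1-p))
a     <- function(t) k * log(1 + exp(t))       # binomial cumulant function
eps   <- 1e-4
a1 <- (a(theta + eps) - a(theta - eps)) / (2 * eps)              # numerical a-dot(theta)
a2 <- (a(theta + eps) - 2 * a(theta) + a(theta - eps)) / eps^2   # numerical a-double-dot(theta)
c(a1, k * p)             # both close to E(Y) = kp = 3
c(a2, k * p * (1 - p))   # both close to var(Y) = kp(1 - p) = 2.1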
EXAMPLE 1.2 (POISSON DISTRIBUTION).- The family of Poisson distributions with parameter µ is an exponential family. The density on ℕ of the P(µ) distribution is written as:

f(y; µ) = e^(-µ) µ^y / y! = exp( y ln µ - µ - ln y! ).

We identify it with [1.1] by setting θ = ln µ, φ = 1, a(θ) = µ = e^θ and b(y, φ) = -ln y!. We see that:

E(Y) = ȧ(θ) = e^θ = µ  and  var(Y) = φ ä(θ) = e^θ = µ.

We recover the equality of the mean and the variance of the Poisson distribution (a property called equidispersion). Finally, the variance function is simply v(µ) = µ.
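Equidispersion is immediately visible by simulation; the following two lines of R (an illustration, with an arbitrary value of µ) return an empirical mean and an empirical variance that are both close to µ:

set.seed(2)
y <- rpois(1e5, lambda = 4)   # Poisson sample with mu = 4
c(mean(y), var(y))            # both approximately 4: equidispersion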
EXAMPLE 1.3 (NEGATIVE BINOMIAL DISTRIBUTION).- Let us consider a sequence of independent trials in which a "success" occurs with constant probability p (so that a "failure" occurs with the complementary probability 1 - p). We repeat the trials until a given number k of successes (with k ∈ {1, 2, ...}) has been obtained. The negative binomial distribution is the probability distribution of the random variable Y that counts the number of failures that occur before the k-th success is obtained.
Its probability density (with respect to the counting measure on ℕ) is written as:

f(y; k, p) = ((y + k - 1)!/(y!(k - 1)!)) p^k (1 - p)^y,  for y ∈ ℕ.
REMARK.- The particular case when k = 1 corresponds to the geometric distribution.
For each integer n, recall that Γ(n + 1) = n!. Set α = 1/k and µ = E(Y) = k(1 - p)/p. We can then rewrite f as follows:

f(y; µ, α) = (Γ(y + 1/α)/(Γ(1/α) y!)) (1/(1 + αµ))^(1/α) (αµ/(1 + αµ))^y,

and then again, after some simple calculations:

f(y; µ, α) = exp( y ln(αµ/(1 + αµ)) - (1/α) ln(1 + αµ) + ln(Γ(y + 1/α)/(Γ(1/α) y!)) ).

We identify f with [1.1] by setting θ = ln(αµ/(1 + αµ)), φ = 1, a(θ) = (1/α) ln(1 + αµ) = -(1/α) ln(1 - e^θ) and b(y, φ) = ln(Γ(y + 1/α)/(Γ(1/α) y!)). The negative binomial distributions therefore form an exponential family. We see that:

E(Y) = ȧ(θ) = (1/α) e^θ/(1 - e^θ) = µ  and  var(Y) = ä(θ) = (1/α) e^θ/(1 - e^θ)² = µ(1 + αµ).

The variance function is equal to:

v(µ) = µ(1 + αµ) = µ + αµ².
The mean-variance relation is thus quadratic, hence the name "NB2 distribution" sometimes given to this distribution (the "2" refers to the power of µ in the variance function) (Cameron and Trivedi 1998; Hilbe 2011).
There exists another parameterization of the negative binomial distribution, in which the variance is a linear function of the mean. The corresponding distribution is hence called the "NB1 distribution".
The NB2 distribution can be obtained as a Poisson-gamma mixture (we show this in the technical appendix to this chapter; see section 1.5). This interpretation, along with the fact that var(Y) = µ(1 + αµ) > µ = E(Y), makes the negative binomial distribution a very useful model for overdispersed count data.
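The Poisson-gamma construction can be reproduced by simulation. In the R sketch below (an illustration only; the values of µ and α are arbitrary, and λ is drawn from a gamma distribution with shape 1/α and scale αµ so that E(λ) = µ), the mixed counts have mean close to µ and variance close to µ(1 + αµ), and behave like direct draws from rnbinom:

set.seed(3)
n <- 1e5; mu <- 3; alpha <- 0.5
lambda <- rgamma(n, shape = 1 / alpha, scale = alpha * mu)   # E(lambda) = mu, var(lambda) = alpha * mu^2
y      <- rpois(n, lambda)                                   # Poisson-gamma mixture
c(mean(y), mu)                        # empirical mean close to mu = 3
c(var(y), mu * (1 + alpha * mu))      # empirical variance close to mu(1 + alpha*mu) = 7.5
z <- rnbinom(n, size = 1 / alpha, mu = mu)   # direct NB2 simulation for comparison
c(mean(z), var(z))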
EXAMPLE 1.4 (NORMAL DISTRIBUTION).- The family of normal distributions with mean µ and variance σ² forms an exponential family. The N(µ, σ²) distribution has the following density on ℝ:

f(y; µ, σ²) = (1/√(2πσ²)) exp( -(y - µ)²/(2σ²) ) = exp( (yµ - µ²/2)/σ² - y²/(2σ²) - (1/2) ln(2πσ²) ),

which can be identified with [1.1] if we set θ = µ, φ = σ², a(θ) = θ²/2 and b(y, φ) = -y²/(2φ) - (1/2) ln(2πφ). We see that:

E(Y) = ȧ(θ) = θ = µ  and  var(Y) = φ ä(θ) = σ².
The variance function is equal to v(µ) = 1.
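A quick way to check the rewriting of the normal density given in Example 1.4 is to compare it with dnorm. The R lines below are only a numerical verification (the values of µ, σ² and the grid of y are arbitrary choices of ours):

mu <- 1.2; sigma2 <- 2
y  <- seq(-3, 5, by = 0.5)
f  <- exp((y * mu - mu^2 / 2) / sigma2 - y^2 / (2 * sigma2) - 0.5 * log(2 * pi * sigma2))
max(abs(f - dnorm(y, mean = mu, sd = sqrt(sigma2))))   # numerically zero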
EXAMPLE 1.5 (GAMMA DISTRIBUTION).- The family of gamma distributions form...