Computational Statistics

Geof H. Givens, Jennifer A. Hoeting (Authors)

Wiley (Publisher)

2nd Edition

Published on 9 October 2012

496 pages

E-Book

ePUB with Adobe-DRM

978-1-118-55548-4 (ISBN)

€123.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

PREFACE
ACKNOWLEDGMENTS

1 REVIEW
  1.1 Mathematical Notation
  1.2 Taylor's Theorem and Mathematical Limit Theory
  1.3 Statistical Notation and Probability Distributions
  1.4 Likelihood Inference
  1.5 Bayesian Inference
  1.6 Statistical Limit Theory
  1.7 Markov Chains
  1.8 Computing

PART I OPTIMIZATION

2 OPTIMIZATION AND SOLVING NONLINEAR EQUATIONS
  2.1 Univariate Problems
  2.2 Multivariate Problems
  Problems

3 COMBINATORIAL OPTIMIZATION
  3.1 Hard Problems and NP-Completeness
  3.2 Local Search
  3.3 Simulated Annealing
  3.4 Genetic Algorithms
  3.5 Tabu Algorithms
  Problems

4 EM OPTIMIZATION METHODS
  4.1 Missing Data, Marginalization, and Notation
  4.2 The EM Algorithm
  4.3 EM Variants
  Problems

PART II INTEGRATION AND SIMULATION

5 NUMERICAL INTEGRATION
  5.1 Newton-Côtes Quadrature
  5.2 Romberg Integration
  5.3 Gaussian Quadrature
  5.4 Frequently Encountered Problems
  Problems

6 SIMULATION AND MONTE CARLO INTEGRATION
  6.1 Introduction to the Monte Carlo Method
  6.2 Exact Simulation
  6.3 Approximate Simulation
  6.4 Variance Reduction Techniques
  Problems

7 MARKOV CHAIN MONTE CARLO
  7.1 Metropolis-Hastings Algorithm
  7.2 Gibbs Sampling
  7.3 Implementation
  Problems

8 ADVANCED TOPICS IN MCMC
  8.1 Adaptive MCMC
  8.2 Reversible Jump MCMC
  8.3 Auxiliary Variable Methods
  8.4 Other Metropolis-Hastings Algorithms
  8.5 Perfect Sampling
  8.6 Markov Chain Maximum Likelihood
  8.7 Example: MCMC for Markov Random Fields
  Problems

PART III BOOTSTRAPPING

9 BOOTSTRAPPING
  9.1 The Bootstrap Principle
  9.2 Basic Methods
  9.3 Bootstrap Inference
  9.4 Reducing Monte Carlo Error
  9.5 Bootstrapping Dependent Data
  9.6 Bootstrap Performance
  9.7 Other Uses of the Bootstrap
  9.8 Permutation Tests
  Problems

PART IV DENSITY ESTIMATION AND SMOOTHING

10 NONPARAMETRIC DENSITY ESTIMATION
  10.1 Measures of Performance
  10.2 Kernel Density Estimation
  10.3 Nonkernel Methods
  10.4 Multivariate Methods
  Problems

11 BIVARIATE SMOOTHING
  11.1 Predictor-Response Data
  11.2 Linear Smoothers
  11.3 Comparison of Linear Smoothers
  11.4 Nonlinear Smoothers
  11.5 Confidence Bands
  11.6 General Bivariate Data
  Problems

12 MULTIVARIATE SMOOTHING
  12.1 Predictor-Response Data
  12.2 General Multivariate Data
  Problems

DATA ACKNOWLEDGMENTS
REFERENCES
INDEX

Chapter 1

Review

This chapter reviews notation and background material in mathematics, probability, and statistics. Readers may wish to skip this chapter and turn directly to Chapter 2, returning here only as needed.

1.1 Mathematical Notation

We use boldface to distinguish a vector $\mathbf{x} = (x_1, \ldots, x_p)$ or a matrix $\mathbf{M}$ from a scalar variable $x$ or a constant $M$. A vector-valued function $\mathbf{f}$ evaluated at $\mathbf{x}$ is also boldfaced, as in $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_p(\mathbf{x}))$. The transpose of $\mathbf{M}$ is denoted $\mathbf{M}^T$.

Unless otherwise specified, all vectors are considered to be column vectors, so, for example, an $n \times p$ matrix can be written as $\mathbf{M} = (\mathbf{x}_1 \ldots \mathbf{x}_n)^T$. Let $\mathbf{I}$ denote an identity matrix, and $\mathbf{1}$ and $\mathbf{0}$ denote vectors of ones and zeros, respectively.

A symmetric square matrix $\mathbf{M}$ is positive definite if $\mathbf{x}^T \mathbf{M} \mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}$. Positive definiteness is equivalent to the condition that all eigenvalues of $\mathbf{M}$ are positive. $\mathbf{M}$ is nonnegative definite or positive semidefinite if $\mathbf{x}^T \mathbf{M} \mathbf{x} \geq 0$ for all nonzero vectors $\mathbf{x}$.
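As a rough illustration, positive definiteness of a symmetric matrix can be checked numerically. The sketch below does not compute eigenvalues; it swaps in an equivalent test, namely that a symmetric matrix is positive definite exactly when its Cholesky factorization can be completed with strictly positive pivots. The function name is_positive_definite and the tol parameter are hypothetical, and the input is assumed to be symmetric and given as a list of rows.

```python
import math


def is_positive_definite(m, tol=0.0):
    """Return True if the symmetric matrix m (list of rows) is positive definite.

    Assumption: m is symmetric. The test attempts a Cholesky factorization
    m = L L^T and fails as soon as a pivot is not strictly greater than tol.
    """
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Subtract the contribution of the columns already factored.
            s = m[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # zero or negative pivot: not positive definite
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


if __name__ == "__main__":
    print(is_positive_definite([[2.0, 1.0], [1.0, 2.0]]))  # eigenvalues 1 and 3
    print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))  # eigenvalues 3 and -1
```

A matrix that is only positive semidefinite, such as one with all entries equal to 1, is rejected by this check because one pivot is exactly zero.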

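The derivative of a function $f$, evaluated at $x$, is denoted $f'(x)$. When $\mathbf{x} = (x_1, \ldots, x_p)$, the gradient of $f$ at $\mathbf{x}$ is

$$ \mathbf{f}'(\mathbf{x}) = \left( \frac{df(\mathbf{x})}{dx_1}, \ldots, \frac{df(\mathbf{x})}{dx_p} \right)^T . $$

The Hessian matrix for $f$ at $\mathbf{x}$ is $\mathbf{f}''(\mathbf{x})$, having $(i, j)$th element equal to $d^2 f(\mathbf{x})/(dx_i\, dx_j)$. The negative Hessian has important uses in statistical inference.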

Let $\mathbf{J}(\mathbf{x})$ denote the Jacobian matrix evaluated at $\mathbf{x}$ for the one-to-one mapping $\mathbf{y} = \mathbf{f}(\mathbf{x})$. The $(i, j)$th element of $\mathbf{J}(\mathbf{x})$ is equal to $df_i(\mathbf{x})/dx_j$.

A functional is a real-valued function on a space of functions. For example, if $T(f) = \int_0^1 f(x)\,dx$, then the functional $T$ maps suitably integrable functions onto the real line.

The indicator function $1_{\{A\}}$ equals 1 if $A$ is true and 0 otherwise. The real line is denoted $\mathbb{R}$, and $p$-dimensional real space is $\mathbb{R}^p$.

1.2 Taylor's Theorem and Mathematical Limit Theory

First, we define standard "big oh" and "little oh" notation for describing the relative orders of convergence of functions. Let the functions $f$ and $g$ be defined on a common, possibly infinite interval. Let $z_0$ be a point in this interval or a boundary point of it (i.e., $-\infty$ or $\infty$). We require $g(z) \neq 0$ for all $z \neq z_0$ in a neighborhood of $z_0$. Then we say

$$ f(z) = \mathcal{O}(g(z)) \tag{1.1} $$

if there exists a constant $M$ such that $|f(z)| \leq M|g(z)|$ as $z \to z_0$. For example, $(n + 1)/(3n^2) = \mathcal{O}(n^{-1})$, and it is understood that we are considering $n \to \infty$. If $\lim_{z \to z_0} f(z)/g(z) = 0$, then we say

$$ f(z) = o(g(z)). \tag{1.2} $$

For example, $f(x_0 + h) - f(x_0) = h f'(x_0) + o(h)$ as $h \to 0$ if $f$ is differentiable at $x_0$. The same notation can be used for describing the convergence of a sequence $\{x_n\}$ as $n \to \infty$, by letting $f(n) = x_n$.

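Taylor's theorem provides a polynomial approximation to a function $f$. Suppose $f$ has finite $(n + 1)$th derivative on $(a, b)$ and continuous $n$th derivative on $[a, b]$. Then for any $x_0 \in [a, b]$ distinct from $x$, the Taylor series expansion of $f$ about $x_0$ is

$$ f(x) = \sum_{j=0}^{n} \frac{1}{j!} f^{(j)}(x_0) (x - x_0)^j + R_n \tag{1.3} $$

where $f^{(j)}(x_0)$ is the $j$th derivative of $f$ evaluated at $x_0$, and

$$ R_n = \frac{1}{(n+1)!} f^{(n+1)}(\xi) (x - x_0)^{n+1} \tag{1.4} $$

for some point $\xi$ in the interval between $x$ and $x_0$. As $|x - x_0| \to 0$, note that $R_n = \mathcal{O}(|x - x_0|^{n+1})$.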
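As a numerical illustration of the univariate theorem, the sketch below approximates the exponential function by its Taylor polynomial and compares the actual error with a bound derived from the Lagrange form of the remainder, $|R_n| \leq \max_\xi |f^{(n+1)}(\xi)|\, |x - x_0|^{n+1}/(n+1)!$. The choice of $\exp$ as the test function and the helper names taylor_exp and remainder_bound are assumptions made only for this example.

```python
import math


def taylor_exp(x, x0, n):
    """Degree-n Taylor polynomial of exp about x0, evaluated at x.

    Every derivative of exp is exp itself, so f^(j)(x0) = exp(x0).
    """
    return sum(
        math.exp(x0) * (x - x0) ** j / math.factorial(j) for j in range(n + 1)
    )


def remainder_bound(x, x0, n):
    """Upper bound on |R_n| for exp from the Lagrange remainder.

    exp is increasing, so its (n+1)th derivative on the interval between
    x and x0 is at most exp(max(x, x0)).
    """
    return math.exp(max(x, x0)) * abs(x - x0) ** (n + 1) / math.factorial(n + 1)


if __name__ == "__main__":
    x, x0 = 0.5, 0.0
    for n in range(6):
        error = abs(math.exp(x) - taylor_exp(x, x0, n))
        print(n, error, remainder_bound(x, x0, n))
```

Each additional term shrinks the error by roughly a factor of $|x - x_0|/(n+1)$, in line with the stated order of the remainder.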

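The multivariate version of Taylor's theorem is analogous. Suppose $f$ is a real-valued function of a $p$-dimensional variable $\mathbf{x}$, possessing continuous partial derivatives of all orders up to and including $n + 1$ with respect to all coordinates, in an open convex set containing $\mathbf{x}$ and $\mathbf{x}_0 \neq \mathbf{x}$. Then

$$ f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^{n} \frac{1}{i!} D^{(i)}(f; \mathbf{x}_0, \mathbf{x} - \mathbf{x}_0) + R_n \tag{1.5} $$

where

$$ D^{(i)}(f; \mathbf{x}, \mathbf{y}) = \sum_{j_1=1}^{p} \cdots \sum_{j_i=1}^{p} \left\{ \left( \left. \frac{d^i}{dt_{j_1} \cdots dt_{j_i}} f(\mathbf{t}) \right|_{\mathbf{t} = \mathbf{x}} \right) \prod_{k=1}^{i} y_{j_k} \right\} \tag{1.6} $$

and

$$ R_n = \frac{1}{(n+1)!} D^{(n+1)}(f; \boldsymbol{\xi}, \mathbf{x} - \mathbf{x}_0) \tag{1.7} $$

for some $\boldsymbol{\xi}$ on the line segment joining $\mathbf{x}$ and $\mathbf{x}_0$. As $|\mathbf{x} - \mathbf{x}_0| \to 0$, note that $R_n = \mathcal{O}(|\mathbf{x} - \mathbf{x}_0|^{n+1})$.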

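The Euler-Maclaurin formula is useful in many asymptotic analyses. If $f$ has $2n$ continuous derivatives in $[0, 1]$, then

$$ \int_0^1 f(x)\,dx = \frac{f(0) + f(1)}{2} - \sum_{i=1}^{n-1} \frac{b_{2i} \left( f^{(2i-1)}(1) - f^{(2i-1)}(0) \right)}{(2i)!} - \frac{b_{2n} f^{(2n)}(\xi)}{(2n)!} \tag{1.8} $$

where $0 \leq \xi \leq 1$, $f^{(j)}$ is the $j$th derivative of $f$, and $b_j = B_j(0)$ can be determined using the recursion relation

$$ \sum_{j=0}^{m} \binom{m+1}{j} B_j(z) = (m + 1) z^m \tag{1.9} $$

initialized with $B_0(z) = 1$. The proof of this result is based on repeated integrations by parts [376].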
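The constants $b_j$ can be generated mechanically. The sketch below assumes the standard Bernoulli polynomial recursion $\sum_{j=0}^{m} \binom{m+1}{j} B_j(z) = (m+1) z^m$ with $B_0(z) = 1$. Setting $z = 0$ makes the right side vanish for $m \geq 1$, which gives $b_m = -\frac{1}{m+1} \sum_{j=0}^{m-1} \binom{m+1}{j} b_j$. The function name bernoulli_numbers is hypothetical; exact rational arithmetic is used so that no roundoff enters.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m_max):
    """Return [b_0, ..., b_m_max], where b_j = B_j(0).

    Assumes the recursion sum_{j=0}^{m} C(m+1, j) B_j(z) = (m+1) z^m
    with B_0(z) = 1, evaluated at z = 0 and solved for b_m.
    """
    b = [Fraction(1)]
    for m in range(1, m_max + 1):
        # At z = 0 the right-hand side (m+1) z^m is zero for m >= 1.
        partial = sum(comb(m + 1, j) * b[j] for j in range(m))
        b.append(-partial / (m + 1))
    return b


if __name__ == "__main__":
    for j, value in enumerate(bernoulli_numbers(6)):
        print(j, value)
```

Under this recursion the odd-indexed constants beyond $b_1$ are zero, which is consistent with only even-indexed $b_{2i}$ appearing in the Euler-Maclaurin formula.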

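Finally, we note that it is sometimes desirable to approximate the derivative of a function numerically, using finite differences. For example, the $i$th component of the gradient of $f$ at $\mathbf{x}$ can be approximated by

$$ \frac{df(\mathbf{x})}{dx_i} \approx \frac{f(\mathbf{x} + \epsilon_i \mathbf{e}_i) - f(\mathbf{x} - \epsilon_i \mathbf{e}_i)}{2 \epsilon_i} \tag{1.10} $$

where $\epsilon_i$ is a small number and $\mathbf{e}_i$ is the unit vector in the $i$th coordinate direction. Typically, one might start with, say, $\epsilon_i = 0.01$ or $0.001$ and approximate the desired derivative for a sequence of progressively smaller $\epsilon_i$. The approximation will generally improve until $\epsilon_i$ becomes small enough that the calculation is degraded and eventually dominated by computer roundoff error introduced by subtractive cancellation. Introductory discussion of this approach and a more sophisticated Richardson extrapolation strategy for obtaining greater precision are provided in [376]. Finite differences can also be used to approximate the second derivative of $f$ at $\mathbf{x}$ via

$$ \frac{d^2 f(\mathbf{x})}{dx_i\, dx_j} \approx \frac{f(\mathbf{x} + \epsilon_i \mathbf{e}_i + \epsilon_j \mathbf{e}_j) - f(\mathbf{x} + \epsilon_i \mathbf{e}_i - \epsilon_j \mathbf{e}_j) - f(\mathbf{x} - \epsilon_i \mathbf{e}_i + \epsilon_j \mathbf{e}_j) + f(\mathbf{x} - \epsilon_i \mathbf{e}_i - \epsilon_j \mathbf{e}_j)}{4 \epsilon_i \epsilon_j} \tag{1.11} $$

with similar sequential precision improvements.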
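The finite-difference idea can be sketched in a few lines. The code below assumes central-difference forms for both the gradient and the second derivative and uses one common step size for every coordinate; both are simplifying assumptions, not necessarily the exact scheme intended here. The names central_gradient, central_second, and the test function example_f are hypothetical.

```python
import math


def central_gradient(f, x, eps=1e-3):
    """Approximate the gradient of f at x (a list of floats).

    Component i is (f(x + eps e_i) - f(x - eps e_i)) / (2 eps).
    """
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def central_second(f, x, i, j, eps=1e-3):
    """Approximate d^2 f / (dx_i dx_j) at x from four shifted evaluations."""

    def shifted(sign_i, sign_j):
        y = list(x)
        y[i] += sign_i * eps
        y[j] += sign_j * eps
        return f(y)

    numerator = shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
    return numerator / (4.0 * eps * eps)


def example_f(x):
    """Hypothetical test function with gradient (2 x0 x1, x0^2 + cos(x1))."""
    return x[0] ** 2 * x[1] + math.sin(x[1])


if __name__ == "__main__":
    point = [1.0, 2.0]
    exact = 1.0 + math.cos(2.0)
    # Shrinking eps helps at first, then roundoff from cancellation takes over.
    for k in range(1, 14, 2):
        eps = 10.0 ** (-k)
        approx = central_gradient(example_f, point, eps)[1]
        print(eps, abs(approx - exact))
```

Running the loop prints the absolute error for a sequence of progressively smaller step sizes, which makes the trade-off between truncation error and subtractive cancellation visible without relying on any particular machine's output.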

1.3 Statistical Notation and Probability Distributions

We use capital letters to denote random variables, such as $Y$ or $X$, and lowercase letters to represent specific realized values of random variables such as $y$ or $x$. The probability density function of $X$ is denoted $f$; the cumulative distribution function is $F$. We use the notation $X \sim f(x)$ to mean that $X$ is distributed with density $f(x)$. Frequently, the dependence of $f(x)$ on one or more parameters also will be denoted with a conditioning bar, as in $f(x \mid \alpha, \beta)$. Because of the diversity of topics covered in this book, we want to be careful to distinguish when $f(x \mid \alpha)$ refers to a density function as opposed to the evaluation of that density at a point $x$. When the meaning is unclear from the context, we will be explicit, for example, by using $f(\cdot \mid \alpha)$ to denote the function. When it is important to distinguish among several densities, we may adopt subscripts referring to specific random variables, so that the density functions for $X$ and $Y$ are $f_X$ and $f_Y$, respectively. We use the same notation for distributions of discrete random variables and in the Bayesian context.

The conditional distribution of $X$ given that $Y$ equals $y$ (i.e., $X \mid Y = y$) is described by the density denoted $f(x \mid y)$, or $f_{X|Y}(x \mid y)$. In this case, we write that $X \mid Y$ has density $f(x \mid Y)$. For notational simplicity we allow density functions to be implicitly specified by their arguments, so we may use the same symbol, say $f$, to refer to many distinct functions, as in the equation $f(x, y \mid \mu) = f(x \mid y, \mu) f(y \mid \mu)$. Finally, $f(X)$ and $F(X)$ are random variables: the evaluations of the density and cumulative distribution functions, respectively, at the random argument $X$.

The expectation of a random variable is denoted E{X}. Unless specifically mentioned, the distribution with respect to which an expectation is taken is the distribution of X or should be implicit from the context. To denote the probability of an...
