
Computational Statistics
Content
Chapter 1
Review
This chapter reviews notation and background material in mathematics, probability, and statistics. Readers may wish to skip this chapter and turn directly to Chapter 2, returning here only as needed.
1.1 Mathematical Notation
We use boldface to distinguish a vector $\mathbf{x} = (x_1, \ldots, x_p)$ or a matrix $\mathbf{M}$ from a scalar variable $x$ or a constant $M$. A vector-valued function $\mathbf{f}$ evaluated at $\mathbf{x}$ is also boldfaced, as in $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_p(\mathbf{x}))$. The transpose of $\mathbf{M}$ is denoted $\mathbf{M}^T$.
Unless otherwise specified, all vectors are considered to be column vectors, so, for example, an $n \times p$ matrix can be written as $\mathbf{M} = (\mathbf{x}_1 \ldots \mathbf{x}_n)^T$. Let $\mathbf{I}$ denote an identity matrix, and $\mathbf{1}$ and $\mathbf{0}$ denote vectors of ones and zeros, respectively.
A symmetric square matrix $\mathbf{M}$ is positive definite if $\mathbf{x}^T \mathbf{M} \mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}$. Positive definiteness is equivalent to the condition that all eigenvalues of $\mathbf{M}$ are positive. $\mathbf{M}$ is nonnegative definite or positive semidefinite if $\mathbf{x}^T \mathbf{M} \mathbf{x} \geq 0$ for all nonzero vectors $\mathbf{x}$.
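As a small numerical illustration of the eigenvalue condition, the following Python sketch treats only the 2 × 2 symmetric case, where the eigenvalues have a closed form. The function names and the two example matrices are assumptions made for illustration and do not come from the text.

```python
import math

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return mid - rad, mid + rad

def quad_form(a, b, c, x1, x2):
    """Quadratic form x^T M x for M = [[a, b], [b, c]] and x = (x1, x2)."""
    return a * x1 * x1 + 2.0 * b * x1 * x2 + c * x2 * x2

def is_positive_definite(a, b, c):
    """Positive definite exactly when the smallest eigenvalue is positive."""
    return eigenvalues_sym2(a, b, c)[0] > 0.0

# Hypothetical example 1: [[2, -1], [-1, 2]] has two positive eigenvalues.
print(eigenvalues_sym2(2, -1, 2), is_positive_definite(2, -1, 2))

# Hypothetical example 2: [[1, 1], [1, 1]] has a zero eigenvalue, so it is
# positive semidefinite but not positive definite; the nonzero vector
# x = (1, -1) gives a quadratic form of exactly zero.
print(eigenvalues_sym2(1, 1, 1), quad_form(1, 1, 1, 1, -1))
```

For larger matrices a general eigenvalue routine or a Cholesky factorization would be the usual choice; the closed form here only keeps the example dependency-free.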
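The derivative of a function $f$, evaluated at $x$, is denoted $f'(x)$. When $\mathbf{x} = (x_1, \ldots, x_p)$, the gradient of $f$ at $\mathbf{x}$ is
$$\mathbf{f}'(\mathbf{x}) = \left( \frac{df(\mathbf{x})}{dx_1}, \ldots, \frac{df(\mathbf{x})}{dx_p} \right).$$
The Hessian matrix for $f$ at $\mathbf{x}$ is $\mathbf{f}''(\mathbf{x})$ having $(i, j)$th element equal to $d^2 f(\mathbf{x})/(dx_i \, dx_j)$. The negative Hessian has important uses in statistical inference.
Let $\mathbf{J}(\mathbf{x})$ denote the Jacobian matrix evaluated at $\mathbf{x}$ for the one-to-one mapping $\mathbf{y} = \mathbf{f}(\mathbf{x})$. The $(i, j)$th element of $\mathbf{J}(\mathbf{x})$ is equal to $df_i(\mathbf{x})/dx_j$.
A functional is a real-valued function on a space of functions. For example, if $T(f) = \int_0^1 f(x) \, dx$, then the functional $T$ maps suitably integrable functions onto the real line.
The indicator function $1_{\{A\}}$ equals 1 if $A$ is true and 0 otherwise. The real line is denoted $\mathbb{R}$, and $p$-dimensional real space is $\mathbb{R}^p$.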
1.2 Taylor's Theorem and Mathematical Limit Theory
First, we define standard "big oh" and "little oh" notation for describing the relative orders of convergence of functions. Let the functions $f$ and $g$ be defined on a common, possibly infinite interval. Let $z_0$ be a point in this interval or a boundary point of it (i.e., $-\infty$ or $\infty$). We require $g(z) \neq 0$ for all $z \neq z_0$ in a neighborhood of $z_0$. Then we say
$$f(z) = \mathcal{O}(g(z)) \tag{1.1}$$
if there exists a constant $M$ such that $|f(z)| \leq M|g(z)|$ as $z \to z_0$. For example, $(n + 1)/(3n^2) = \mathcal{O}(n^{-1})$, and it is understood that we are considering $n \to \infty$. If $\lim_{z \to z_0} f(z)/g(z) = 0$, then we say
$$f(z) = o(g(z)). \tag{1.2}$$
For example, $f(x_0 + h) - f(x_0) = h f'(x_0) + o(h)$ as $h \to 0$ if $f$ is differentiable at $x_0$. The same notation can be used for describing the convergence of a sequence $\{x_n\}$ as $n \to \infty$, by letting $f(n) = x_n$.
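Both definitions can be checked numerically. The sketch below uses two illustrative cases: the ratio of $(n + 1)/(3n^2)$ to $n^{-1}$, which stays bounded as $n$ grows, and the linearization remainder of the exponential function divided by $h$, which shrinks toward zero with $h$. The choice of the exponential function, the point $x_0 = 1$, and the function names are assumptions made for illustration.

```python
import math

def ratio_big_o(n):
    """f(n)/g(n) for f(n) = (n + 1)/(3 n^2) and g(n) = 1/n."""
    f = (n + 1) / (3.0 * n * n)
    g = 1.0 / n
    return f / g

def ratio_little_o(h, x0=1.0):
    """[f(x0 + h) - f(x0) - h f'(x0)] / h for f = exp, whose derivative is exp."""
    remainder = math.exp(x0 + h) - math.exp(x0) - h * math.exp(x0)
    return remainder / h

# Big oh: the ratio is bounded (here by M = 2/3) for every n >= 1.
for n in (1, 10, 1000, 10**6):
    print(n, ratio_big_o(n))

# Little oh: the ratio tends to zero as h does.
for h in (1e-1, 1e-2, 1e-3):
    print(h, ratio_little_o(h))
```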
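Taylor's theorem provides a polynomial approximation to a function $f$. Suppose $f$ has finite $(n + 1)$th derivative on $(a, b)$ and continuous $n$th derivative on $[a, b]$. Then for any $x_0 \in [a, b]$ distinct from $x$, the Taylor series expansion of $f$ about $x_0$ is
$$f(x) = \sum_{j=0}^{n} \frac{1}{j!} f^{(j)}(x_0) (x - x_0)^j + R_n \tag{1.3}$$
where $f^{(j)}(x_0)$ is the $j$th derivative of $f$ evaluated at $x_0$, and
$$R_n = \frac{1}{(n + 1)!} f^{(n+1)}(\xi) (x - x_0)^{n+1} \tag{1.4}$$
for some point $\xi$ in the interval between $x$ and $x_0$. As $|x - x_0| \to 0$, note that $R_n = \mathcal{O}(|x - x_0|^{n+1})$.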
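To make the role of the remainder concrete, here is a minimal Python sketch for the exponential function, whose derivatives of every order equal the function itself. It compares the actual truncation error with the Lagrange-form bound on $R_n$ and checks that halving $|x - x_0|$ shrinks the error by roughly $2^{n+1}$. The choice of function, the evaluation points, and the helper names are assumptions made for illustration.

```python
import math

def taylor_exp(x, n, x0=0.0):
    """Degree-n Taylor polynomial of exp about x0, evaluated at x."""
    return sum(math.exp(x0) * (x - x0) ** j / math.factorial(j)
               for j in range(n + 1))

def remainder_bound_exp(x, n, x0=0.0):
    """Upper bound on |R_n|: exp is increasing, so exp(xi) <= exp(max(x, x0))."""
    return (math.exp(max(x, x0)) * abs(x - x0) ** (n + 1)
            / math.factorial(n + 1))

n = 4
for x in (0.5, 0.25):
    error = abs(math.exp(x) - taylor_exp(x, n))
    print(x, error, remainder_bound_exp(x, n))
```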
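The multivariate version of Taylor's theorem is analogous. Suppose $f$ is a real-valued function of a $p$-dimensional variable $\mathbf{x}$, possessing continuous partial derivatives of all orders up to and including $n + 1$ with respect to all coordinates, in an open convex set containing $\mathbf{x}$ and $\mathbf{x}_0 \neq \mathbf{x}$. Then
$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^{n} \frac{1}{i!} D^{(i)}(f; \mathbf{x}_0, \mathbf{x} - \mathbf{x}_0) + R_n \tag{1.5}$$
where
$$D^{(i)}(f; \mathbf{x}, \mathbf{y}) = \sum_{j_1=1}^{p} \cdots \sum_{j_i=1}^{p} \left\{ \left( \left. \frac{d^i}{dt_{j_1} \cdots dt_{j_i}} f(\mathbf{t}) \right|_{\mathbf{t} = \mathbf{x}} \right) \prod_{k=1}^{i} y_{j_k} \right\} \tag{1.6}$$
and
$$R_n = \frac{1}{(n + 1)!} D^{(n+1)}(f; \boldsymbol{\xi}, \mathbf{x} - \mathbf{x}_0) \tag{1.7}$$
for some $\boldsymbol{\xi}$ on the line segment joining $\mathbf{x}$ and $\mathbf{x}_0$. As $|\mathbf{x} - \mathbf{x}_0| \to 0$, note that $R_n = \mathcal{O}(|\mathbf{x} - \mathbf{x}_0|^{n+1})$.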
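For $n = 2$ the multivariate expansion reduces to the familiar form built from the gradient and the Hessian: the first-order term is a single sum over coordinates and the second-order term is a double sum. The Python sketch below evaluates that second-order approximation for the hypothetical function $f(x_1, x_2) = \exp(x_1 + 2x_2)$ about the origin, with gradient and Hessian entered by hand. Halving the displacement should reduce the error by a factor near $2^3 = 8$, in line with a remainder of order $|\mathbf{x} - \mathbf{x}_0|^3$. All names are assumptions made for illustration.

```python
import math

def second_order_taylor(f0, grad, hess, x0, x):
    """f(x0) + sum_j grad_j y_j + (1/2) sum_j sum_k hess_jk y_j y_k, y = x - x0."""
    y = [xi - x0i for xi, x0i in zip(x, x0)]
    p = len(y)
    d1 = sum(grad[j] * y[j] for j in range(p))
    d2 = sum(hess[j][k] * y[j] * y[k] for j in range(p) for k in range(p))
    return f0 + d1 + d2 / 2.0

def f(x):
    """Hypothetical test function exp(x1 + 2 x2)."""
    return math.exp(x[0] + 2.0 * x[1])

# Exact value, gradient, and Hessian of f at the origin.
x0 = [0.0, 0.0]
f0 = 1.0
grad = [1.0, 2.0]
hess = [[1.0, 2.0], [2.0, 4.0]]

def taylor_error(x):
    """Absolute error of the second-order approximation at x."""
    return abs(f(x) - second_order_taylor(f0, grad, hess, x0, x))

e_full = taylor_error([0.1, 0.05])
e_half = taylor_error([0.05, 0.025])
print(e_full, e_half, e_full / e_half)
```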
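The Euler-Maclaurin formula is useful in many asymptotic analyses. If $f$ has $2n$ continuous derivatives in $[0, 1]$, then
$$\int_0^1 f(x) \, dx = \frac{f(0) + f(1)}{2} - \sum_{i=1}^{n-1} \frac{b_{2i} \left( f^{(2i-1)}(1) - f^{(2i-1)}(0) \right)}{(2i)!} - \frac{b_{2n} f^{(2n)}(\xi)}{(2n)!} \tag{1.8}$$
where $0 \leq \xi \leq 1$, $f^{(j)}$ is the $j$th derivative of $f$, and $b_j = B_j(0)$ can be determined using the recursion relation
$$\sum_{j=0}^{m} \binom{m + 1}{j} B_j(z) = (m + 1) z^m \tag{1.9}$$
initialized with $B_0(z) = 1$. The proof of this result is based on repeated integrations by parts [376].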
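The constants $b_j$ are straightforward to generate. Evaluated at $z = 0$, the standard Bernoulli-polynomial recursion $\sum_{j=0}^{m} \binom{m+1}{j} B_j(z) = (m + 1) z^m$ has right-hand side zero for $m \geq 1$, so each $b_m$ follows from the earlier ones. The Python sketch below computes them exactly with rational arithmetic and then applies the Euler-Maclaurin correction terms, with the final remainder term dropped, to $\int_0^1 e^x \, dx = e - 1$. The integrand and the function names are assumptions made for illustration.

```python
from fractions import Fraction
import math

def bernoulli_numbers(m_max):
    """Return [b_0, ..., b_m_max] with b_j = B_j(0), using the recursion at z = 0."""
    b = [Fraction(1)]
    for m in range(1, m_max + 1):
        # sum_{j=0}^{m} C(m+1, j) b_j = 0 for m >= 1; solve for b_m.
        partial = sum(math.comb(m + 1, j) * b[j] for j in range(m))
        b.append(-partial / (m + 1))
    return b

def euler_maclaurin_exp(n_corrections):
    """Trapezoid value for exp on [0, 1] plus the first n_corrections corrections.

    Every derivative of exp is exp, so each derivative difference
    f^(2i-1)(1) - f^(2i-1)(0) equals e - 1.
    """
    b = bernoulli_numbers(2 * n_corrections)
    total = (1.0 + math.e) / 2.0
    for i in range(1, n_corrections + 1):
        total -= float(b[2 * i]) * (math.e - 1.0) / math.factorial(2 * i)
    return total

print(bernoulli_numbers(6))
for k in range(4):
    print(k, abs(euler_maclaurin_exp(k) - (math.e - 1.0)))
```

Exact fractions avoid the cancellation that affects a floating-point version of this recursion when many terms are requested.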
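Finally, we note that it is sometimes desirable to approximate the derivative of a function numerically, using finite differences. For example, the $i$th component of the gradient of $f$ at $\mathbf{x}$ can be approximated by
$$\frac{df(\mathbf{x})}{dx_i} \approx \frac{f(\mathbf{x} + \epsilon_i \mathbf{e}_i) - f(\mathbf{x} - \epsilon_i \mathbf{e}_i)}{2 \epsilon_i} \tag{1.10}$$
where $\epsilon_i$ is a small number and $\mathbf{e}_i$ is the unit vector in the $i$th coordinate direction. Typically, one might start with, say, $\epsilon_i = 0.01$ or 0.001 and approximate the desired derivative for a sequence of progressively smaller $\epsilon_i$. The approximation will generally improve until $\epsilon_i$ becomes small enough that the calculation is degraded and eventually dominated by computer roundoff error introduced by subtractive cancellation. Introductory discussion of this approach and a more sophisticated Richardson extrapolation strategy for obtaining greater precision are provided in [376]. Finite differences can also be used to approximate the second derivative of $f$ at $\mathbf{x}$ via
$$\frac{d^2 f(\mathbf{x})}{dx_i \, dx_j} \approx \frac{1}{4 \epsilon_i \epsilon_j} \Big( f(\mathbf{x} + \epsilon_i \mathbf{e}_i + \epsilon_j \mathbf{e}_j) - f(\mathbf{x} + \epsilon_i \mathbf{e}_i - \epsilon_j \mathbf{e}_j) - f(\mathbf{x} - \epsilon_i \mathbf{e}_i + \epsilon_j \mathbf{e}_j) + f(\mathbf{x} - \epsilon_i \mathbf{e}_i - \epsilon_j \mathbf{e}_j) \Big) \tag{1.11}$$
with similar sequential precision improvements.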
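The behavior described above, improvement followed by roundoff-driven degradation, is easy to observe. The following Python sketch implements a central-difference gradient and the standard four-point central-difference Hessian, using one common step size for all coordinates as a simplifying assumption. The test function $f(x_1, x_2) = e^{x_1} \sin x_2$, the evaluation point, and all function names are assumptions made for illustration.

```python
import math

def central_gradient(f, x, eps):
    """Central-difference approximation to the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

def central_hessian(f, x, eps):
    """Four-point central-difference approximation to the Hessian of f at x."""
    p = len(x)
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            total = 0.0
            # Signs (+,+), (+,-), (-,+), (-,-) carry weights +1, -1, -1, +1.
            for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                pt = list(x)
                pt[i] += si * eps
                pt[j] += sj * eps
                total += si * sj * f(pt)
            hess[i][j] = total / (4.0 * eps * eps)
    return hess

def f(x):
    """Hypothetical test function exp(x1) * sin(x2)."""
    return math.exp(x[0]) * math.sin(x[1])

x = [1.0, 1.0]
exact_grad = [math.e * math.sin(1.0), math.e * math.cos(1.0)]

def grad_error(eps):
    """Largest absolute error among the gradient components."""
    approx = central_gradient(f, x, eps)
    return max(abs(a - e) for a, e in zip(approx, exact_grad))

# The error first shrinks with eps, then grows again once roundoff dominates.
for eps in (1e-1, 1e-3, 1e-5, 1e-9, 1e-13, 1e-15):
    print(eps, grad_error(eps))

print(central_hessian(f, x, 1e-4))
```

The second-derivative quotient divides by the square of the step, so it is more sensitive to roundoff than the gradient; a noticeably larger step than the best gradient step is usually needed.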
1.3 Statistical Notation and Probability Distributions
We use capital letters to denote random variables, such as $Y$ or $X$, and lowercase letters to represent specific realized values of random variables such as $y$ or $x$. The probability density function of $X$ is denoted $f$; the cumulative distribution function is $F$. We use the notation $X \sim f(x)$ to mean that $X$ is distributed with density $f(x)$. Frequently, the dependence of $f(x)$ on one or more parameters also will be denoted with a conditioning bar, as in $f(x|\alpha, \beta)$. Because of the diversity of topics covered in this book, we want to be careful to distinguish when $f(x|\alpha)$ refers to a density function as opposed to the evaluation of that density at a point $x$. When the meaning is unclear from the context, we will be explicit, for example, by using $f(\cdot|\alpha)$ to denote the function. When it is important to distinguish among several densities, we may adopt subscripts referring to specific random variables, so that the density functions for $X$ and $Y$ are $f_X$ and $f_Y$, respectively. We use the same notation for distributions of discrete random variables and in the Bayesian context.
The conditional distribution of $X$ given that $Y$ equals $y$ (i.e., $X|Y = y$) is described by the density denoted $f(x|y)$, or $f_{X|Y}(x|y)$. In this case, we write that $X|Y$ has density $f(x|Y)$. For notational simplicity we allow density functions to be implicitly specified by their arguments, so we may use the same symbol, say $f$, to refer to many distinct functions, as in the equation $f(x, y|\mu) = f(x|y, \mu) f(y|\mu)$. Finally, $f(X)$ and $F(X)$ are random variables: the evaluations of the density and cumulative distribution functions, respectively, at the random argument $X$.
The expectation of a random variable is denoted E{X}. Unless specifically mentioned, the distribution with respect to which an expectation is taken is the distribution of X or should be implicit from the context. To denote the probability of an...