Robust Statistics

Name: Robust Statistics | Theory and Methods (with R)
Brand: Wiley
Price: 78.99 EUR
Availability: OnlineOnly

Ricardo A. Maronna, R. Douglas Martin, Victor J. Yohai, Matías Salibián-Barrera (Authors)

Wiley (Publisher)

2nd Edition

Published on 19. October 2018

464 pages

E-Book

ePUB with Adobe-DRM

978-1-119-21466-3 (ISBN)

€78.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

A new edition of this popular text on robust statistics, thoroughly updated to include new and improved methods and to focus on implementing the methodology in the increasingly popular open-source software R.

Classical statistical methods fail to cope well with outliers associated with deviations from standard distributions. Robust statistical methods take these deviations into account when estimating the parameters of parametric models, thus increasing the reliability of fitted models and associated inference. This new, second edition of Robust Statistics: Theory and Methods (with R) presents broad coverage of the theory of robust statistics, integrated with computing methods and applications. Updated to include important new research results of the last decade and to focus on the use of the popular software package R, it features in-depth coverage of the key methodology, including regression, multivariate analysis, and time series modeling. The book is illustrated throughout by a range of examples and applications, supported by a companion website with data sets and R code that allow the reader to reproduce the examples given in the book.

Unlike other books on the market, Robust Statistics: Theory and Methods (with R) offers the most comprehensive, definitive, and up-to-date treatment of the subject. It features chapters on estimating location and scale; measuring robustness; linear regression with fixed and with random predictors; multivariate analysis; generalized linear models; time series; numerical algorithms; and asymptotic theory of M-estimates.

Explains both the use and theoretical justification of robust methods
Guides readers in selecting and using the most appropriate robust methods for their problems
Features computational algorithms for the core methods

New research results of the last decade included in this 2nd edition are: fast deterministic robust regression, finite-sample robustness, robust regularized regression, robust location and scatter estimation with missing data, robust estimation with independent outliers in variables, and robust mixed linear models.

Robust Statistics aims to stimulate the use of robust methods as a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. It is an ideal resource for researchers, practitioners, and graduate students in statistics, engineering, computer science, and physical and social sciences.

Content

Preface xv

Preface to the First Edition xxi

About the Companion Website xxix

1 Introduction 1

2 Location and Scale 17

3 Measuring Robustness 51

4 Linear Regression 1 87

5 Linear Regression 2 115

6 Multivariate Analysis 195

7 Generalized Linear Models 271

8 Time Series 293

9 Numerical Algorithms 363

10 Asymptotic Theory of M-estimators 373

11 Description of Datasets 401

References 407

Index 423

Preface

It has now been eleven years since the publication of the first edition of Robust Statistics: Theory and Methods in 2006. Since that time, two developments have prompted the need for a second edition. The first is that a number of new results in the theory and methods of robust statistics have been developed and published since 2006, in particular by the book's authors. The second is that the S-PLUS software has been superseded by the open-source package R, so our original S-PLUS robust statistics package became outdated. Thus, for this second edition, we have created a new R package called RobStatTM, and in that package and at the publisher's web site we provide scripts for computing all the examples in the book.

We will now discuss the main research advances included in this second edition.

Finite-sample robustness

Asymptotically normal robust estimators have tuning constants that allow users to control their efficiency at the normal distribution, in a trade-off with robustness toward fat-tailed non-normal distributions. However, their finite-sample performance in terms of mean-squared error (MSE), which takes bias as well as variance into account, can be considerably worse than their asymptotic performance suggests. This second edition contains useful new results on the finite-sample MSE performance of robust linear regression and robust covariance estimators, briefly described below.
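To see why MSE captures both effects, here is a small Monte Carlo sketch with illustrative settings that are not the book's simulation design: 10% of the data come from N(10, 1) instead of N(0, 1), samples have size 50, and the helper names are hypothetical. Under this contamination the sample mean has a large bias, while the median has a small one. The MSE reflects both the bias and the variance of each estimator.

```python
import random
import statistics

def mse_decomposition(estimates, true_value):
    """Return (bias, variance, mse) for a list of Monte Carlo estimates.

    The variance uses divisor n, so mse == bias**2 + variance up to rounding.
    """
    n = len(estimates)
    center = sum(estimates) / n
    bias = center - true_value
    variance = sum((e - center) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, variance, mse

def contaminated_sample(n, eps, rng):
    """n draws from (1 - eps) N(0, 1) + eps N(10, 1); the outlier location 10
    is an arbitrary illustrative choice."""
    return [rng.gauss(10.0, 1.0) if rng.random() < eps else rng.gauss(0.0, 1.0)
            for _ in range(n)]

rng = random.Random(0)
means, medians = [], []
for _ in range(2000):                     # 2000 replications, samples of size 50
    x = contaminated_sample(50, 0.1, rng)
    means.append(statistics.fmean(x))
    medians.append(statistics.median(x))

for name, est in (("mean", means), ("median", medians)):
    bias, var, mse = mse_decomposition(est, 0.0)
    print(f"{name:>6}: bias={bias:.3f}  variance={var:.4f}  MSE={mse:.4f}")
```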

Linear regression estimators

A loss function with optimality properties is introduced in Section 5.8.1, and it is shown that its use gives much better results than the popular bisquare loss function, in both efficiency and robustness.
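For reference, the bisquare (Tukey biweight) loss can be written as below. This is the baseline that the new optimal loss is compared against. The sketch uses the common normalization rho(t) = 1 - (1 - (t/c)^2)^3 for |t| < c and rho(t) = 1 otherwise. The default c = 4.685 is the value usually quoted for 95% asymptotic efficiency at the normal. The optimal loss of Section 5.8.1 is not reproduced here.

```python
def bisquare_rho(t, c=4.685):
    """Bisquare loss normalized to [0, 1]: 1 - (1 - (t/c)^2)^3 for |t| < c, else 1."""
    u = t / c
    if abs(u) >= 1.0:
        return 1.0
    return 1.0 - (1.0 - u * u) ** 3

def bisquare_weight(t, c=4.685):
    """Weight proportional to psi(t)/t, scaled so that weight(0) = 1.
    Residuals with |t| >= c receive weight 0, i.e. they are rejected."""
    u = t / c
    if abs(u) >= 1.0:
        return 0.0
    return (1.0 - u * u) ** 2

for t in (0.0, 1.0, 2.0, 4.0, 6.0):
    print(f"t={t:4.1f}  rho={bisquare_rho(t):.4f}  weight={bisquare_weight(t):.4f}")
```

Because rho is bounded, a single huge residual can add at most 1 to the objective. That boundedness is what makes the loss robust.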

Section 5.9.3 focuses on finite-sample efficiency and robustness and introduces a new "distance-constrained maximum-likelihood" (DCML) estimator. The DCML estimator is shown to provide the best trade-off between finite-sample robustness and normal distribution efficiency, in comparison with an MM-estimator that is asymptotically 85% efficient and an adaptive estimator, described in Section 5.9.2, that is asymptotically fully efficient for normal distributions.

Multivariate location and scatter

A number of proposed robust covariance matrix estimators were discussed in the first edition, along with some comments on the choice of estimator. In this second edition, the new Section 6.10, "Choosing a location/scatter estimator", replaces the previous Section 6.8 and provides new recommendations for choosing a robust covariance matrix estimator, based on extensive finite-sample simulation studies.

Fast and reliable starting points for initial estimators

The standard starting point for computing initial S-estimators for linear regression and for covariance matrices is based on a subsampling algorithm. Subsampling algorithms have two disadvantages. First, their computation time increases exponentially with the number of variables. Second, they are stochastic, so repeating the computation can produce different final S-estimators and MM-estimators.
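The exponential growth follows from a standard calculation. Suppose each subsample has p observations and a fraction eps of the data are outliers. A single random subsample is then outlier-free with probability (1 - eps)^p. To find at least one clean subsample with probability 1 - alpha, you need N >= log(alpha) / log(1 - (1 - eps)^p) subsamples. The sketch below evaluates this formula. The function name and the choice alpha = 0.01 are illustrative assumptions.

```python
import math

def subsamples_needed(p, eps, alpha=0.01):
    """Smallest N such that N random subsamples of size p include at least one
    outlier-free subsample with probability >= 1 - alpha (requires 0 < eps < 1)."""
    q = (1.0 - eps) ** p                  # chance that a single subsample is clean
    return math.ceil(math.log(alpha) / math.log(1.0 - q))

for p in (2, 5, 10, 20, 40):
    print(f"p={p:2d}  N={subsamples_needed(p, eps=0.2)}")
```

With 20% outliers, a handful of subsamples suffices for p = 2. For p = 40, tens of thousands are needed.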

Linear regression

Section 5.7.4 describes a deterministic algorithm due to Peña and Yohai (1999) for obtaining a starting point for robust regression. Since this algorithm is deterministic, it always yields the same final MM-estimator. This is particularly important in some applications, for example in financial risk calculations. Furthermore, it is shown in Section 5.7.6 that the Peña-Yohai starting-value algorithm is much faster than the subsampling method, and has smaller maximum MSE than the subsampling algorithm, sometimes substantially so.

Multivariate location and scatter

Subsampling methods have also been used to obtain starting values for robust estimators of location and dispersion (scatter), but they suffer from the same difficulty as in linear regression: they become too slow when the number of variables is large. Fortunately, there is an improved algorithm for computing starting values due to Peña and Prieto (2007), which uses projection directions of maximum and minimum kurtosis plus a set of random directions obtained by a "stratified sampling" procedure. This method, referred to as the KSD method, is described in Section 6.9.2. Although the KSD method is still stochastic, it provides fast, reliable starting values and is more stable than ordinary subsampling, as discussed in Sections 6.10.2 and 6.10.3.
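The intuition behind using kurtosis can be shown with a small sketch. A small cluster of outliers tends to inflate the kurtosis of the data projected onto the direction that separates it. The sketch only evaluates projected kurtosis along two fixed directions of illustrative 2-D data, with 10% of points shifted to (8, 0). It does not search for extreme-kurtosis directions or perform stratified sampling, so it is not the KSD algorithm. The helper names are hypothetical.

```python
import random

def kurtosis(values):
    """Sample kurtosis m4 / m2^2 (equals 3 for a normal distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / (m2 * m2)

def project(points, direction):
    """Inner product of each point with the given direction vector."""
    return [sum(p_i * d_i for p_i, d_i in zip(p, direction)) for p in points]

rng = random.Random(42)
# Illustrative data: 90% bivariate standard normal, 10% shifted to (8, 0).
data = [(rng.gauss(8.0, 1.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0),
         rng.gauss(0.0, 1.0)) for _ in range(2000)]

for d in ((1.0, 0.0), (0.0, 1.0)):
    print(d, round(kurtosis(project(data, d)), 2))
```

The direction (1, 0), which separates the outlier cluster, shows kurtosis well above 3. The orthogonal direction stays near 3.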

Robust regularized regression

The use of penalized regression estimators to obtain good results in high-dimensional regression problems where only a few predictors are relevant (sparse models) has been a hot topic in the "machine learning" literature over the last decade or so. These estimators add L1 and/or L2 penalties to the least squares objective function; leading methods of this type include Lasso regression, Least Angle Regression, and Elastic Net regression. A new section on robust regularized regression describes how to extend robust linear regression to obtain robust versions of these non-robust, least-squares-based regularized regression estimators.
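The sketch below makes the penalty idea concrete. It uses an elastic-net objective in the familiar parametrization lam * (alpha * ||beta||_1 + (1 - alpha)/2 * ||beta||_2^2), where alpha = 1 gives the Lasso penalty. It contrasts penalized least squares with an illustrative robust analogue, in which squared residuals are replaced by a bounded bisquare rho of scaled residuals. The fixed scale, the bisquare choice, and all function names are assumptions for illustration. Actual robust penalized estimators must also estimate the scale, and the book's exact objectives are not reproduced.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).
    alpha = 1 is the Lasso penalty; alpha = 0 is a ridge (L2) penalty."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)

def residuals(X, y, beta):
    """r_i = y_i - x_i' beta for each row x_i of X."""
    return [yi - sum(x_ij * b_j for x_ij, b_j in zip(xi, beta)) for xi, yi in zip(X, y)]

def penalized_ls(X, y, beta, lam, alpha):
    """Non-robust elastic-net objective: (1 / (2n)) * sum r_i^2 + penalty."""
    r = residuals(X, y, beta)
    return sum(ri * ri for ri in r) / (2 * len(r)) + elastic_net_penalty(beta, lam, alpha)

def bisquare_rho(t, c=4.685):
    """Bounded bisquare loss with values in [0, 1]."""
    u = t / c
    return 1.0 if abs(u) >= 1.0 else 1.0 - (1.0 - u * u) ** 3

def penalized_robust(X, y, beta, lam, alpha, scale):
    """Illustrative robust analogue: squared residuals replaced by a bounded rho
    of residuals divided by a fixed, externally supplied scale estimate."""
    r = residuals(X, y, beta)
    return sum(bisquare_rho(ri / scale) for ri in r) / len(r) + elastic_net_penalty(beta, lam, alpha)

X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 1000.0]               # the last response is a gross outlier
beta = [2.0]
print(penalized_ls(X, y, beta, lam=0.1, alpha=0.5))
print(penalized_robust(X, y, beta, lam=0.1, alpha=0.5, scale=1.0))
```

A single gross outlier dominates the least-squares objective. In the bounded version, the same outlier contributes at most 1/n.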

Multivariate location and scatter estimation with missing data

Section 6.12 provides a method for solving the problem of robust estimation of scatter and location with missing data. The method contains two main components. The first is the introduction of a generalized S-estimator of scatter and location that depends on Mahalanobis distances for the non-missing data in each observation. The second component is a weighted version of the well-known expectation-maximization (EM) algorithm for missing data.
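The sketch below illustrates a Mahalanobis distance "for the non-missing data in each observation". Given location mu and scatter Sigma, it keeps only the observed coordinates of x and the matching sub-vector of mu and sub-matrix of Sigma. NaN marks missing values, and a small Gaussian-elimination solver stands in for a linear-algebra library. This is only the distance ingredient, not the generalized S-estimator or the weighted EM algorithm. The function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small A)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def partial_mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance using only the observed (non-NaN) coordinates
    of x. Returns (d2, number_of_observed_coordinates)."""
    obs = [j for j, v in enumerate(x) if not math.isnan(v)]
    diff = [x[j] - mu[j] for j in obs]
    sub = [[sigma[i][j] for j in obs] for i in obs]
    z = solve(sub, diff)                  # z = Sigma_obs^{-1} (x_obs - mu_obs)
    return sum(d * zi for d, zi in zip(diff, z)), len(obs)

nan = float("nan")
mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
print(partial_mahalanobis_sq([1.0, nan, 3.0], mu, sigma))
```

The number of observed coordinates is returned alongside the distance because distances computed from different numbers of coordinates are not directly comparable.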

Robust estimation with independent outliers in variables

The Tukey-Huber outlier-generating family of distribution models has been a commonly accepted standard model for robust statistics research and associated empirical studies for independent and identically distributed data. In the case of multivariate data, the Tukey-Huber model describes the distribution of the rows, or "cases", of a data matrix whose columns represent variables; outliers generated by this model are known as "case outliers". However, there are important problems where outliers occur independently across cells (that is, across variables) in each row of a data matrix. For example, with portfolios of stock returns, where the columns represent different stocks and the rows represent observations at different times, outlying returns in different stocks (representing idiosyncratic risk) occur independently across stocks, that is, across cells/variables.

Section 6.13 discusses an important and relatively recent model for generating independent outliers across cells (across variables), called the independent contamination (IC) model. It turns out that estimators that have good robustness properties under the Tukey-Huber model have very poor robustness properties for the IC model. For example, estimators that have high breakdown points under the Tukey-Huber model can have very low breakdown points under the IC model. This section surveys the current state of research on robust methods for IC models, and on robust methods for simultaneously dealing with outliers from both Tukey-Huber and IC models. The problem of obtaining robust estimators that work well for both Tukey-Huber and IC models is an important ongoing area of research.
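Simple arithmetic shows why case-wise breakdown points say little under the IC model. Suppose each of the p cells in a row is contaminated independently with probability eps. Then a row contains at least one outlying cell with probability 1 - (1 - eps)^p. A case-wise estimator that tolerates up to half of the rows being contaminated is therefore overwhelmed once this probability exceeds 1/2. With eps = 5%, that happens from p = 14 onward. The sketch checks the formula by simulation. The function names and settings are illustrative.

```python
import random

def prob_row_contaminated(eps, p):
    """Probability that a row has at least one contaminated cell when each of
    its p cells is contaminated independently with probability eps."""
    return 1.0 - (1.0 - eps) ** p

def simulated_fraction(eps, p, n_rows=20000, seed=0):
    """Monte Carlo check of prob_row_contaminated."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < eps for _ in range(p)) for _ in range(n_rows))
    return hits / n_rows

for p in (1, 5, 10, 20, 50):
    print(f"p={p:2d}  theory={prob_row_contaminated(0.05, p):.3f}  "
          f"simulated={simulated_fraction(0.05, p):.3f}")
```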

Mixed linear models

Section 6.15 discusses robust methods for mixed linear models. Two primary methods are discussed, the first of which is an S-estimator method that has good robustness properties for Tukey-Huber model case-wise outliers, but does not perform well for cell-wise independent outliers. The second method is designed to do well for both types of outliers, and achieves a breakdown point of 50% for Tukey-Huber models and 29% for IC models.

Generalized linear models

New material on a family of robust estimators has been added to the chapter on generalized linear models (GLMs). These estimators are based on using M-estimators after a variance-stabilizing transformation has been applied to the response variable.
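The book's GLM estimators are not reproduced here. The sketch only illustrates what a variance-stabilizing transformation does. It uses the square-root transformation for Poisson counts as an assumed example. For Y ~ Poisson(lambda), Var(Y) = lambda grows with the mean, while Var(sqrt(Y)) stays close to 1/4 once lambda is moderately large. The Poisson draws use Knuth's multiplication method from the standard library. The function names are hypothetical.

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def variances(lam, n=10000, seed=1):
    """Sample variance of Y ~ Poisson(lam) and of sqrt(Y)."""
    rng = random.Random(seed)
    ys = [poisson_sample(lam, rng) for _ in range(n)]
    return statistics.variance(ys), statistics.variance([math.sqrt(y) for y in ys])

for lam in (5, 10, 20, 40):
    raw, root = variances(lam)
    print(f"lambda={lam:3d}  var(Y)={raw:6.2f}  var(sqrt(Y))={root:.3f}")
```

After the transformation the response has roughly constant variance, which is what makes standard M-estimation of the transformed response workable.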

Regularized robust estimators of the inverse covariance matrix

In Chapter 6, on multivariate analysis, Section 6.14 looks at regularized robust estimators of inverse covariance matrices for situations where the ratio of the number of variables to the number of cases is close to or larger than one.

A note on software and book web site

The "Recommendations and software" section at the end of each chapter indicates the procedures recommended by the authors and the R functions that implement them. These functions are located in several packages, in particular RobStatTM, which was developed especially for this book. All are available on CRAN (https://cran.r-project.org).

The R scripts and datasets that enable the reader to reproduce the book's examples are available at the book's web site at...

Schweitzer Fachinformationen
