Multivariate Nonparametric Regression and Visualization

Name: Multivariate Nonparametric Regression and Visualization | With R and Applications to Finance
Brand: Wiley
Price: 109.99 EUR
Availability: OnlineOnly

With R and Applications to Finance

Jussi Klemelä (Author)

Wiley (Publisher)

Published on 5 May 2014

392 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-118-59350-9 (ISBN)

€109.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

A modern approach to statistical learning and its applications through visualization methods

With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data-generating mechanisms, the book begins with an overview of classification and regression.

The book then introduces and examines various tested and proven visualization techniques for learning samples and functions. Multivariate Nonparametric Regression and Visualization identifies risk management, portfolio selection, and option pricing as the main areas in which statistical methods may be implemented in quantitative finance. The book provides coverage of key statistical areas including linear methods, kernel methods, additive models and trees, boosting, support vector machines, and nearest neighbor methods.

Exploring the additional applications of nonparametric and semiparametric methods, Multivariate Nonparametric Regression and Visualization features:

* An extensive appendix with R-package training material to encourage duplication and modification of the presented computations and research
* Multiple examples to demonstrate the applications in the field of finance
* Sections with formal definitions of the various applied methods for readers to utilize throughout the book

Multivariate Nonparametric Regression and Visualization is an ideal textbook for upper-undergraduate and graduate-level courses on nonparametric function estimation, advanced topics in statistics, and quantitative finance. The book is also an excellent reference for practitioners who apply statistical methods in quantitative finance.

Reviews / Votes

"Altogether, the book provides a very nice overview of nonparametric and semiparametric regression methods with interesting applications to problems in quantitative finance." (Mathematical Reviews, 1 October 2015)

Content

Preface

Introduction
  I.1 Estimation of Functionals of Conditional Distributions
  I.2 Quantitative Finance
  I.3 Visualization
  I.4 Literature

PART I METHODS OF REGRESSION AND CLASSIFICATION

1 Overview of Regression and Classification
  1.1 Regression
  1.2 Discrete Response Variable
  1.3 Parametric Family Regression
  1.4 Classification
  1.5 Applications in Quantitative Finance
  1.6 Data Examples
  1.7 Data Transformations
  1.8 Central Limit Theorems
  1.9 Measuring the Performance of Estimators
  1.10 Confidence Sets
  1.11 Testing

2 Linear Methods and Extensions
  2.1 Linear Regression
  2.2 Varying Coefficient Linear Regression
  2.3 Generalized Linear and Related Models
  2.4 Series Estimators
  2.5 Conditional Variance and ARCH Models
  2.6 Applications in Volatility and Quantile Estimation
  2.7 Linear Classifiers

3 Kernel Methods and Extensions
  3.1 Regressogram
  3.2 Kernel Estimator
  3.3 Nearest Neighborhood Estimator
  3.4 Classification with Local Averaging
  3.5 Median Smoothing
  3.6 Conditional Density Estimators
  3.7 Conditional Distribution Function Estimation
  3.8 Conditional Quantile Estimation
  3.9 Conditional Variance Estimation
  3.10 Conditional Covariance Estimation
  3.11 Applications in Risk Management
  3.12 Applications in Portfolio Selection

4 Semiparametric and Structural Models
  4.1 Single Index Model
  4.2 Additive Model
  4.3 Other Semiparametric Models

5 Empirical Risk Minimization
  5.1 Empirical Risk
  5.2 Local Empirical Risk
  5.3 Support Vector Machines
  5.4 Stagewise Methods
  5.5 Adaptive Regressograms

PART II VISUALIZATION

6 Visualization of Data
  6.1 Scatter Plots
  6.2 Histogram and Kernel Density Estimator
  6.3 Dimension Reduction
  6.4 Observations as Objects

7 Visualization of Functions
  7.1 Slices
  7.2 Partial Dependence Functions
  7.3 Reconstruction of Sets
  7.4 Level Set Trees
  7.5 Unimodal Densities
    7.5.1 Probability Content of Level Sets
    7.5.2 Set Visualization

Appendix A: R Tutorial
  A.1 Data Visualization
  A.2 Linear Regression
  A.3 Kernel Regression
  A.4 Local Linear Regression
  A.5 Additive Models: Backfitting
  A.6 Single Index Regression
  A.7 Forward Stagewise Modeling
  A.8 Quantile Regression

References
Author Index
Topic Index

INTRODUCTION

We study regression analysis and classification, as well as estimation of conditional variances, quantiles, densities, and distribution functions. The focus of the book is on nonparametric methods. Nonparametric methods are flexible and able to adapt to various kinds of data, but they can suffer from the curse of dimensionality and from a lack of interpretability. Semiparametric methods are often able to cope with quite high-dimensional data and they are often easier to interpret, but they are less flexible and their use may lead to modeling errors. In addition to the terms “nonparametric estimator” and “semiparametric estimator”, we can use the term “structured estimator” to denote estimators that arise, for example, in additive models. These estimators obey a structural restriction, whereas the term “semiparametric estimator” is used for estimators that have a parametric and a nonparametric component.

Nonparametric, semiparametric, and structured methods are well established and widely applied. There are, nevertheless, areas where further work is useful. We have included three such areas in this book:

1. Estimation of several functionals of a conditional distribution; not only estimation of the conditional expectation but also estimation of the conditional variance and conditional quantiles.
2. Quantitative finance as an area of application for nonparametric and semiparametric methods.
3. Visualization tools in statistical learning.

I.1 ESTIMATION OF FUNCTIONALS OF CONDITIONAL DISTRIBUTIONS

Kernel methods are one of the main topics of the book. They are easy to implement and computationally feasible, and their definition is intuitive. For example, a kernel regression estimator is a local average of the values of the response variable. Local averaging is a general regression method. In addition to the kernel estimator, examples of local averaging include the nearest-neighbor estimator, the regressogram, and the orthogonal series estimator.
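The local-averaging idea can be illustrated with a minimal sketch of a one-dimensional kernel regression estimator. This is not code from the book, whose examples are in R; the Gaussian kernel, the function names, and the toy data are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice; other kernels work too)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x0, xs, ys, h):
    """Kernel regression estimate at x0: a weighted local average of ys.

    Observations with x close to x0 (relative to the bandwidth h) get
    large weights; distant observations get weights near zero.
    """
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth h")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical toy data: the response is the square of the explanatory variable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
estimate = kernel_regression(2.0, xs, ys, h=0.5)
```

Because the estimate is a weighted average, it always lies between the smallest and largest observed response, and a larger bandwidth h averages over a wider neighborhood.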

We cover linear regression and generalized linear models. These models can be seen as starting points to many semiparametric and structured regression models. For example, the single index model, the additive model, and the varying coefficient linear regression model can be seen as generalizations of the linear regression model or the generalized linear model.

Empirical risk minimization is a general approach to statistical estimation. The methods of empirical risk minimization can be used in regression function estimation, in classification, in quantile regression, and in the estimation of other functionals of the conditional distribution. Local empirical risk minimization can be seen as a generalization of kernel regression.
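As a small illustration of empirical risk minimization outside mean regression, the sketch below estimates an unconditional quantile by minimizing the empirical risk under the quantile ("pinball") loss. The function names and the data are hypothetical, and the search over observed values is a simplification that relies on the risk being piecewise linear and convex, so that a minimizer can always be found among the sample points.

```python
def pinball_loss(residual, p):
    """Quantile loss: p * r for r >= 0 and (p - 1) * r for r < 0."""
    return p * residual if residual >= 0 else (p - 1.0) * residual


def empirical_risk(theta, ys, p):
    """Average pinball loss of the constant prediction theta."""
    return sum(pinball_loss(y - theta, p) for y in ys) / len(ys)


def erm_quantile(ys, p):
    """Minimize the empirical risk over the observed values."""
    return min(sorted(ys), key=lambda theta: empirical_risk(theta, ys, p))


# Hypothetical sample with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median_estimate = erm_quantile(ys, 0.5)
upper_estimate = erm_quantile(ys, 0.9)
```

Replacing the constant theta with a function of explanatory variables, and minimizing over a class of such functions, gives quantile regression in the same framework.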

A regular regressogram is a special case of local averaging, but the empirical choice of the partition leads to a rich class of estimators. The choice of the partition is made using empirical risk minimization. In the one- and two-dimensional cases a regressogram is usually less efficient than the kernel estimator, but in high-dimensional cases a regressogram can be useful. For example, a method to select the partition of a regressogram can be seen as a method of variable selection, if the chosen partition is such that it can be defined using only a subset of the variables. Estimators that are defined as the solution of an optimization problem, like the minimizers of an empirical risk, typically need to be calculated with numerical methods. Stagewise algorithms can also be taken as the definition of an estimator, even without giving an explicit minimization problem that they solve.
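A regular regressogram in one dimension can be sketched as follows: the real line is cut into bins of equal width, and the estimate at a point is the average response in the bin containing that point. The names, the fixed partition, and the data are assumptions for illustration; the empirical, data-driven choice of the partition discussed above is not implemented here.

```python
import math


def regressogram(x0, xs, ys, origin, width):
    """Average of the responses whose x falls in the same bin as x0.

    Bins are the intervals [origin + k * width, origin + (k + 1) * width).
    Returns None when the bin of x0 contains no observations.
    """
    target_bin = math.floor((x0 - origin) / width)
    in_bin = [y for x, y in zip(xs, ys)
              if math.floor((x - origin) / width) == target_bin]
    if not in_bin:
        return None  # empty bin: the estimate is undefined here
    return sum(in_bin) / len(in_bin)


# Hypothetical toy data.
xs = [0.1, 0.4, 0.6, 1.2, 1.7, 2.5]
ys = [1.0, 3.0, 2.0, 10.0, 12.0, 7.0]
fit_a = regressogram(0.5, xs, ys, origin=0.0, width=1.0)  # bin [0, 1)
fit_b = regressogram(1.5, xs, ys, origin=0.0, width=1.0)  # bin [1, 2)
```

Unlike a kernel estimate, the regressogram is piecewise constant, so every point in the same bin receives the same fitted value.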

A regression function is defined as the conditional expectation of a response variable given the explanatory variables. The conditional expectation is useful in making predictions as well as in finding causal relationships. We also cover the estimation of the conditional variance and conditional quantiles. These are needed to give a more complete view of the conditional distribution. The estimation of the conditional variance and conditional quantiles is also needed in risk management, which is an important area of quantitative finance. The conditional variance can be estimated by estimating the conditional expectation of the squared response and subtracting the square of the estimated conditional expectation, whereas the conditional median is a special case of a conditional quantile. In the time series setting the standard approaches for estimating the conditional variance are ARCH and GARCH modeling, but we discuss nonparametric alternatives. The GARCH estimator is close to a moving average, whereas the ARCH estimator is related to linear state space modeling.
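One way to make the conditional variance estimate concrete is to use the identity Var(Y | X = x) = E[Y² | X = x] − (E[Y | X = x])² and to estimate both conditional expectations by local averaging. The sketch below assumes Gaussian kernel weights and uses hypothetical names and data; it is an illustration of the identity, not the book's implementation.

```python
import math


def local_average(x0, xs, values, h):
    """Gaussian-weighted local average of values around x0."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth h")
    return sum(w * v for w, v in zip(weights, values)) / total


def conditional_variance(x0, xs, ys, h):
    """Estimate Var(Y | X = x0) as E[Y^2 | x0] - (E[Y | x0])^2."""
    mean = local_average(x0, xs, ys, h)
    second_moment = local_average(x0, xs, [y * y for y in ys], h)
    # Clamp at zero to guard against tiny negative values from rounding.
    return max(second_moment - mean * mean, 0.0)


# Hypothetical data: responses of +1 and -1 observed at the same point.
xs = [0.0, 0.0, 0.0, 0.0]
ys = [1.0, -1.0, 1.0, -1.0]
variance_at_zero = conditional_variance(0.0, xs, ys, h=1.0)
```

Since both expectations use the same non-negative weights, the difference is a weighted variance and is non-negative up to rounding error.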

In classification we are not interested in the estimation of functionals of a distribution, but the aim is to construct classification rules. However, most of the regression function estimation methods have a counterpart in classification.

I.2 QUANTITATIVE FINANCE

Risk management, portfolio selection, and option pricing can be identified as three important areas of quantitative finance. Parametric statistical methods have dominated statistical research in quantitative finance. In risk management, probability distributions have been modeled with the Pareto distribution or with distributions derived from extreme value theory. In portfolio selection the multivariate normal model has been used together with the Markowitz theory of portfolio selection. In option pricing the Black-Scholes model of stock prices has been widely applied. The Black-Scholes model has also been extended to more general parametric models for the process of stock prices.

In risk management the p-quantile of a loss distribution has a direct interpretation as the threshold such that the probability of the loss exceeding the threshold is at most 1 − p. Thus estimation of conditional quantiles is directly relevant for risk management. Unconditional quantile estimators do not take into account all available information, and thus in risk management it is useful to estimate conditional quantiles. The estimation of the conditional variance can be applied in the estimation of a conditional quantile, because in location-scale families the variance, together with the location, determines the quantiles. The estimation of conditional variance can be extended to the estimation of the conditional covariance or the conditional correlation.
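The two routes to a quantile mentioned above can be sketched side by side: an empirical p-quantile of observed losses, and a quantile obtained from a location and a scale. The Gaussian family is an assumption made here only because its quantile function is available in the Python standard library; the function names and the loss data are hypothetical.

```python
import math
from statistics import NormalDist


def empirical_quantile(losses, p):
    """Smallest observed loss q such that at least a fraction p of losses are <= q."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    ordered = sorted(losses)
    index = math.ceil(p * len(ordered)) - 1
    return ordered[max(index, 0)]


def gaussian_quantile(mean, std, p):
    """p-quantile in a Gaussian location-scale family: mean + std * z_p."""
    return mean + std * NormalDist().inv_cdf(p)


# Hypothetical losses (negative values are gains).
losses = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 10.0]
var_90 = empirical_quantile(losses, 0.90)
share_exceeding = sum(1 for x in losses if x > var_90) / len(losses)
```

In the location-scale route, plugging estimates of the conditional mean and the conditional standard deviation into gaussian_quantile would give a conditional quantile estimate under the assumed family.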

We apply nonparametric regression function estimation in portfolio selection. The portfolio is selected either with the maximization of a conditional expected utility or with the maximization of a Markowitz criterion. When the collection of allowed portfolio weights is a finite set, classification can also be used in portfolio selection. The squared returns are much easier to predict than the returns themselves, and thus in quantitative finance the focus has been on the prediction of volatility. However, it can be shown that despite the weak predictability of the returns, portfolio selection can profit from statistical prediction.

Option pricing can be formulated as a problem of stochastic control. We do not study the statistics of option pricing in detail, but give a basic framework for solving some option pricing problems nonparametrically.

I.3 VISUALIZATION

Statistical visualization is often understood as visualization of the raw data. The visualization of the raw data can be a part of exploratory data analysis, a first step to model building, and a tool to generate hypotheses about the data-generating mechanism. However, we put emphasis on a different approach to visualization. In this approach, visualization tools are associated with statistical estimators or inference procedures. For example, we first estimate a regression function and then try to visualize and describe the properties of this regression function estimate. The distinction between the visualization of the raw data and the visualization of the estimator is not clear when nonparametric function estimation is used. In fact, nonparametric function estimation can be seen as a part of exploratory data analysis.

SiZer is an example of a tool that combines visualization and inference; see Chaudhuri & Marron (1999). This methodology combines formal testing for the existence of modes with SiZer maps to find out whether a mode of a density estimate or of a regression function estimate is really there.

Semiparametric function estimates are often easier to visualize than nonparametric function estimates. For example, in a single index model the regression function estimate is a composition of a linear function and a univariate function. Thus in a single index model we need only to visualize the coefficients of the linear function and a one-dimensional function. The ease of visualization gives motivation to study semiparametric methods.

CART, as presented in Breiman, Friedman, Olshen & Stone (1984), is an example of an estimation method whose popularity is due not only to its statistical properties but also to the fact that it is defined in terms of a binary tree, which directly gives a visualization of the estimator. Even when it is possible to find estimators with better statistical properties than CART, the possibility of visualization gives motivation to use CART.

Visualization of nonparametric function estimates, such as kernel estimates, is challenging. For the visualization of completely nonparametric estimates, we can use level set tree-based methods, as presented in Klemelä (2009). Level set tree-based methods have also attracted interest in topological data analysis and in scientific visualization, and these methods have their origin in the concept of a Reeb graph,...
