Nonparametric Statistics with Applications to Science and Engineering with R

Name: Nonparametric Statistics with Applications to Science and Engineering with R
Brand: Wiley
Price: 110.99 EUR
Availability: OnlineOnly

Paul Kvam, Brani Vidakovic, Seong-joon Kim (Authors)

Wiley (Publisher)

2nd Edition

Published on 6 October 2022

448 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-26816-1 (ISBN)

€110.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

NONPARAMETRIC STATISTICS WITH APPLICATIONS TO SCIENCE AND ENGINEERING WITH R

Introduction to the methods and techniques of traditional and modern nonparametric statistics, incorporating R code

Nonparametric Statistics with Applications to Science and Engineering with R presents modern nonparametric statistics from a practical point of view, with the newly revised edition including custom R functions implementing nonparametric methods to explain how to compute them and make them more comprehensible.

Relevant built-in functions and packages on CRAN are also provided with sample code. The R code in the new edition not only enables readers to perform nonparametric analysis easily, but also lets them visualize and explore data using R's powerful graphics systems, such as the ggplot2 package and the base R graphics system.

The new edition includes useful tables at the end of each chapter that help the reader find data sets, files, functions, and packages that are used and relevant to the respective chapter. New examples and exercises that enable readers to gain a deeper insight into nonparametric statistics and increase their comprehension are also included.

Some of the sample topics discussed in Nonparametric Statistics with Applications to Science and Engineering with R include:

* Basics of probability, statistics, Bayesian statistics, order statistics, Kolmogorov-Smirnov test statistics, rank tests, and designed experiments

* Categorical data, estimating distribution functions, density estimation, least squares regression, curve fitting techniques, wavelets, and bootstrap sampling

* EM algorithms, statistical learning, nonparametric Bayes, WinBUGS, properties of ranks, and Spearman coefficient of rank correlation

* Chi-square and goodness-of-fit, contingency tables, Fisher exact test, McNemar test, Cochran's test, Mantel-Haenszel test, and empirical likelihood

Nonparametric Statistics with Applications to Science and Engineering with R is a highly valuable resource for graduate students in engineering and the physical and mathematical sciences, as well as researchers who need a more comprehensive but succinct understanding of modern nonparametric statistical methods.

Content

Preface

1 Introduction

1.1 Efficiency of Nonparametric Methods

1.2 Overconfidence Bias

1.3 Computing with R

1.4 Exercises

References

2 Probability Basics

2.1 Helpful Functions

2.2 Events, Probabilities and Random Variables

2.3 Numerical Characteristics of Random Variables

2.4 Discrete Distributions

2.5 Continuous Distributions

2.6 Mixture Distributions

2.7 Exponential Family of Distributions

2.8 Stochastic Inequalities

2.9 Convergence of Random Variables

2.10 Exercises

References

3 Statistics Basics

3.1 Estimation

3.2 Empirical Distribution Function

3.3 Statistical Tests

3.4 Confidence Intervals

3.5 Likelihood

3.6 Exercises

References

4 Bayesian Statistics

4.1 The Bayesian Paradigm

4.2 Ingredients for Bayesian Inference

4.3 Point Estimation

4.4 Interval Estimation: Credible Sets

4.5 Bayesian Testing

4.6 Bayesian Prediction

4.7 Bayesian Computation and Use of WinBUGS

4.8 Exercises

References

5 Order Statistics

5.1 Joint Distributions of Order Statistics

5.2 Sample Quantiles

5.3 Tolerance Intervals

5.4 Asymptotic Distributions of Order Statistics

5.5 Extreme Value Theory

5.6 Ranked Set Sampling

5.7 Exercises

References

6 Goodness of Fit

6.1 Kolmogorov-Smirnov Test Statistic

6.2 Smirnov Test to Compare Two Distributions

6.3 Specialized Tests

6.4 Probability Plotting

6.5 Runs Test

6.6 Meta Analysis

6.7 Exercises

References

7 Rank Tests

7.1 Properties of Ranks

7.2 Sign Test

7.3 Spearman Coefficient of Rank Correlation

7.4 Wilcoxon Signed Rank Test

7.5 Wilcoxon (Two-Sample) Sum Rank Test

7.6 Mann-Whitney U Test

7.7 Test of Variances

7.8 Walsh Test for Outliers

7.9 Exercises

References

8 Designed Experiments

8.1 Kruskal-Wallis Test

8.2 Friedman Test

8.3 Variance Test for Several Populations

8.4 Exercises

References

9 Categorical Data

9.1 Chi-Square and Goodness-of-Fit

9.2 Contingency Tables

9.3 Fisher Exact Test

9.4 McNemar Test

9.5 Cochran's Test

9.6 Mantel-Haenszel Test

9.7 CLT for Multinomial Probabilities

9.8 Simpson's Paradox

9.9 Exercises

References

10 Estimating Distribution Functions

10.1 Introduction

10.2 Nonparametric Maximum Likelihood

10.3 Kaplan-Meier Estimator

10.4 Confidence Interval for F

10.5 Plug-in Principle

10.6 Semi-Parametric Inference

10.7 Empirical Processes

10.8 Empirical Likelihood

10.9 Exercises

References

11 Density Estimation

11.1 Histogram

11.2 Kernel and Bandwidth

11.3 Exercises

References

12 Beyond Linear Regression

12.1 Least Squares Regression

12.2 Rank Regression

12.3 Robust Regression

12.4 Isotonic Regression

12.5 Generalized Linear Models

12.6 Exercises

References

13 Curve Fitting Techniques

13.1 Kernel Estimators

13.2 Nearest Neighbor Methods

13.3 Variance Estimation

13.4 Splines

13.5 Summary

13.6 Exercises

References

14 Wavelets

14.1 Introduction to Wavelets

14.2 How Do the Wavelets Work?

14.3 Wavelet Shrinkage

14.4 Exercises

References

15 Bootstrap

15.1 Bootstrap Sampling

15.2 Nonparametric Bootstrap

15.3 Bias Correction for Nonparametric Intervals

15.4 The Jackknife

15.5 Bayesian Bootstrap

15.6 Permutation Tests

15.7 More on the Bootstrap

15.8 Exercises

References

16 EM Algorithm

16.1 Fisher's Example

16.2 Mixtures

16.3 EM and Order Statistics

16.4 MAP via EM

16.5 Infection Pattern Estimation

16.6 Exercises

References

17 Statistical Learning

17.1 Discriminant Analysis

17.2 Linear Classification Models

17.3 Nearest Neighbor Classification

17.4 Neural Networks

17.5 Binary Classification Trees

17.6 Exercises

References

18 Nonparametric Bayes

18.1 Dirichlet Processes

18.2 Bayesian Categorical Models

18.3 Infinitely Dimensional Problems

18.4 Exercises

References

A WinBUGS

A.1 Using WinBUGS

A.2 Built-in Functions

B R Coding

B.1 Programming in R

B.2 Basics of R

B.3 R Commands

B.4 R for Statistics

R Index

Author Index

Subject Index

Preface

Danger lies not in what we don't know, but in what we think we know that just ain't so.

Mark Twain (1835-1910)

This textbook is a substantial revision of a previous textbook written in 2007 by Kvam and Vidakovic. The biggest difference in this version is the adoption of the R programming language as a supplementary learning tool for the purpose of teaching concepts, illustrating examples, and completing computational homework assignments. In the original book, the authors relied on Matlab.

There has been plenty of change in the world of nonparametric statistics since we finished the first edition of this book. While the statistics community had already adapted to a modern framework for data analysis that relies increasingly on nonparametric procedures (not to mention Bayesian alternatives to traditional inference), we sense more adapters in engineering, medical research, chemistry, biology, and especially the behavioral sciences with each passing year. However, the field of nonparametric statistics has also receded toward the periphery of the statistics curriculum in the wake of data science, which continues to encroach on graduate curriculums associated with statistics, causing more programs to replace traditional statistics courses with the trendier versions involving data structures.

There are quality monographs/texts dealing with nonparametric statistics, such as the encyclopedic book by Hollander and Wolfe, Nonparametric Statistical Methods, or the excellent book by Conover, Practical Nonparametric Statistics, which has served as a staple for a generation of professors tasked to teach a course in this subject. Before engaging in writing the first version of this textbook, we taught several iterations of a graduate course on nonparametric statistics at Georgia Tech. The audience consisted of MS and PhD students in Engineering Statistics, Electrical Engineering, Bioengineering, Management, Logistics, Applied Mathematics, and Physics. While comprising a nonhomogeneous group, all of the students had solid mathematical, programming, and statistical training needed to benefit from the course.

In our course, we relied on the third edition of Conover's book, which is mainly concerned with what most of us think of as traditional nonparametric statistics: proportions, ranks, categorical data, goodness of fit, and so on, with the understanding that the text would be supplemented by the instructor's handouts. We ended up supplying an increasing number of handouts every year, for units such as density and function estimation, wavelets, Bayesian approaches to nonparametric problems, the EM algorithm, splines, machine learning, and other arguably modern nonparametric topics. Later on, we decided to merge the handouts and fill the gaps.

With this new edition, we adhere to the traditional form one expects in an academic textbook, but we aim to provide more informal discussion and commentary to balance with the regimen of lessons that help the student progress through a statistics methods course. Unlike newer books that focus on data science, we want to help the student learn more than just how to implement a statistical procedure. We want them to understand, to a higher degree, what they are doing (or what R is doing for them).

We hope the book provides all of the tools and motivation for a student to study methods of nonparametric statistics, but we also aim to keep a conversational tone in our writing. Reading math-infused textbooks can be challenging, but it need not be a drudgery. For that reason, we remind the reader of the bigger picture, including the historical and cultural aspects linked to the development and application of nonparametric procedures. We think it is important to acknowledge the fundamental contributions to the field of nonparametric statistics by not only our field's pioneers, such as Karl Pearson, Nathan Mantel, or Brad Efron, but also others in our vanguard, including François-Marie Arouet (Voltaire), Karl Popper, and Baron Von Munchausen.

Computing. The book is integrated with R, and for many procedures covered in this book, we feature R subroutines and packages (free libraries of code). The choice of software was natural: engineers, scientists, and increasingly statisticians are communicating in the "R language." R is an open-source language and environment for statistical computing that is quickly emerging as the standard for research and development. R provides a wide variety of packages for performing various kinds of analyses, along with powerful graphics components. For Bayesian calculation we previously relied on WinBUGS, free software from Cambridge's Biostatistics Research Unit. Both R and WinBUGS are briefly covered in two appendices for readers less familiar with them. For R programmers who want to see a variety of programming modules for nonparametric inference in the R language, we refer you to the R-series guide Nonparametric Statistical Methods Using R by Kloke and McKean.

Outline of Chapters. For a typical graduate student to cover the full breadth of this textbook, two semesters would be required. For a one-semester course, the instructor should necessarily cover Chapters 1-3 and 5-9 to start. Depending on the scope of the class, the last part of the course can include different chapter selections.

Chapters 2-4 contain important background material the student needs to understand to effectively learn and apply the methods taught in a nonparametric analysis course. Because the ranks of observations have special importance in a nonparametric analysis, Chapter 5 presents basic results for order statistics and includes statistical methods to create tolerance intervals.

Traditional topics in estimation and testing are presented in Chapters 7-10 and should receive emphasis even for students who are most curious about advanced topics such as density estimation (Chapter 11), curve fitting (Chapter 13), and wavelets (Chapter 14). These topics include a core of rank tests that are analogous to common parametric procedures (e.g., t-tests, analysis of variance).

Basic methods of categorical data analysis are contained in Chapter 9. Although most students in the biological sciences are exposed to a wide variety of statistical methods for categorical data, engineering students and other students in the physical sciences typically receive less schooling in this quintessential branch of statistics. Topics include methods based on tabled data, chi-square tests, and the introduction of general linear models. Also included in the first part of the book is the topic of "goodness of fit" (Chapter 6), which refers to testing data not in terms of some unknown parameters, but the unknown distribution that generated it. In a way, goodness of fit represents an interface between distribution-free methods and traditional parametric methods of inference, and both analytical and graphical procedures are presented. Chapter 10 presents the nonparametric alternative to maximum likelihood estimation and likelihood ratio-based confidence intervals.
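As a rough illustration of the goodness-of-fit idea described above (testing data against a hypothesized distribution rather than a parameter), the following minimal sketch computes the one-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution function and a hypothesized CDF. It is not taken from the book, which works in R; Python is used here only for illustration, and the function names and the sample values are hypothetical.

```python
import math


def ks_statistic(data, cdf):
    """Largest vertical gap between the empirical CDF of `data` and `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap must be checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution (the hypothesized model here)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# Made-up observations, for illustration only.
waiting_times = [0.2, 0.5, 0.9, 1.1, 1.6, 2.3, 3.0]
d_n = ks_statistic(waiting_times, exponential_cdf)
```

A large value of `d_n` suggests the data are poorly described by the hypothesized distribution; turning it into a p-value requires the null distribution of the statistic, which this sketch does not compute.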

The term "regression" is familiar from your previous course that introduced you to statistical methods. Nonparametric regression provides an alternative method of analysis that requires fewer assumptions about the response variable. In Chapter 12, we use the regression platform to introduce other important topics that build on linear regression, including isotonic (constrained) regression, robust regression, and generalized linear models. In Chapter 13, we introduce more general curve fitting methods. Regression models based on wavelets (Chapter 14) are presented in a separate chapter.

In the latter part of the book, emphasis is placed on nonparametric procedures that are becoming more relevant to engineering researchers and practitioners. Beyond the conspicuous rank tests, this text includes many of the newest nonparametric tools available to experimenters for data analysis. Chapter 17 introduces fundamental topics of statistical learning as a basis for data mining and pattern recognition and includes discriminant analysis, nearest-neighbor classifiers, neural networks, and binary classification trees. Computational tools needed for nonparametric analysis include bootstrap resampling (Chapter 15) and the EM algorithm (Chapter 16). Bootstrap methods, in particular, have become indispensable for uncertainty analysis with large data sets and elaborate stochastic models.
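To make the role of bootstrap resampling in uncertainty analysis concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a sample median. It is an illustration under stated assumptions, not the book's code (the book uses R): the function name, the default of 2000 resamples, the fixed seed, and the sample values are all hypothetical choices.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median, n_resamples=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return replicates[lo_idx], replicates[hi_idx]


# Made-up sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
low, high = bootstrap_percentile_ci(sample)
```

The interval `(low, high)` is obtained without assuming any parametric form for the data, which is why the approach is attractive for statistics such as the median whose sampling distribution is awkward to derive analytically.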

The textbook also unabashedly includes a review of Bayesian statistics and an overview of nonparametric Bayesian estimation. If you are familiar with Bayesian methods, you might wonder what role they play in nonparametric statistics. Admittedly, the connection is not obvious, but in fact nonparametric Bayesian methods (Chapter 18) represent an important set of tools for complicated problems in statistical modeling and learning, where many of the models are nonparametric in nature.

The book is intended both as a reference text and a text for a graduate course. We hope the reader will find this book useful. All comments, suggestions, updates, and critiques will be appreciated.

April 2022

Paul Kvam

Department of Mathematics

University of Richmond

Brani Vidakovic

Department of Statistics

Texas A&M University

Seong-joon...

Schweitzer Fachinformationen
