Rank-Based Methods for Shrinkage and Selection

Name: Rank-Based Methods for Shrinkage and Selection | With Application to Machine Learning
Brand: Wiley
Price: 113.99 EUR
Availability: OnlineOnly

With Application to Machine Learning

A. K. Md. Ehsanes Saleh, Mohammad Arashi, Resve A. Saleh, Mina Norouzirad (Authors)

Wiley (Publisher)

1st Edition

Published on 12. April 2022

480 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-62542-1 (ISBN)

€113.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

1 Introduction to Rank-based Regression

1.1 Introduction

1.2 Robustness of the Median

1.2.1 Mean vs. Median

1.2.2 Breakdown Point

1.2.3 Order and Rank Statistics

1.3 Simple Linear Regression

1.3.1 Least Squares Estimator (LSE)

1.3.2 Theil's Estimator

1.3.3 Belgium Telephone Data Set

1.3.4 Estimation and Standard Error Comparison

1.4 Outliers and their Detection

1.4.1 Outlier Detection

1.5 Motivation for Rank-based Methods

1.5.1 Effect of a Single Outlier

1.5.2 Using Rank for the Location Model

1.5.3 Using Rank for the Slope

1.6 The Rank Dispersion Function

1.6.1 Ranking and Scoring Details

1.6.2 Detailed Procedure for R-estimation

1.7 Shrinkage Estimation and Subset Selection

1.7.1 Multiple Linear Regression using Rank

1.7.2 Penalty Functions

1.7.3 Shrinkage Estimation

1.7.4 Subset Selection

1.7.5 Blended Approaches

1.8 Summary

1.9 Problems

2 Characteristics of Rank-based Penalty Estimators

2.1 Introduction

2.2 Motivation for Penalty Estimators

2.3 Multivariate Linear Regression

2.3.1 Multivariate Least Squares Estimation

2.3.2 Multivariate R-estimation

2.3.3 Multicollinearity

2.4 Ridge Regression

2.4.1 Ridge Applied to Least Squares Estimation

2.4.2 Ridge Applied to Rank Estimation

2.5 Example: Swiss Fertility Data Set

2.5.1 Estimation and Standard Errors

2.5.2 Parameter Variance using Bootstrap

2.5.3 Reducing Variance using Ridge

2.5.4 Ridge Traces

2.6 Selection of Ridge Parameter λ₂

2.6.1 Quadratic Risk

2.6.2 K-fold Cross-validation Scheme

2.7 LASSO and aLASSO

2.7.1 Subset Selection

2.7.2 Least Squares with LASSO

2.7.3 The Adaptive LASSO and its Geometric Interpretation

2.7.4 R-estimation with LASSO and aLASSO

2.7.5 Oracle Properties

2.8 Elastic Net (Enet)

2.8.1 Naive Enet

2.8.2 Standard Enet

2.8.3 Enet in Machine Learning

2.9 Example: Diabetes Data Set

2.9.1 Model Building with R-aEnet

2.9.2 MSE vs. MAE

2.9.3 Model Building with LS-Enet

2.10 Summary

2.11 Problems

3 Location and Simple Linear Models

3.1 Introduction

3.2 Location Estimators and Testing

3.2.1 Unrestricted R-estimator of θ

3.2.2 Restricted R-estimator of θ

3.3 Shrinkage R-estimators of Location

3.3.1 Overview of Shrinkage R-estimators of θ

3.3.2 Derivation of the Ridge-type R-estimator

3.3.3 Derivation of the LASSO-type R-estimator

3.3.4 General Shrinkage R-estimators of θ

3.4 Ridge-type R-estimator of θ

3.5 Preliminary Test R-estimator of θ

3.5.1 Optimum Level of Significance of PTRE

3.6 Saleh-type R-estimators

3.6.1 Hard-Threshold R-estimator of θ

3.6.2 Saleh-type R-estimator of θ

3.6.3 Positive-rule Saleh-type (LASSO-type) R-estimator of θ

3.6.4 Elastic Net-type R-estimator of θ

3.7 Comparative Study of the R-estimators of Location

3.8 Simple Linear Model

3.8.1 Restricted R-estimator of Slope

3.8.2 Shrinkage R-estimator of Slope

3.8.3 Ridge-type R-estimation of Slope

3.8.4 Hard-Threshold R-estimator of Slope

3.8.5 Saleh-type R-estimator of Slope

3.8.6 Positive-rule Saleh-type (LASSO-type) R-estimator of Slope

3.8.7 The Adaptive LASSO (aLASSO-type) R-estimator

3.8.8 nEnet-type R-estimator of Slope

3.8.9 Comparative Study of R-estimators of Slope

3.9 Summary

3.10 Problems

4 Analysis of Variance (ANOVA)

4.1 Introduction

4.2 Model, Estimation and Tests

4.3 Overview of Multiple Location Models

4.3.1 Example: Corn Fertilizers

4.3.2 One-way ANOVA

4.3.3 Effect of Variance on Shrinkage Estimators

4.3.4 Shrinkage Estimators for Multiple Location

4.4 Unrestricted R-estimator

4.5 Test of Significance

4.6 Restricted R-estimator

4.7 Shrinkage Estimators

4.7.1 Preliminary Test R-estimator

4.7.2 The Stein-Saleh-type R-estimator

4.7.3 The Positive-rule Stein-Saleh-type R-estimator

4.7.4 The Ridge-type R-estimator

4.8 Subset Selection Penalty R-estimators

4.8.1 Preliminary Test Subset Selector R-estimator

4.8.2 Saleh-type R-estimator

4.8.3 Positive-rule Saleh Subset Selector (PRSS)

4.8.4 The Adaptive LASSO (aLASSO)

4.8.5 Elastic-net-type R-estimator

4.9 Comparison of the R-estimators

4.9.1 Comparison of URE and RRE

4.9.2 Comparison of URE and Stein-Saleh-type R-estimators

4.9.3 Comparison of URE and Ridge-type R-estimators

4.9.4 Comparison of URE and PTSSRE

4.9.5 Comparison of LASSO-type and Ridge-type R-estimators

4.9.6 Comparison of URE, RRE and LASSO

4.9.7 Comparison of LASSO with PTRE

4.9.8 Comparison of LASSO with SSRE

4.9.9 Comparison of LASSO with PRSSRE

4.9.10 Comparison of nEnetRE with URE

4.9.11 Comparison of nEnetRE with RRE

4.9.12 Comparison of nEnetRE with HTRE

4.9.13 Comparison of nEnetRE with SSRE

4.9.14 Comparison of Ridge-type vs. nEnetRE

4.10 Summary

4.11 Problems

5 Seemingly Unrelated Simple Linear Models

5.1 Introduction

5.1.1 Problem Formulation

5.2 Signed and Signed Rank Estimators of Parameters

5.2.1 General Shrinkage R-estimator of β

5.2.2 Ridge-type R-estimator of β

5.2.3 Preliminary Test R-estimator of β

5.3 Stein-Saleh-type R-estimator of β

5.3.1 Positive-rule Stein-Saleh R-estimators of β

5.4 Saleh-type R-estimator of β

5.4.1 LASSO-type R-estimator of the β

5.5 Elastic-net-type R-estimators

5.6 R-estimator of Intercept When Slope Has Sparse Subset

5.6.1 General Shrinkage R-estimator of Intercept

5.6.2 Ridge-type R-estimator of θ

5.6.3 Preliminary Test R-estimators of θ

5.7 Stein-Saleh-type R-estimator of θ

5.7.1 Positive-rule Stein-Saleh-type R-estimator of θ

5.7.2 LASSO-type R-estimator of θ

5.8 Summary

5.8.1 Problems

6 Multiple Linear Regression Models

6.1 Introduction

6.2 Multiple Linear Model and R-estimation

6.3 Model Sparsity and Detection

6.4 General Shrinkage R-estimator of β

6.4.1 Preliminary Test R-estimator

6.4.2 Stein-Saleh-type R-estimator

6.4.3 Positive-rule Stein-Saleh-type R-estimator

6.5 Subset Selectors

6.5.1 Preliminary Test Subset Selector R-estimator

6.5.2 Stein-Saleh-type R-estimator

6.5.3 Positive-rule Stein-Saleh-type R-estimator (LASSO-type)

6.5.4 Ridge-type Subset Selector

6.5.5 Elastic Net-type R-estimator

6.6 Adaptive LASSO

6.6.1 Introduction

6.6.2 Asymptotics for LASSO-type R-estimator

6.6.3 Oracle Property of aLASSO

6.7 Summary

6.8 Problems

7 Partially Linear Multiple Regression Model

7.1 Introduction

7.2 Rank Estimation in the PLM

7.2.1 Penalty R-estimators

7.2.2 Preliminary Test and Stein-Saleh-type R-estimator

7.3 ADB and ADL2-risk

7.4 ADL2-risk Comparisons

7.5 Summary: L2-risk Efficiencies

7.6 Problems

8 Liu Regression Models

8.1 Introduction

8.2 Linear Unified (Liu) Estimator

8.2.1 Liu-type R-estimator

8.3 Shrinkage Liu-type R-estimators

8.4 Asymptotic Distributional Risk

8.5 Asymptotic Distributional Risk Comparisons

8.5.1 Comparison of SSLRE and PTLRE

8.5.2 Comparison of PRSLRE and PTLRE

8.5.3 Comparison of PRLRE and SSLRE

8.5.4 Comparison of Liu-Type Rank Estimators With Counterparts

8.6 Estimation of d

8.7 Diabetes Data Analysis

8.7.1 Penalty Estimators

8.7.2 Performance Analysis

8.8 Summary

8.9 Problems

9 Autoregressive Models

9.1 Introduction

9.2 R-estimation of ρ for the AR(p)-Model

9.3 LASSO, Ridge, Preliminary Test and Stein-Saleh-type R-estimators

9.4 Asymptotic Distributional L2-risk

9.5 Asymptotic Distributional L2-risk Analysis

9.5.1 Comparison of Unrestricted vs. Restricted R-estimators

9.5.2 Comparison of Unrestricted vs. Preliminary Test R-estimator

9.5.3 Comparison of Unrestricted vs. Stein-Saleh-type R-estimators

9.5.4 Comparison of the Preliminary Test vs. Stein-Saleh-type R-estimators

9.6 Summary

9.7 Problems

10 High-Dimensional Models

10.1 Introduction

10.2 Identifiability of β* and Projection

10.3 Parsimonious Model Selection

10.4 Some Notation and Separation

10.4.1 Special Matrices

10.4.2 Steps Towards Estimators

10.4.3 Post-selection Ridge Estimation of β*_{S1} and β*_{S2}

10.4.4 Post-selection Ridge R-estimators for β*_{S1} and β*_{S2}

10.5 Post-selection Shrinkage R-estimators

10.6 Asymptotic Properties of the Ridge R-estimators

10.7 Asymptotic Distributional L2-Risk Properties

10.8 Asymptotic Distributional Risk Efficiency

10.9 Summary

10.10 Problems

11 Rank-based Logistic Regression

11.1 Introduction

11.2 Data Science and Machine Learning

11.2.1 What is Robust Data Science?

11.2.2 What is Robust Machine Learning?

11.3 Logistic Regression

11.3.1 Log-likelihood Setup

11.3.2 Motivation for Rank-based Logistic Methods

11.3.3 Nonlinear Dispersion Function

11.4 Application to Machine Learning

11.4.1 Example: Motor Trend Cars

11.5 Penalized Logistic Regression

11.5.1 Log-likelihood Expressions

11.5.2 Rank-based Expressions

11.5.3 Support Vector Machines

11.5.4 Example: Circular Data

11.6 Example: Titanic Data Set

11.6.1 Exploratory Data Analysis

11.6.2 RLR vs. LLR vs. SVM

11.6.3 Shrinkage and Selection

11.7 Summary

11.8 Problems

12 Rank-based Neural Networks

12.1 Introduction

12.2 Set-up for Neural Networks

12.3 Implementing Neural Networks

12.3.1 Basic Computational Unit

12.3.2 Activation Functions

12.3.3 Four-layer Neural Network

12.4 Gradient Descent with Momentum

12.4.1 Gradient Descent

12.4.2 Momentum

12.5 Back Propagation Example

12.5.1 Forward Propagation

12.5.2 Back Propagation

12.5.3 Dispersion Function Gradients

12.5.4 RNN Algorithm

12.6 Accuracy Metrics

12.7 Example: Circular Data Set

12.8 Image Recognition: Cats vs. Dogs

12.8.1 Binary Image Classification

12.8.2 Image Preparation

12.8.3 Over-fitting and Under-fitting

12.8.4 Comparison of LNN vs. RNN

12.9 Image Recognition: MNIST Data Set

12.10 Summary

12.11 Problems

Bibliography

Author Index

Subject Index

Preface

The objective of this book is to introduce the audience to the theory and application of robust statistical methodologies using rank-based methods. We present a number of new ideas and research directions in machine learning and statistical analysis that the reader can and should pursue in the future. We begin by noting that the well-known least squares and likelihood principles are traditional methods of estimation in machine learning and data science. One of the most widely read books is An Introduction to Statistical Learning (James et al., 2013), which describes these and other methods. However, it also properly identifies many of their shortcomings, especially in terms of robustness in the presence of outliers. Our book describes a number of novel ideas and concepts to resolve these problems, many of which are worthy of further investigation. Our goal is to motivate more researchers to pursue work in this field. We build on this motivation to carry out a rigorous mathematical analysis of rank-based penalty estimators.

From our point of view, outliers are present in almost all real-world data sets. They may be the result of human error, transmission error, measurement error or simply due to the nature of the data being collected. Whatever the reason, we must first recognize that all data sets have some form of outliers and then build solutions based on this fact. Outliers may greatly affect the estimates and lead to poor prediction accuracy. As a result, operations such as data cleaning, outlier detection and robust regression are extremely important in building models that provide suitably accurate prediction capability. Here, we describe rank-based methods to address many such problems. Indeed, many researchers are now working on these and other methods for robust data science. Most of the methods and results presented in this book were derived from our implementations in R and Python, which are languages used routinely by statisticians and by practitioners in machine learning and data science. Some of the problems at the end of each chapter involve the use of R. The reader will be well served by following the descriptions in the book while implementing the ideas wherever possible in R or Python. This is the best way to get the most out of this book.
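The effect of a single outlier on the mean versus the median can be illustrated with a minimal sketch. The data values below are hypothetical and are not taken from the book; the last value of the contaminated sample stands in for a recording error.

```python
import statistics

# Hypothetical measurements (not from the book).
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
# Replace the last value with a gross error to mimic an outlier.
contaminated = clean[:-1] + [102.0]

def summarize(values):
    """Return (mean, median) of a list of numbers."""
    return statistics.mean(values), statistics.median(values)

clean_mean, clean_median = summarize(clean)
bad_mean, bad_median = summarize(contaminated)

print(f"clean:        mean={clean_mean:.2f}, median={clean_median:.2f}")
print(f"contaminated: mean={bad_mean:.2f}, median={bad_median:.2f}")
```

In this sketch the single bad value pulls the mean far from the bulk of the data, while the median does not move.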

Rank regression is based on the linear rank dispersion function described by Jaeckel (1972). The dispersion function replaces the least squares loss function to enable estimates based on the median rather than the mean. This book is intended to guide the reader in this direction starting with basic principles such as the importance of the median vs. the mean, comparisons of rank vs. least squares methods on simple linear problems, and the role of penalty functions in improving the accuracy of prediction. We present new practical methods of data cleaning, subset selection and shrinkage estimation in the context of rank-based methods. We then begin our theoretical journey starting with basic rank statistics for location and simple linear models, and then move on to multiple regression, ANOVA and problems in a high-dimensional setting. We conclude with new ideas not published elsewhere in the literature in the area of rank-based logistic regression and neural networks to address classification problems in machine learning and data science.
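As a rough illustration of the dispersion function idea, the sketch below fits a single slope by minimizing a Jaeckel-type dispersion with Wilcoxon scores. It is a simplified stand-in written for this page, not the book's implementation or the Rfit package; the function names and the data are hypothetical. It relies on the fact that, for one slope, this dispersion is convex and piecewise linear with kinks at the pairwise slopes, so searching over those candidates is sufficient.

```python
import numpy as np

def wilcoxon_scores(n):
    """Wilcoxon scores a(i) = sqrt(12) * (i / (n + 1) - 1/2), i = 1..n."""
    i = np.arange(1, n + 1)
    return np.sqrt(12.0) * (i / (n + 1) - 0.5)

def dispersion(beta, x, y):
    """Sum of score(rank(e_i)) * e_i for residuals e = y - beta * x."""
    e = y - beta * x
    ranks = np.argsort(np.argsort(e))  # 0-based ranks; ties broken arbitrarily
    return float(np.sum(wilcoxon_scores(len(e))[ranks] * e))

def rank_slope(x, y):
    """Minimize the dispersion over the pairwise slopes, where its kinks lie."""
    i, j = np.triu_indices(len(x), k=1)
    keep = x[i] != x[j]
    candidates = (y[j][keep] - y[i][keep]) / (x[j][keep] - x[i][keep])
    values = [dispersion(b, x, y) for b in candidates]
    return float(candidates[int(np.argmin(values))])

# Hypothetical data: true slope 2, with one gross outlier in the last response.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[-1] = 100.0

ls_slope = float(np.polyfit(x, y, 1)[0])  # least squares slope, for comparison
r_slope = rank_slope(x, y)
print(f"least squares slope: {ls_slope:.3f}")
print(f"rank-based slope:    {r_slope:.3f}")
```

The intercept is not identified by the dispersion (the scores sum to zero) and would be estimated separately, for example from the median of the residuals.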

We believe that most practitioners today are still employing least squares and log-likelihood methods that are not robust in the presence of outliers. This is due to the long history of these estimation methods in statistics and their natural adoption in the machine learning community over the past two decades. However, the history of estimation theory actually changed its course radically many decades prior when Stein (1956) and James and Stein (1961) proved that the sample mean based on a sample from a p-dimensional multivariate normal distribution is inadmissible under a quadratic loss function for p ≥ 3. This result gave birth to a class of shrinkage estimators in various forms and set-ups. Due to the immense impact of Stein's theory, scores of technical papers appeared in the literature covering many areas of application. Beginning in the 1970s, the pioneering work of Saleh and Sen (1978, 1983, 1984a, b, 1985a, a, b, c, d, e, 1986, 1987) expanded the scope of this class of shrinkage estimators using the "quasi-empirical Bayes" method to obtain robust (such as R-, L-, and M-estimation) Stein-type estimators. Details are provided in Saleh (2006).

Of particular interest here is the use of penalty estimators in the context of robust R-estimation. Next generation "shrinkage estimators" known as "ridge regression estimators" for the multiple linear regression model were developed by Hoerl and Kennard (1970) based on "Tikhonov's regularization" (Tikhonov, 1963). The ridge regression (RR) estimator is the result of minimizing the penalized least squares criterion using an L2-penalty function. Ridge regression laid the foundation of penalty estimation. Later, Tibshirani (1996) proposed the "least absolute shrinkage and selection operator" (LASSO) by minimizing the penalized least squares criterion using an L1-penalty function, an approach that went viral in the area of model selection.
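The two least squares penalty estimators mentioned here can be sketched in a few lines. This is an illustrative sketch only: the function names, the simulated data and the choice of scaling are assumptions made for this example. The ridge function uses the standard closed form for the criterion ||y - Xb||^2 + lam * ||b||^2, and the soft-threshold function is the LASSO solution only in the special case of an orthonormal design with the criterion (1/2)||y - Xb||^2 + lam * ||b||_1.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||^2 via (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, lam):
    """LASSO solution when X'X = I and z = X'y (criterion scaled by 1/2):
    shrink each entry toward zero and set small entries exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Simulated data (hypothetical): three predictors, one with a small effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, 0.1, -2.0]) + rng.normal(scale=0.5, size=50)

b_ls = ridge(X, y, 0.0)       # lam = 0 reduces to ordinary least squares
b_ridge = ridge(X, y, 10.0)   # a positive penalty shrinks the coefficients
b_lasso_orth = soft_threshold(np.array([3.0, 0.1, -2.0]), 0.5)

print("least squares:", np.round(b_ls, 3))
print("ridge:        ", np.round(b_ridge, 3))
print("soft-threshold of (3, 0.1, -2) at 0.5:", b_lasso_orth)
```

The contrast to note is that ridge shrinks all coefficients without setting any to zero, whereas the L1 penalty can remove a small coefficient entirely.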

Unlike the RR estimator, LASSO simultaneously selects and estimates variables. It is reminiscent of "subset selection". The subset selection rule is extremely variable due to its inherent discreteness (Breiman, 1996; Fan and Li, 2001), and it is often trapped in a locally optimal solution rather than the globally optimal solution. LASSO is a continuous and stable process; however, it is not recommended in multicollinear situations. Zou and Hastie (2005) proposed a compromise penalty function that combines the L1 and L2 penalties, giving rise to the "elastic net" estimator. It can select groups of correlated variables. It is metaphorically like a stretchable fishing net retaining all potentially big fish.
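For reference, the combined penalty can be written out as follows. This is the usual least squares form of the naive elastic net criterion; the notation (y, X, β, λ1, λ2) is assumed here and may differ from the book's.

```latex
\hat{\beta}^{\,\mathrm{nEnet}}
  = \arg\min_{\beta}
    \Big\{ \lVert y - X\beta \rVert_2^2
           + \lambda_1 \lVert \beta \rVert_1
           + \lambda_2 \lVert \beta \rVert_2^2 \Big\},
  \qquad \lambda_1 \ge 0,\ \lambda_2 \ge 0 .
```

Setting λ2 = 0 recovers the LASSO criterion and setting λ1 = 0 recovers ridge regression.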

Although LASSO simultaneously estimates and selects variables, it does not possess "oracle properties" in general. To overcome this problem, Fan and Li (2001) proposed the "smoothly clipped absolute deviation" (SCAD) penalty function. Following Fan and Li (2001), Zou (2006) modified LASSO using a weighted L1-penalty function and called the resulting estimator the adaptive LASSO (aLASSO). Later, Zhang (2010) suggested a minimax concave penalty (MCP) estimator. All results found in the above literature are based on the penalized least squares criterion.
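The weighted L1-penalty behind the adaptive LASSO is commonly written as below. The symbols are assumptions for this illustration: β̃ denotes an initial consistent estimator (for example least squares) and γ > 0 is a tuning constant.

```latex
\hat{\beta}^{\,\mathrm{aLASSO}}
  = \arg\min_{\beta}
    \Big\{ \lVert y - X\beta \rVert_2^2
           + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert \Big\},
  \qquad w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}} .
```

Coefficients with small initial estimates receive large weights and are penalized more heavily, which is what allows the estimator to behave differently from the unweighted LASSO.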

This book contains a thorough study of rank-based estimation with three basic penalty estimators, namely, ridge regression, LASSO and "elastic net". It also includes preliminary test and Stein-type R-estimators for completeness. Efforts are made to present a clear and balanced introduction to rank-based estimators, with mathematical comparisons of the properties of the various estimators considered. The book is directed towards graduate students and researchers in statistics, economics and biostatistics, as well as applied statisticians, economists, computer scientists and data scientists, among others. The literature is very limited in the area of robust penalty and other shrinkage estimators in the context of rank-based estimation. Here, we provide both theoretical and practical aspects of the subject matter.

The book is spread over twelve chapters. Chapter 1 begins with an introductory examination of the median, outliers and robust rank-based methods, along with a brief look at penalty estimators. Chapter 2 continues with the characteristics of rank-based penalty estimators and demonstrates their enormous value in machine learning. Chapter 3 provides the preliminaries of rank-based theory and various aspects of it, along with a description of penalty estimators, which are then applied to location and simple linear models. Chapter 4 deals with ANOVA and Chapter 5 with seemingly unrelated simple linear models. Chapter 6 considers the multiple linear model and Chapter 7 expands on the "partially linear regression model" (PLM). The Liu regression estimator is discussed in Chapter 8. Chapter 9 introduces the AR(p) model. Chapter 10 covers selection and shrinkage of variables in high-dimensional data analysis. Chapter 11 deals with multivariate rank-based logistic regression models. Finally, Chapter 12 concludes with applications of rank-based neural networks.

To our knowledge, this is one of the first books to combine advanced statistical analysis with advanced machine learning. Each chapter is self-contained but those interested in machine learning may consider Chapters 1 and 2, and 11 and 12, while those interested in statistics may consider Chapters 3-10. A good mix of the two would be derived from Chapters 1-4 and 11 and 12. It is our hope that readers in both fields will find something of value, and that it will lead to many areas of future research.

The authors wish to thank the developers of Rfit (Kloke and McKean, 2012) and glmnet (Stanford University), which are extremely useful packages for R-estimation and penalized maximum likelihood estimation, respectively. We also thank Professor Brent Johnson (University of Rochester) for the rank-based LASSO and aLASSO code (Johnson and Peng, 2008) provided on his website.

Professor A.K. Md. E. Saleh is grateful to NSERC for supporting his research for more than four decades and is...
