Statistical Learning for Big Dependent Data

Daniel Peña, Ruey S. Tsay (Authors)

Wiley (Publisher)

1st Edition

Published on 16 March 2021

560 pages

E-Book

ePUB with Adobe-DRM

978-1-119-41741-5 (ISBN)

€124.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Master advanced topics in the analysis of large, dynamically dependent datasets with this insightful resource

Statistical Learning for Big Dependent Data delivers a comprehensive presentation of the statistical and machine learning methods useful for analyzing and forecasting large and dynamically dependent data sets. The book presents automatic procedures for modelling and forecasting large sets of time series data. Beginning with some visualization tools, the book discusses procedures and methods for finding outliers, clusters, and other types of heterogeneity in big dependent data. It then introduces various dimension reduction methods, including regularization and factor models such as regularized Lasso in the presence of dynamical dependence and dynamic factor models. The book also covers other forecasting procedures, including index models, partial least squares, boosting, and nowcasting. It further presents machine learning methods, including neural networks, deep learning, classification and regression trees, and random forests. Finally, procedures for modelling and forecasting spatio-temporal dependent data are also presented.

Throughout the book, the advantages and disadvantages of the methods discussed are given. The book uses real-world examples to demonstrate applications, including use of many R packages. Finally, an R package associated with the book is available to assist readers in reproducing the analyses of examples and to facilitate real applications.

Statistical Learning for Big Dependent Data includes a wide variety of topics for modeling and understanding big dependent data, such as:

* New ways to plot large sets of time series

* An automatic procedure to build univariate ARMA models for individual components of a large data set

* Powerful outlier detection procedures for large sets of related time series

* New methods for finding the number of clusters of time series and discrimination methods, including support vector machines, for time series

* Broad coverage of dynamic factor models including new representations and estimation methods for generalized dynamic factor models

* Discussion of the usefulness of lasso with time series and an evaluation of several machine learning procedures for forecasting large sets of time series

* Forecasting large sets of time series with exogenous variables, including discussions of index models, partial least squares, and boosting

* Introduction of modern procedures for modeling and forecasting spatio-temporal data

Perfect for PhD students and researchers in business, economics, engineering, and science, Statistical Learning for Big Dependent Data also belongs on the bookshelves of practitioners in these fields who hope to improve their understanding of statistical and machine learning methods for analyzing and forecasting big dependent data.

Content

Preface

1. Introduction to Big Dependent Data

1.1 Examples of Dependent Data

1.2 Stochastic Processes

1.2.1 Scalar Processes

1.2.1.1 Stationarity

1.2.1.2 White Noise Process

1.2.1.3 Conditional Distribution

1.2.2 Vector Processes

1.2.2.1 Vector White Noises

1.2.2.2 Invertibility

1.3 Sample Moments of Stationary Vector Process

1.3.1 Sample Mean

1.3.2 Sample Covariance and Correlation Matrices

1.4 Nonstationary Processes

1.5 Principal Component Analysis

1.5.1 Discussion

1.5.2 Properties of the PCs

1.6 Effects of Serial Dependence

Appendix 1.A: Some Matrix Theory

Exercises

References

2. Linear Univariate Time Series

2.1 Visualizing a Large Set of Time Series

2.1.1 Dynamic Plots

2.1.2 Static Plots

2.2 Stationary ARMA Models

2.2.1 The Autoregressive Process

2.2.1.1 Autocorrelation Functions

2.2.2 The Moving Average Process

2.2.3 The ARMA Process

2.2.4 Linear Combinations of ARMA Processes

2.3 Spectral Analysis of Stationary Processes

2.3.1 Fitting Harmonic Functions to a Time Series

2.3.2 The Periodogram

2.3.3 The Spectral Density Function and Its Estimation

2.4 Integrated Processes

2.4.1 The Random Walk Process

2.4.2 ARIMA Models

2.4.3 Seasonal ARIMA Models

2.4.3.1 The Airline Model

2.5 Structural and State Space Models

2.5.1 Structural Time Series Models

2.5.2 State-Space Models

2.5.3 The Kalman Filter

2.6 Forecasting with Linear Models

2.6.1 Computing Optimal Predictors

2.6.2 Variances of the Predictions

2.6.3 Measuring Predictability

2.7 Modeling a Set of Time Series

2.7.1 Data Transformation

2.7.2 Testing for White Noise

2.7.3 Determination of the Difference Order

2.7.4 Model Identification

2.8 Estimation and Information Criteria

2.8.1 Conditional Likelihood

2.8.2 On-line Estimation

2.8.3 Maximum Likelihood (ML) Estimation

2.8.4 Model Selection

2.8.4.1 The Akaike Information Criterion (AIC)

2.8.4.2 The Bayesian Information Criterion (BIC)

2.8.4.3 Other Criteria

2.8.4.4 Cross-Validation

2.9 Diagnostic Checking

2.9.1 Residual Plot

2.9.2 Portmanteau Test for Residual Serial Correlations

2.9.3 Homoscedastic Tests

2.9.4 Normality Tests

2.9.5 Checking for Deterministic Components

2.10 Forecasting

2.10.1 Out-of-Sample Forecasts

2.10.2 Forecasting with Model Averaging

2.10.3 Forecasting with Shrinkage Estimators

Appendix 2.A: Difference Equations

Exercises

References

3. Analysis of Multivariate Time Series

3.1 Transfer Function Models

3.1.1 Single Input and Single Output

3.1.2 Multiple Inputs and Multiple Outputs

3.2 Vector AR Models

3.2.1 Impulse Response Function

3.2.2 Some Special Cases

3.2.3 Estimation

3.2.4 Model Building

3.2.5 Prediction

3.2.6 Forecast Error Variance Decomposition

3.3 Vector Moving-Average Models

3.3.1 Properties of VMA Models

3.3.2 VMA Modeling

3.4 Stationary VARMA Models

3.4.1 Are VAR Models Sufficient?

3.4.2 Properties of VARMA Models

3.4.3 Modeling VARMA Process

3.4.4 Use of VARMA Models

3.5 Unit Roots and Co-Integration

3.5.1 Spurious Regression

3.5.2 Linear Combinations of a Vector Process

3.5.3 Co-integration

3.5.4 Over-Differencing

3.6 Error-Correction Models

3.6.1 Co-integration Test

Exercises

References

4. Handling Heterogeneity in Many Time Series

4.1 Intervention Analysis

4.1.1 Intervention with Indicator Variables

4.1.2 Intervention with Step Functions

4.1.3 Intervention with General Exogenous Variables

4.1.4 Building an Intervention Model

4.2 Estimation of Missing Values

4.2.1 Univariate Interpolation

4.2.2 Multivariate Interpolation

4.3 Outliers in Vector Time Series

4.3.1 Multivariate Additive Outliers

4.3.1.1 Effects on Residuals and Estimation

4.3.2 Multivariate Level Shift or Structural Break

4.3.2.1 Effects on Residuals and Estimation

4.3.3 Other Types of Outliers

4.3.3.1 Multivariate Innovative Outliers

4.3.3.2 Transitory Change

4.3.3.3 Ramp Shift

4.3.4 Masking and Swamping

4.4 Univariate Outlier Detection

4.4.1 Other Procedures for Univariate Outlier Detection

4.4.2 New Approaches to Outlier Detection

4.5 Multivariate Outliers Detection

4.5.1 VARMA Outlier Detection

4.5.2 Outlier Detection by Projections

4.5.3 A Projection Algorithm for Outliers Detection

4.5.4 The Nonstationary Case

4.6 Robust Estimation

4.7 Heterogeneity for Parameter Changes

4.7.1 Parameter Changes in Univariate Time Series

4.7.2 Covariance Changes in Multivariate Time Series

4.7.2.1 Detecting Multiple Covariance Changes

4.7.2.2 LR Test

Appendix 4.A: Cusum Algorithms

4.A.1 Detecting Univariate LS

4.A.2 Detecting Multivariate Level Shift

4.A.3 Detecting Multiple Covariance Changes

Exercises

References

5. Clustering and Classification of Time Series

5.1 Distances and Dissimilarities

5.1.1 Distance Between Univariate Time Series

5.1.2 Dissimilarities Between Univariate Series

5.1.3 Dissimilarities Based on Cross-Linear Dependency

5.2 Hierarchical Clustering of Time Series

5.2.1 Criteria for Defining Distances Between Groups

5.2.2 The Dendrogram

5.2.3 Selecting the Number of Groups

5.2.3.1 The Height and Step Plots

5.2.3.2 Silhouette Statistic

5.2.3.3 The Gap Statistic

5.3 Clustering by Variables

5.3.1 The k-means Algorithm

5.3.1.1 Number of Groups

5.3.2 k-Medoids

5.3.3 Model-Based Clustering by Variables

5.3.3.1 Maximum Likelihood (ML) Estimation of the AR Mixture Model

5.3.3.2 The EM Algorithm

5.3.3.3 Estimation of Mixture of Multivariate Normals

5.3.3.4 Bayesian Estimation

5.3.3.5 Clustering with Structural Breaks

5.3.4 Clustering by Projections

5.4 Classification with Time Series

5.4.1 Classification Among a Set of Models

5.4.2 Checking the Classification Rule

5.5 Classification with Features

5.5.1 Linear Discriminant Function

5.5.2 Quadratic Classification and Admissible Functions

5.5.3 Logistic Regression

5.6 Nonparametric Classification

5.6.1 Nearest Neighbors

5.6.2 Support Vector Machines

5.6.2.1 Linearly Separable Problems

5.6.2.2 Nonlinearly Separable Problems

5.6.3 Density Estimation

5.7 Other Classification Problems and Methods

Exercises

References

6. Dynamic Factor Models

6.1 The DFM for Stationary Series

6.1.1 Properties of the Covariance Matrices

6.1.1.1 The Exact DFM

6.1.1.2 The Approximate DFM

6.1.2 Dynamic Factor and VARMA Models

6.2 Fitting a Stationary DFM to Data

6.2.1 Principal Components (PC) Estimation

6.2.2 Pooled PC Estimator

6.2.3 Generalized PC Estimator

6.2.4 ML Estimation

6.2.5 Selecting the Number of Factors

6.2.5.1 Rank Testing via Canonical Correlation

6.2.5.2 Testing a Jump in Eigenvalues

6.2.5.3 Using Information Criteria

6.2.6 Forecasting with DFM

6.2.7 Alternative Formulations of the DFM

6.3 Generalized DFM (GDFM) for Stationary Series

6.3.1 Some Properties of the GDFM

6.3.2 GDFM and VARMA Models

6.4 Dynamic Principal Components

6.4.1 Dynamic Principal Components for Optimal Reconstruction

6.4.2 One-Sided DPCs

6.4.3 Model Selection and Forecasting

6.4.4 One-Sided DPC and GDFM Estimation

6.5 DFM for Nonstationary Series

6.5.1 Cointegration and DFM

6.6 GDFM for Nonstationary Series

6.6.1 Estimation by Generalized DPC

6.7 Outliers in DFMs

6.7.1 Factor and Idiosyncratic Outliers

6.7.2 A Procedure to Find Outliers in DFM

6.8 DFM with Cluster Structure

6.8.1 Fitting DFMCS

6.9 Some Extensions of DFM

6.10 High-Dimensional Case

6.10.1 Sparse PCs

6.10.2 A Structural-FM Approach

6.10.3 Estimation

6.10.4 Selecting the Number of Common Factors

6.10.5 Asymptotic Properties of Loading Estimates

Appendix 6.A: Some R Commands

Exercises

References

7. Forecasting with Big Dependent Data

7.1 Regularized Linear Models

7.1.1 Properties of Lasso Estimator

7.1.2 Some Extensions of Lasso Regression

7.1.2.1 Adaptive Lasso

7.1.2.2 Group Lasso

7.1.2.3 Elastic Net

7.1.2.4 Fused Lasso

7.1.2.5 SCAD Penalty

7.2 Impacts of Dynamic Dependence on Lasso

7.3 Lasso for Dependent Data

7.4 Principal Component Regression and Diffusion Index

7.5 Partial Least Squares

7.6 Boosting

7.6.1 ℓ₂ Boosting

7.6.2 Choices of Weak Learner

7.6.3 Boosting for Classification

7.7 Mixed-Frequency Data and Nowcasting

7.7.1 MIDAS Regression

7.7.2 Nowcasting

7.8 Strong Serial Dependence

Exercises

References

8. Machine Learning of Big Dependent Data

8.1 Regression Trees and Random Forests

8.1.1 Growing Tree

8.1.2 Pruning

8.1.3 Classification Trees

8.1.4 Random Forests

8.2 Neural Networks

8.2.1 Network Training

8.3 Deep Learning

8.3.1 Types of Deep Networks

8.3.2 Recurrent NN

8.3.3 Activation Functions for Deep Learning

8.3.4 Training Deep Networks

8.3.4.1 Long Short-Term Memory Model

8.3.4.2 Training Algorithm

8.4 Some Applications

8.4.1 The Package: keras

8.4.2 Dropout Layer

8.4.3 Application of Convolution Networks

8.4.4 Application of LSTM

8.5 Deep Generative Models

8.6 Reinforcement Learning

Exercises

References

9. Spatio-Temporal Dependent Data

9.1 Examples and Visualization of Spatio-Temporal Data

9.2 Spatial Processes and Data Analysis

9.3 Geostatistical Processes

9.3.1 Stationary Variogram

9.3.2 Examples of Semivariogram

9.3.3 Stationary Covariance Function

9.3.4 Estimation of Variogram

9.3.5 Testing Spatial Dependence

9.3.6 Kriging

9.3.6.1 Simple Kriging

9.3.6.2 Ordinary Kriging

9.3.6.3 Universal Kriging

9.4 Lattice Processes

9.4.1 Markov-Type Models

9.5 Spatial Point Processes

9.5.1 Second-Order Intensity

9.6 S-T Processes and Analysis

9.6.1 Basic Properties

9.6.2 Some Nonseparable Covariance Functions

9.6.3 S-T Variogram

9.6.4 S-T Kriging

9.7 Descriptive S-T Models

9.7.1 Random Effects with S-T Basis Functions

9.7.2 Random Effects with Spatial Basis Functions

9.7.3 Fixed Rank Kriging

9.7.4 Spatial Principal Component Analysis

9.7.5 Random Effects with Temporal Basis Functions

9.8 Dynamic S-T Models

9.8.1 Space-Time Autoregressive Moving-Average Models

9.8.2 S-T Component Models

9.8.3 S-T Factor Models

9.8.4 S-T HMs

Appendix 9.A: Some R Packages and Commands

Exercises

References

Index

PREFACE

For the first time in human history, we are collecting data everywhere and every second. These data grow exponentially, are produced and stored at minimum cost, and are changing the way we learn things and control our activities, including monitoring our health and using our leisure time. Statistics, as a scientific discipline, was created in a different environment, where data were scarce, and focused mainly on extracting the maximum information from small, structured data sets. New methods are, therefore, needed to extract useful information from big data sets, which are heterogeneous and unstructured. These methods are being developed in statistics, computer science, machine learning, operations research, artificial intelligence, and other fields. They constitute what is usually called data science. Most advances in big data analysis so far have assumed that the data are collected from independent subjects, so that they can be treated as independent observations. On the other hand, empirical data are often generated over time or in space, and, hence, they have dynamic and/or spatial dependence.

The main goal of this book is to learn from big dependent data. Data with temporal dynamics have been studied in statistics as time series analysis, whereas spatial dependence belongs to the newer area of spatial statistics. Several key ideas that formed the core of recent developments in big data methods are from time series analysis. For instance, the first model selection criterion was proposed by Akaike for selecting the order of an autoregressive process, an iterative model building procedure with nonlinear estimation was proposed by Box and Jenkins for autoregressive integrated moving-average (ARIMA) models, and methods for combining forecasts, which open the way to model averaging and ensemble methods in machine learning, can be traced back to Bayesian forecasting. Today, analyzing big data with temporal and spatial dependence is needed in many scientific fields, ranging from economics and business to environmental and health science, and to engineering and computer vision. It is our belief that modeling and processing big dependent data is a key emerging area of data science.
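As a rough illustration of the order-selection idea attributed to Akaike above, the following Python sketch (not taken from the book, which uses R) fits autoregressive models of increasing order by least squares and picks the order with the smallest AIC. The function names `simulate_ar2`, `ar_aic`, and `select_order`, the simulated AR(2) coefficients, and the Gaussian form of the criterion, n·log(σ̂²) + 2p, are assumptions made for this example only.

```python
import numpy as np

def simulate_ar2(n, phi1=0.6, phi2=-0.3, seed=0):
    """Simulate a stationary AR(2) series (illustrative coefficients)."""
    rng = np.random.default_rng(seed)
    burn = 200  # discard start-up values
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + e[t]
    return x[burn:]

def ar_aic(x, p, max_p):
    """AIC of an AR(p) fitted by least squares.

    Every order is fitted on the same observations (t = max_p, ..., n-1)
    so that the AIC values are comparable across orders.
    """
    x = x - x.mean()
    y = x[max_p:]
    if p == 0:
        resid = y
    else:
        # Column k holds the series lagged by k periods.
        X = np.column_stack([x[max_p - k: len(x) - k] for k in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
    sigma2 = np.mean(resid ** 2)
    return len(y) * np.log(sigma2) + 2 * p

def select_order(x, max_p=6):
    """Return the AIC-minimizing order and the list of AIC values."""
    aics = [ar_aic(x, p, max_p) for p in range(max_p + 1)]
    return int(np.argmin(aics)), aics

x = simulate_ar2(2000)
order, aics = select_order(x)
print("selected AR order:", order)
```

AIC is known to overfit occasionally, so the selected order can exceed the true order of 2; the point of the sketch is only that orders 0 and 1 are clearly rejected for this series.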

Several excellent books have been written for statistical learning with independent data, but much less work has been done for dependent data. This book tries to stimulate the use of available methods in forecasting and classification, to promote further research in statistical methods for extracting useful information from big dependent data, and to point out the potential weaknesses when the data dependence is overlooked. We start with brief reviews of time series analysis, both univariate and multivariate, followed by methods for handling heterogeneity when analyzing many time series. We then discuss methods for classifying and clustering many time series and introduce dynamic factor models for modeling multivariate or high-dimensional time series. This is followed by a thorough discussion of forecasting in a data-rich environment, including nowcasting. We then turn to deep learning and the analysis of spatio-temporal data.
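One weakness of overlooking dependence can be seen in a small simulation. The Python sketch below (an illustration written for this page, not an example from the book) generates many AR(1) series and compares the actual variance of the sample mean with the textbook value s²/n that assumes independent observations. The function name `mean_variance_ratio` and the choices φ = 0.8, n = 200, and 2000 replications are assumptions for the example. For large n the ratio is expected to be close to (1 + φ)/(1 − φ), which is 9 when φ = 0.8.

```python
import numpy as np

def mean_variance_ratio(phi=0.8, n=200, reps=2000, seed=1):
    """Ratio of the actual variance of the sample mean of an AR(1) series
    to the naive estimate s^2 / n that treats the data as independent."""
    rng = np.random.default_rng(seed)
    x = np.empty((reps, n))
    # Start each series from the stationary distribution N(0, 1 / (1 - phi^2)).
    x[:, 0] = rng.standard_normal(reps) / np.sqrt(1 - phi ** 2)
    for t in range(1, n):
        x[:, t] = phi * x[:, t - 1] + rng.standard_normal(reps)
    actual = x.mean(axis=1).var()               # variance of the mean across replications
    naive = (x.var(axis=1, ddof=1) / n).mean()  # average of the i.i.d. formula
    return actual / naive

print("AR(1), phi = 0.8:", round(mean_variance_ratio(phi=0.8), 2))
print("independent data:", round(mean_variance_ratio(phi=0.0), 2))
```

A ratio well above 1 means that confidence intervals built with the independent-data formula would be far too narrow for the dependent series.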

The book is application-oriented, but it also provides some basic theory to help readers better understand the methods and procedures used. Due to the high-dimensional nature of the problems discussed, we include some technical derivations in a few sections, but for the most part, we refer readers interested in rigorous mathematical proofs to the available literature. Real examples are used throughout the book to demonstrate the analysis and applications. For empirical data analysis, we use R extensively and provide the necessary instructions and R scripts so that readers can reproduce the results shown in the book.

The book is organized as follows. Chapter 1 provides some examples of the data sets considered in the book and introduces the basic ideas of temporal dependency, stochastic process, and time series. It also discusses principal component analysis and illustrates the weaknesses of traditional statistical inference when the data dependence is overlooked. Chapter 2 starts with new ways to visualize large sets of time series, summarizes the main properties of ARIMA and state space models, including spectral analysis and the Kalman filter, and presents a methodology for building univariate models for large sets of time series. Chapter 3 reviews the available methods for building multivariate vector ARMA models and their limitations with high-dimensional time series. It also discusses cointegration and ways to handle unit-root multivariate nonstationary series. Real time series data are typically heterogeneous, with clustering and certain common features, contain missing values and outliers, and may encounter structural breaks. We address these issues in the next two chapters. Chapter 4 presents missing value estimation and some new methods for detecting and modeling outliers in univariate and multivariate time series. The outliers considered include additive outliers, innovative outliers, level shifts, transitory shifts, and parameter changes. Chapter 5 analyzes clustering, or unsupervised classification methods, for time series, including recent procedures for clustering time series based on their dependency or for selecting the number of clusters. The chapter also considers time series discrimination, or supervised classification, and covers approaches developed not only in statistics, but also in machine learning such as support vector machines. Chapter 6 focuses on an active research topic for modeling large sets of time series, namely dynamic factor models and other types of factor models. The chapter describes in detail the developments of the topic and discusses the pros and cons of several models available in the literature, including those for the high-dimensional setting. Chapter 7 concentrates on the key problem of forecasting large sets of time series. The application of Lasso regression to time series is investigated, as well as procedures developed mostly in the statistics and econometrics literature for forecasting high-dimensional time series. It also studies the method of nowcasting and demonstrates its usefulness with real examples. Chapter 8 considers machine learning for forecasting and classification, including neural networks and deep learning. It also studies methods developed at the interface between Statistics and Machine Learning, commonly known as Data Science, including Classification and Regression Trees (CART) and Random Forests. Finally, Chapter 9 studies spatio-temporal data and their applications, including various kriging methods for spatial predictions.

Dependent data cover a wide range of topics and methods, especially in the presence of high-dimensional data. It is too much to expect that a book can cover all these important topics. This book is no exception. We need to make decisions on the material to include. Like other books, our choices depend heavily on our experience, research areas, and preference. There are some important subjects, especially those under rapid development, that we do not cover, for instance, functional data analysis for dependent data. We hope to include this and other relevant topics in a future edition.

The book takes advantage of many available R packages. Yet, there remain some methods discussed in the book that cannot be implemented with any existing package. We have developed some R scripts to perform such analyses. For instance, we have developed a new automatic modeling procedure for a large set of time series, which is used and demonstrated in Chapter 2. We have included, with the great help of Angela Caro and Antonio Elías, these R scripts in a new package for the book and refer to it as the SLBDD package. In addition, the data sets used, except for those subject to copyright protection, are also included. Some R scripts are provided on the web page of the book at https://www.rueytsay.com/slbdd.

We are grateful to many people who have contributed to our research in general and to this book in particular. We have dedicated the book to Professor George C. Tiao, a giant in time series analysis and a pioneer of many procedures presented here. He is a generous mentor and a good friend of ours. We have also learned a lot from our coauthors on topics covered in the book. In particular, we would like to acknowledge the contributions from Andrés Alonso, Stevenson Bolívar, Jorge Caiado, Angeles Carnero, Rong Chen, Nuno Crato, Chaoxing Dai, Soudeep Deb, Pedro Galeano, Zhaoxing Gao, Victor Guerrero, Yuefeng Han, Nan-Jung Hsu, Hsin-Cheng Huang, Ching-Kang Ing, Ana Justel, Mladen Kolar, Tengyuan Liang, Agustin Maravall, Fabio Nieto, Pilar Poncela, Javier Prieto, Veronika Rockova, Julio Rodríguez, Rosario Romera, Juan Romo, Esther Ruiz, Ismael Sánchez, Maria Jesús Sánchez, Ezequiel Smucler, Victor Yohai, and Rubén Zamar. Some of the programs used in our analyses were written by Angela Caro, Pedro Galeano, Carolina Gamboa, Yuefeng Han, and Javier Prieto. We sincerely thank them for their wonderful work. Chaoxing Dai helped us in many ways with the keras package for deep learning. Without his help, we would not have been able to demonstrate its applications.

Finally, we have always had the constant support of our families. DP wants to thank his wife, Jette Bohsen, for her constant love and encouragement in all his projects, and, in particular, for her generous continued help during the years he was writing this book. RST would like to thank his wife for her love and care and his children for their support. Our families are the source of our energy and inspiration. Without their unlimited and...

Schweitzer Fachinformationen