Model Identification and Data Analysis

Sergio Bittanti (Author)

Wiley (Publisher)

1st Edition

Published on 20 March 2019

416 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-119-54631-3 (ISBN)

€121.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Introduction xi

Acknowledgments xv

1 Stationary Processes and Time Series 1

1.1 Introduction 1

1.2 The Prediction Problem 1

1.3 Random Variable 4

1.4 Random Vector 5

1.4.1 Covariance Coefficient 7

1.5 Stationary Process 9

1.6 White Process 11

1.7 MA Process 12

1.8 AR Process 16

1.8.1 Study of the AR(1) Process 16

1.9 Yule-Walker Equations 20

1.9.1 Yule-Walker Equations for the AR(1) Process 20

1.9.2 Yule-Walker Equations for the AR(2) and AR(n) Process 21

1.10 ARMA Process 23

1.11 Spectrum of a Stationary Process 24

1.11.1 Spectrum Properties 24

1.11.2 Spectral Diagram 25

1.11.3 Maximum Frequency in Discrete Time 25

1.11.4 White Noise Spectrum 25

1.11.5 Complex Spectrum 26

1.12 ARMA Model: Stability Test and Variance Computation 26

1.12.1 Ruzicka Stability Criterion 28

1.12.2 Variance of an ARMA Process 32

1.13 Fundamental Theorem of Spectral Analysis 35

1.14 Spectrum Drawing 38

1.15 Proof of the Fundamental Theorem of Spectral Analysis 43

1.16 Representations of a Stationary Process 45

2 Estimation of Process Characteristics 47

2.1 Introduction 47

2.2 General Properties of the Covariance Function 47

2.3 Covariance Function of ARMA Processes 49

2.4 Estimation of the Mean 50

2.5 Estimation of the Covariance Function 53

2.6 Estimation of the Spectrum 55

2.7 Whiteness Test 57

3 Prediction 61

3.1 Introduction 61

3.2 Fake Predictor 62

3.2.1 Practical Determination of the Fake Predictor 64

3.3 Spectral Factorization 66

3.4 Whitening Filter 70

3.5 Optimal Predictor from Data 71

3.6 Prediction of an ARMA Process 76

3.7 ARMAX Process 77

3.8 Prediction of an ARMAX Process 78

4 Model Identification 81

4.1 Introduction 81

4.2 Setting the Identification Problem 82

4.2.1 Learning from Maxwell 82

4.2.2 A General Identification Problem 84

4.3 Static Modeling 85

4.3.1 Learning from Gauss 85

4.3.2 Least Squares Made Simple 86

4.3.2.1 Trend Search 86

4.3.2.2 Seasonality Search 86

4.3.2.3 Linear Regression 87

4.3.3 Estimating the Expansion of the Universe 90

4.4 Dynamic Modeling 92

4.5 External Representation Models 92

4.5.1 Box and Jenkins Model 92

4.5.2 ARX and AR Models 93

4.5.3 ARMAX and ARMA Models 94

4.5.4 Multivariable Models 96

4.6 Internal Representation Models 96

4.7 The Model Identification Process 100

4.8 The Predictive Approach 101

4.9 Models in Predictive Form 102

4.9.1 Box and Jenkins Model 103

4.9.2 ARX and AR Models 103

4.9.3 ARMAX and ARMA Models 104

5 Identification of Input-Output Models 107

5.1 Introduction 107

5.2 Estimating AR and ARX Models: The Least Squares Method 107

5.3 Identifiability 110

5.3.1 The R̄ Matrix for the ARX(1, 1) Model 111

5.3.2 The R̄ Matrix for a General ARX Model 112

5.4 Estimating ARMA and ARMAX Models 115

5.4.1 Computing the Gradient and the Hessian from Data 117

5.5 Asymptotic Analysis 123

5.5.1 Data Generation System Within the Class of Models 125

5.5.2 Data Generation System Outside the Class of Models 127

5.5.2.1 Simulation Trial 132

5.5.3 General Considerations on the Asymptotics of Predictive Identification 132

5.5.4 Estimating the Uncertainty in Parameter Estimation 132

5.5.4.1 Deduction of the Formula of the Estimation Covariance 134

5.6 Recursive Identification 138

5.6.1 Recursive Least Squares 138

5.6.2 Recursive Maximum Likelihood 143

5.6.3 Extended Least Squares 145

5.7 Robustness of Identification Methods 147

5.7.1 Prediction Error and Model Error 147

5.7.2 Frequency Domain Interpretation 148

5.7.3 Prefiltering 149

5.8 Parameter Tracking 149

6 Model Complexity Selection 155

6.1 Introduction 155

6.2 Cross-validation 157

6.3 FPE Criterion 157

6.3.1 FPE Concept 157

6.3.2 FPE Determination 158

6.4 AIC Criterion 160

6.4.1 AIC Versus FPE 161

6.5 MDL Criterion 161

6.5.1 MDL Versus AIC 162

6.6 Durbin-Levinson Algorithm 164

6.6.1 Yule-Walker Equations for Autoregressive Models of Orders 1 and 2 165

6.6.2 Durbin-Levinson Recursion: From AR(1) to AR(2) 166

6.6.3 Durbin-Levinson Recursion for Models of Any Order 169

6.6.4 Partial Covariance Function 171

7 Identification of State Space Models 173

7.1 Introduction 173

7.2 Hankel Matrix 175

7.3 Order Determination 176

7.4 Determination of Matrices G and H 177

7.5 Determination of Matrix F 178

7.6 Mid Summary: An Ideal Procedure 179

7.7 Order Determination with SVD 179

7.8 Reliable Identification of a State Space Model 181

8 Predictive Control 187

8.1 Introduction 187

8.2 Minimum Variance Control 188

8.2.1 Determination of the MV Control Law 190

8.2.2 Analysis of the MV Control System 192

8.2.2.1 Structure 193

8.2.2.2 Stability 193

8.3 Generalized Minimum Variance Control 196

8.3.1 Model Reference Control 198

8.3.2 Penalized Control Design 200

8.3.2.1 Choice A for Q(z) 201

8.3.2.2 Choice B for Q(z) 203

8.4 Model-Based Predictive Control 204

8.5 Data-Driven Control Synthesis 205

9 Kalman Filtering and Prediction 209

9.1 Introduction 209

9.2 Kalman Approach to Prediction and Filtering Problems 210

9.3 The Bayes Estimation Problem 212

9.3.1 Bayes Problem - Scalar Case 213

9.3.2 Bayes Problem - Vector Case 215

9.3.3 Recursive Bayes Formula - Scalar Case 215

9.3.4 Innovation 217

9.3.5 Recursive Bayes Formula - Vector Case 219

9.3.6 Geometric Interpretation of Bayes Estimation 220

9.3.6.1 Geometric Interpretation of the Bayes Batch Formula 220

9.3.6.2 Geometric Interpretation of the Recursive Bayes Formula 222

9.4 One-step-ahead Kalman Predictor 223

9.4.1 The Innovation in the State Prediction Problem 224

9.4.2 The State Prediction Error 224

9.4.3 Optimal One-Step-Ahead Prediction of the Output 225

9.4.4 Optimal One-Step-Ahead Prediction of the State 226

9.4.5 Riccati Equation 228

9.4.6 Initialization 231

9.4.7 One-step-ahead Optimal Predictor Summary 232

9.4.8 Generalizations 236

9.4.8.1 System 236

9.4.8.2 Predictor 236

9.5 Multistep Optimal Predictor 237

9.6 Optimal Filter 239

9.7 Steady-State Predictor 240

9.7.1 Gain Convergence 241

9.7.2 Convergence of the Riccati Equation Solution 244

9.7.2.1 Convergence Under Stability 244

9.7.2.2 Convergence Without Stability 246

9.7.2.3 Observability 250

9.7.2.4 Reachability 251

9.7.2.5 General Convergence Result 256

9.8 Innovation Representation 265

9.9 Innovation Representation Versus Canonical Representation 266

9.10 K-Theory Versus K-W Theory 267

9.11 Extended Kalman Filter - EKF 271

9.12 The Robust Approach to Filtering 273

9.12.1 Norm of a Dynamic System 274

9.12.2 Robust Filtering 276

10 Parameter Identification in a Given Model 281

10.1 Introduction 281

10.2 Kalman Filter-Based Approaches 281

10.3 Two-Stage Method 284

10.3.1 First Stage - Data Generation and Compression 285

10.3.2 Second Stage - Compressed Data Fitting 287

11 Case Studies 291

11.1 Introduction 291

11.2 Kobe Earthquake Data Analysis 291

11.2.1 Modeling the Normal Seismic Activity Data 294

11.2.2 Model Validation 296

11.2.3 Analysis of the Transition Phase via Detection Techniques 299

11.2.4 Conclusions 300

11.3 Estimation of a Sinusoid in Noise 300

11.3.1 Frequency Estimation by Notch Filter Design 301

11.3.2 Frequency Estimation with EKF 305

Appendix A Linear Dynamical Systems 309

A.1 State Space and Input-Output Models 309

A.1.1 Characteristic Polynomial and Eigenvalues 309

A.1.2 Operator Representation 310

A.1.3 Transfer Function 310

A.1.4 Zeros, Poles, and Eigenvalues 310

A.1.5 Relative Degree 311

A.1.6 Equilibrium Point and System Gain 311

A.2 Lagrange Formula 312

A.3 Stability 312

A.4 Impulse Response 313

A.4.1 Impulse Response from a State Space Model 314

A.4.2 Impulse Response from an Input-Output Model 314

A.4.3 Quadratic Summability of the Impulse Response 315

A.5 Frequency Response 315

A.6 Multiplicity of State Space Models 316

A.6.1 Change of Basis 316

A.6.2 Redundancy in the System Order 317

A.7 Reachability and Observability 318

A.7.1 Reachability 318

A.7.2 Observability 320

A.7.3 PBH Test of Reachability and Observability 321

A.8 System Decomposition 323

A.8.1 Reachability and Observability Decompositions 323

A.8.2 Canonical Decomposition 324

A.9 Stabilizability and Detectability 328

Appendix B Matrices 331

B.1 Basics 331

B.2 Eigenvalues 335

B.3 Determinant and Inverse 337

B.4 Rank 340

B.5 Annihilating Polynomial 342

B.6 Algebraic and Geometric Multiplicity 345

B.7 Range and Null Space 345

B.8 Quadratic Forms 346

B.9 Derivative of a Scalar Function with Respect to a Vector 349

B.10 Matrix Diagonalization via Similarity 350

B.11 Matrix Diagonalization via Singular Value Decomposition 351

B.12 Matrix Norm and Condition Number 353

Appendix C Problems and Solutions 357

Bibliography 391

Index 397

1 Stationary Processes and Time Series

1.1 Introduction

Forecasting the evolution of a man-made system or a natural phenomenon is one of the most ancient problems of humankind. We develop here a prediction theory under the assumption that the variable under study can be considered as a stationary process. The theory is easy to understand and simple to apply. Moreover, it lends itself to various generalizations, making it possible to deal with nonstationary signals.

The organization is as follows. After an introduction to the prediction problem (Section 1.2), we concisely review the notions of random variable, random vector, and random (or stochastic) process in Sections 1.3-1.5, respectively. This leads to the definition of white process (Section 1.6), a key notion in the subsequent developments. The readers who are familiar with random concepts can skip Sections 1.3-1.5.

Then we introduce the moving average (MA) process and the autoregressive (AR) process (Sections 1.7 and 1.8). By combining them, we come to the family of autoregressive and moving average (ARMA) processes (Section 1.10). This is the family of stationary processes we focus on in this volume.

For such processes, in Chapter 3, we develop a prediction theory, thanks to which we can easily work out the optimal forecast given the model.

In our presentation, we make use of elementary concepts of linear dynamical systems such as transfer functions, poles, and zeros; the readers who are not familiar with such topics are cordially invited to first study Appendix A.

1.2 The Prediction Problem

Consider a real variable $v$ depending on discrete time $t$. The variable is observed over the interval $1, 2, \ldots, t$. The problem is to predict the value that the subsequent sample $v(t+1)$ will take.

Various prediction rules may be conceived, providing a guess for $v(t+1)$ based on $v(t), v(t-1), \ldots, v(1)$. A generic predictor is denoted with the symbol $\hat{v}(t+1|t)$:

$$\hat{v}(t+1|t) = f\big(v(t), v(t-1), \ldots, v(1)\big)$$

The question is how to choose function $f(\cdot)$.

A possibility is to consider only the most recent data, say $v(t), v(t-1), \ldots, v(t-n+1)$, and to construct the prediction as a linear combination of them with real coefficients $a_1, a_2, \ldots, a_n$:

$$\hat{v}(t+1|t) = a_1 v(t) + a_2 v(t-1) + \cdots + a_n v(t-n+1)$$

The problem then becomes that of selecting the integer $n$ and the most appropriate values for the parameters $a_1, a_2, \ldots, a_n$.

Suppose for a moment that $n$ and $a_1, a_2, \ldots, a_n$ were selected. Then the prediction rule is fully specified and it can be applied to the past time points for which data are available to evaluate the prediction error:

$$\varepsilon(t) = v(t) - \hat{v}(t|t-1)$$
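As a minimal sketch of this step, the following Python fragment applies a linear prediction rule to past data and collects the resulting prediction errors. The function names `predict_next` and `prediction_errors`, the coefficient list `a`, and the toy data are illustrative assumptions, not taken from the book.

```python
import numpy as np

def predict_next(v, a):
    """Linear one-step prediction from the last len(a) samples of v.

    Computes a[0]*v[-1] + a[1]*v[-2] + ... + a[n-1]*v[-n].
    """
    v = np.asarray(v, dtype=float)
    n = len(a)
    recent = v[-1:-n - 1:-1]  # most recent sample first
    return float(np.dot(a, recent))

def prediction_errors(v, a):
    """Prediction error (observed minus predicted) at every time point
    for which len(a) past samples are available."""
    v = np.asarray(v, dtype=float)
    n = len(a)
    return np.array([v[t] - predict_next(v[:t], a) for t in range(n, len(v))])

# Toy data (assumed): a straight line is predicted exactly by the
# rule 2*v(t) - v(t-1), so all prediction errors vanish.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
errors = prediction_errors(data, [2.0, -1.0])
print(errors)
```

With real data the errors are not null, and their mean and regularity are what the following discussion examines.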

Let's now consider this fundamental question: Which characteristics should the prediction error exhibit in order to conclude that we have constructed a "good predictor"? In principle, the best one can hope for is that the prediction error be null at any time point. However, in practice, this is Utopian. Hence, we have to investigate the properties that a non-null $\varepsilon(t)$ should exhibit in order to conclude that the prediction is fair.

For the sake of illustration, consider the case when $\varepsilon(t)$ has the time evolution shown in Figure 1.1a. As can be seen, the mean value $\bar{\varepsilon}$ of $\varepsilon(t)$ is nonzero. Correspondingly, the rule

$$\hat{v}(t+1|t) = a_1 v(t) + a_2 v(t-1) + \cdots + a_n v(t-n+1) + \bar{\varepsilon}$$

would be better than the original one. Indeed, with the new rule of prediction, one can get rid of the systematic error.

Figure 1.1 Possible diagrams of the prediction error.

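As a second option, consider the case when the prediction error is given by the diagram of Figure 1.1b. Then the mean value is zero. However, the sign of $\varepsilon(t)$ changes at each instant; precisely, $\varepsilon(t) > 0$ for $t$ even and $\varepsilon(t) < 0$ for $t$ odd. Hence, even in such a case, a better prediction rule than the initial one can be conceived. Indeed, writing $c > 0$ for the typical magnitude of the error, one can formulate the new rule:

$$\hat{v}(t+1|t) = a_1 v(t) + a_2 v(t-1) + \cdots + a_n v(t-n+1) + c \quad \text{for } t+1 \text{ even}$$

and

$$\hat{v}(t+1|t) = a_1 v(t) + a_2 v(t-1) + \cdots + a_n v(t-n+1) - c \quad \text{for } t+1 \text{ odd}$$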

From these simple remarks, one can conclude that the best predictor should have the following property: besides a zero mean value, the prediction error should have no regularity; rather, it should be fully unpredictable. In this way, the model captures the whole dynamics hidden in the data, no useful information remains concealed in the residual error, and no better predictor can be conceived. The intuitive concept of "unpredictable signal" was formalized in the twentieth century, leading to the notion of white noise (WN) or white process, a concept we precisely introduce later in this chapter. For the moment, it is important to bear in mind the following conclusion: A prediction rule is appropriate if the corresponding prediction error is a white process.
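One informal way to look for residual regularity is to compute the sample autocorrelation of the prediction error at a few nonzero lags. The sketch below is only an illustration under that assumption; it is not the whiteness test developed later in the book, and the helper name `sample_autocorrelation` is hypothetical.

```python
import numpy as np

def sample_autocorrelation(e, max_lag):
    """Sample autocorrelation of e at lags 1..max_lag (mean removed)."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    n = len(e)
    c0 = np.dot(e, e) / n  # sample variance
    return np.array([np.dot(e[k:], e[:n - k]) / n / c0
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
white = rng.standard_normal(2000)            # no regularity
alternating = (-1.0) ** np.arange(2000)      # sign flips at each instant

print(sample_autocorrelation(white, 3))        # values close to zero
print(sample_autocorrelation(alternating, 3))  # values close to -1, +1, -1
```

An error sequence with alternating sign has zero mean but a lag-1 autocorrelation near -1, which is exactly the kind of regularity a better predictor could exploit.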

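In this connection, we make the following interesting observation. Assume that $\varepsilon(t)$ is indeed a white noise; then

$$v(t) = a_1 v(t-1) + a_2 v(t-2) + \cdots + a_n v(t-n) + \varepsilon(t)$$

Rewrite this difference equation by means of the delay operator $z^{-1}$, namely the operator such that

$$z^{-1} v(t) = v(t-1)$$

Then

$$v(t) = a_1 z^{-1} v(t) + a_2 z^{-2} v(t) + \cdots + a_n z^{-n} v(t) + \varepsilon(t)$$

from which

$$v(t) = W(z)\, \varepsilon(t)$$

with

$$W(z) = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_n z^{-n}}$$

By reinterpreting $z$ as the complex variable, this relationship becomes the expression of a dynamical system with transfer function (from $\varepsilon$ to $v$) given by $W(z)$.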

Summing up, finding a good predictor is equivalent to determining a model supplying the given sequence of data as the output of a dynamical system fed by white noise (Figure 1.2).
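The equivalence can be illustrated numerically. The sketch below, under assumed coefficients `[0.6, -0.2]` and zero initial conditions (both illustrative choices), feeds white noise to an autoregressive recursion and then checks that the linear predictor built with the same coefficients leaves exactly that white noise as its prediction error. The function name `simulate_ar` is hypothetical.

```python
import numpy as np

def simulate_ar(a, eps):
    """Generate v(t) = a[0]*v(t-1) + ... + a[n-1]*v(t-n) + eps(t),
    with samples before time 0 taken as zero (an assumption)."""
    n = len(a)
    v = np.zeros(len(eps))
    for t in range(len(eps)):
        past = [v[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)]
        v[t] = np.dot(a, past) + eps[t]
    return v

rng = np.random.default_rng(42)
eps = rng.standard_normal(500)   # white noise input
a = [0.6, -0.2]                  # assumed coefficients (stable recursion)
v = simulate_ar(a, eps)

# Prediction error of the rule a[0]*v(t-1) + a[1]*v(t-2), for t >= 2
residual = v[2:] - (a[0] * v[1:-1] + a[1] * v[:-2])
print(np.allclose(residual, eps[2:]))  # True
```

In practice the coefficients are unknown and must be estimated from data, which is the identification problem treated in later chapters.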

Figure 1.2 Interpreting a sequence of data as the output of a dynamic model fed by white noise.

This is why studying dynamical systems having a white noise at the input is a main preliminary step toward the study of prediction theory.

The road we follow toward this objective relies first on the definition of white noise, which we pursue in four stages: random variable → random vector → stochastic process → white noise.

1.3 Random Variable

A random (or stochastic) variable is a real variable that depends upon the outcome of a random experiment. For example, the variable taking the value $+1$ or $-1$ depending on the result of the tossing of a coin is a random variable.

The outcome of the random experiment is denoted by $s$; hence, a random variable is a function of $s$: $v = v(s)$.

For our purposes, a random variable is described by means of its mean value (or expected value) and its variance, which we will denote by $E[v]$ and $\mathrm{Var}[v]$, respectively.

The mean value is the real number around which the values taken by the variable fluctuate. Note that, given two random variables $v_1$ and $v_2$ with mean values $E[v_1]$ and $E[v_2]$, the random variable

$$v = \alpha_1 v_1 + \alpha_2 v_2$$

obtained as a linear combination of $v_1$ and $v_2$ via the real numbers $\alpha_1$ and $\alpha_2$, has mean value

$$E[v] = \alpha_1 E[v_1] + \alpha_2 E[v_2]$$

The variance captures the intensity of fluctuations around the mean value. To be precise, it is defined as

$$\mathrm{Var}[v] = E\big[(v - E[v])^2\big]$$

where $E[v]$ denotes the mean value of $v$. Obviously, $(v - E[v])^2$ being non-negative, the variance is a real non-negative number.

Often, the variance is denoted with symbols such as $\lambda^2$ or $\sigma^2$. When one deals with various random variables, the variance of the $i$th variable may be denoted as $\lambda_i^2$ or $\sigma_i^2$.

The square root of the variance is called standard deviation, denoted by $\lambda$ or $\sigma$. If the random variable has a Gaussian distribution, then the mean value and the variance completely define the probability distribution of the variable. In particular, if a random variable $v$ is Gaussian, the probability that it takes a value in the interval between $E[v] - 2\lambda$ and $E[v] + 2\lambda$ is about $95\%$. So if $v$ is Gaussian with mean value 10 and variance 100, then, in $95\%$ of cases, the values taken by $v$ range from $-10$ to $30$.
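The numbers in this example can be checked with a few lines of Python. The sketch relies on the standard identity that, for a Gaussian variable, the probability of falling within $k$ standard deviations of the mean equals $\mathrm{erf}(k/\sqrt{2})$; the function name `gaussian_prob_within` is an illustrative choice.

```python
import math

def gaussian_prob_within(k):
    """Probability that a Gaussian variable lies within k standard
    deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

mean, variance = 10.0, 100.0
std = math.sqrt(variance)                      # standard deviation = 10
low, high = mean - 2.0 * std, mean + 2.0 * std

print(low, high)                               # -10.0 30.0
print(round(gaussian_prob_within(2.0), 4))     # 0.9545
```

The exact probability is slightly above 95%, which is why the text says "about".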

1.4 Random Vector

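A random (or stochastic) vector is a vector whose elements are random variables. We focus for simplicity on the bi-dimensional case, namely, given two random variables $v_1$ and $v_2$,

$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$

is a random vector (of dimension 2). The mean value of a random vector is defined as the vector of real numbers constituted by the mean values of the elements of the vector. Thus,

$$E[v] = \begin{bmatrix} E[v_1] \\ E[v_2] \end{bmatrix}$$

where $E[v_1]$ and $E[v_2]$ are the mean values of $v_1$ and $v_2$, respectively. The variance is a matrix given by

$$\mathrm{Var}[v] = \begin{bmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{bmatrix}$$

where

$$\lambda_{11} = E\big[(v_1 - E[v_1])^2\big], \quad \lambda_{22} = E\big[(v_2 - E[v_2])^2\big],$$
$$\lambda_{12} = E\big[(v_1 - E[v_1])(v_2 - E[v_2])\big], \quad \lambda_{21} = E\big[(v_2 - E[v_2])(v_1 - E[v_1])\big]$$

Here, besides variances $\lambda_{11}$ and $\lambda_{22}$ of the single random variables, the so-called "cross-variance" between $v_1$ and $v_2$, $\lambda_{12}$, and "cross-variance" between $v_2$ and $v_1$, $\lambda_{21}$, appear. Obviously, $\lambda_{12} = \lambda_{21}$, so that $\mathrm{Var}[v]$ is a symmetric matrix.

It is easy to verify that the variance matrix can also be written in the form

$$\mathrm{Var}[v] = E\big[(v - E[v])(v - E[v])'\big]$$

where $'$ denotes transpose.

In general, for a vector $v$ of any dimension, the variance matrix is given by

$$\mathrm{Var}[v] = E\big[(v - E[v])(v - E[v])'\big]$$

where $E[v]$ is the vector whose elements are the mean values of the random variables entering $v$.

If $v$ is a vector with $n$ entries, $\mathrm{Var}[v]$ is an $n \times n$ matrix. In any case, $\mathrm{Var}[v]$ is a symmetric matrix having the variances of the single variables composing vector $v$ along the diagonal and all cross-variances as off-diagonal terms.

A remarkable feature of a variance matrix is that it is a positive semi-definite matrix.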
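These properties can be observed on a sample estimate. The sketch below, using synthetic data and a hypothetical helper `variance_matrix`, replaces the expected value with a sample average, then checks symmetry and verifies that no eigenvalue is negative (one standard characterization of positive semi-definiteness for symmetric matrices).

```python
import numpy as np

def variance_matrix(samples):
    """Sample variance matrix: average of the outer products of the
    deviations from the sample mean. One realization per row."""
    x = np.asarray(samples, dtype=float)
    d = x - x.mean(axis=0)
    return d.T @ d / len(x)

rng = np.random.default_rng(1)
v1 = rng.standard_normal(1000)
v2 = 0.5 * v1 + rng.standard_normal(1000)   # correlated with v1 (assumed toy data)
V = variance_matrix(np.column_stack([v1, v2]))

print(np.allclose(V, V.T))                  # symmetric
print(np.linalg.eigvalsh(V).min() >= -1e-12)  # no negative eigenvalue
```

The small tolerance in the eigenvalue check only guards against floating-point rounding.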

Remark 1.1 (Positive semi-definiteness)

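The notions of positive semi-definite and positive definite matrix are explained in Appendix B. In a very concise way, given a real symmetric $n \times n$ matrix $M$, associate to it the scalar function $f(x)$ defined as $f(x) = x' M x$, where $x$ is an $n$-dimensional real vector. For example, if

$$M = \begin{bmatrix} m_{11} & m_{12} \\ m_{12} & m_{22} \end{bmatrix}$$

we take

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Then

$$f(x) = m_{11} x_1^2 + 2 m_{12} x_1 x_2 + m_{22} x_2^2$$

Hence, $f(x)$ is quadratic in the entries of vector $x$. Matrix $M$ is said to be

positive semi-definite if $x' M x \ge 0$ for all $x$,
positive definite if it is positive semi-definite and $x' M x = 0$ only for $x = 0$.

We write $M \ge 0$ and $M > 0$ to denote a positive semi-definite...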

Schweitzer Fachinformationen
