
Statistical Pattern Recognition
Description
Reviews
"In the end I must add that this book is so appealing that I often found myself lost in the reading, pausing the overview of the manuscript in order to look more into some presented subject, and not being able to continue until I had finished seeing all about it." (Zentralblatt MATH, 1 December 2012)
Persons
Dr Andrew Robert Webb, Senior Researcher, QinetiQ Ltd, Malvern, UK.
Dr Keith Derek Copsey, Senior Researcher, QinetiQ Ltd, Malvern, UK.
Content
Notation xxiii
1 Introduction to Statistical Pattern Recognition 1
1.1 Statistical Pattern Recognition 1
1.1.1 Introduction 1
1.1.2 The Basic Model 2
1.2 Stages in a Pattern Recognition Problem 4
1.3 Issues 6
1.4 Approaches to Statistical Pattern Recognition 7
1.5 Elementary Decision Theory 8
1.5.1 Bayes' Decision Rule for Minimum Error 8
1.5.2 Bayes' Decision Rule for Minimum Error - Reject Option 12
1.5.3 Bayes' Decision Rule for Minimum Risk 13
1.5.4 Bayes' Decision Rule for Minimum Risk - Reject Option 15
1.5.5 Neyman-Pearson Decision Rule 15
1.5.6 Minimax Criterion 18
1.5.7 Discussion 19
1.6 Discriminant Functions 20
1.6.1 Introduction 20
1.6.2 Linear Discriminant Functions 21
1.6.3 Piecewise Linear Discriminant Functions 23
1.6.4 Generalised Linear Discriminant Function 24
1.6.5 Summary 26
1.7 Multiple Regression 27
1.8 Outline of Book 29
1.9 Notes and References 29
Exercises 31
2 Density Estimation - Parametric 33
2.1 Introduction 33
2.2 Estimating the Parameters of the Distributions 34
2.2.1 Estimative Approach 34
2.2.2 Predictive Approach 35
2.3 The Gaussian Classifier 35
2.3.1 Specification 35
2.3.2 Derivation of the Gaussian Classifier Plug-In Estimates 37
2.3.3 Example Application Study 39
2.4 Dealing with Singularities in the Gaussian Classifier 40
2.4.1 Introduction 40
2.4.2 Naïve Bayes 40
2.4.3 Projection onto a Subspace 41
2.4.4 Linear Discriminant Function 41
2.4.5 Regularised Discriminant Analysis 42
2.4.6 Example Application Study 44
2.4.7 Further Developments 45
2.4.8 Summary 46
2.5 Finite Mixture Models 46
2.5.1 Introduction 46
2.5.2 Mixture Models for Discrimination 48
2.5.3 Parameter Estimation for Normal Mixture Models 49
2.5.4 Normal Mixture Model Covariance Matrix Constraints 51
2.5.5 How Many Components? 52
2.5.6 Maximum Likelihood Estimation via EM 55
2.5.7 Example Application Study 60
2.5.8 Further Developments 62
2.5.9 Summary 63
2.6 Application Studies 63
2.7 Summary and Discussion 66
2.8 Recommendations 66
2.9 Notes and References 67
Exercises 67
3 Density Estimation - Bayesian 70
3.1 Introduction 70
3.1.1 Basics 72
3.1.2 Recursive Calculation 72
3.1.3 Proportionality 73
3.2 Analytic Solutions 73
3.2.1 Conjugate Priors 73
3.2.2 Estimating the Mean of a Normal Distribution with Known Variance 75
3.2.3 Estimating the Mean and the Covariance Matrix of a Multivariate Normal Distribution 79
3.2.4 Unknown Prior Class Probabilities 85
3.2.5 Summary 87
3.3 Bayesian Sampling Schemes 87
3.3.1 Introduction 87
3.3.2 Summarisation 87
3.3.3 Sampling Version of the Bayesian Classifier 89
3.3.4 Rejection Sampling 89
3.3.5 Ratio of Uniforms 90
3.3.6 Importance Sampling 92
3.4 Markov Chain Monte Carlo Methods 95
3.4.1 Introduction 95
3.4.2 The Gibbs Sampler 95
3.4.3 Metropolis-Hastings Algorithm 103
3.4.4 Data Augmentation 107
3.4.5 Reversible Jump Markov Chain Monte Carlo 108
3.4.6 Slice Sampling 109
3.4.7 MCMC Example - Estimation of Noisy Sinusoids 111
3.4.8 Summary 115
3.4.9 Notes and References 116
3.5 Bayesian Approaches to Discrimination 116
3.5.1 Labelled Training Data 116
3.5.2 Unlabelled Training Data 117
3.6 Sequential Monte Carlo Samplers 119
3.6.1 Introduction 119
3.6.2 Basic Methodology 121
3.6.3 Summary 125
3.7 Variational Bayes 126
3.7.1 Introduction 126
3.7.2 Description 126
3.7.3 Factorised Variational Approximation 129
3.7.4 Simple Example 131
3.7.5 Use of the Procedure for Model Selection 135
3.7.6 Further Developments and Applications 136
3.7.7 Summary 137
3.8 Approximate Bayesian Computation 137
3.8.1 Introduction 137
3.8.2 ABC Rejection Sampling 138
3.8.3 ABC MCMC Sampling 140
3.8.4 ABC Population Monte Carlo Sampling 141
3.8.5 Model Selection 142
3.8.6 Summary 143
3.9 Example Application Study 144
3.10 Application Studies 145
3.11 Summary and Discussion 146
3.12 Recommendations 147
3.13 Notes and References 147
Exercises 148
4 Density Estimation - Nonparametric 150
4.1 Introduction 150
4.1.1 Basic Properties of Density Estimators 150
4.2 k-Nearest-Neighbour Method 152
4.2.1 k-Nearest-Neighbour Classifier 152
4.2.2 Derivation 154
4.2.3 Choice of Distance Metric 157
4.2.4 Properties of the Nearest-Neighbour Rule 159
4.2.5 Linear Approximating and Eliminating Search Algorithm 159
4.2.6 Branch and Bound Search Algorithms: kd-Trees 163
4.2.7 Branch and Bound Search Algorithms: Ball-Trees 170
4.2.8 Editing Techniques 174
4.2.9 Example Application Study 177
4.2.10 Further Developments 178
4.2.11 Summary 179
4.3 Histogram Method 180
4.3.1 Data Adaptive Histograms 181
4.3.2 Independence Assumption (Naïve Bayes) 181
4.3.3 Lancaster Models 182
4.3.4 Maximum Weight Dependence Trees 183
4.3.5 Bayesian Networks 186
4.3.6 Example Application Study - Naïve Bayes Text Classification 190
4.3.7 Summary 193
4.4 Kernel Methods 194
4.4.1 Biasedness 197
4.4.2 Multivariate Extension 198
4.4.3 Choice of Smoothing Parameter 199
4.4.4 Choice of Kernel 201
4.4.5 Example Application Study 202
4.4.6 Further Developments 203
4.4.7 Summary 203
4.5 Expansion by Basis Functions 204
4.6 Copulas 207
4.6.1 Introduction 207
4.6.2 Mathematical Basis 207
4.6.3 Copula Functions 208
4.6.4 Estimating Copula Probability Density Functions 209
4.6.5 Simple Example 211
4.6.6 Summary 212
4.7 Application Studies 213
4.7.1 Comparative Studies 216
4.8 Summary and Discussion 216
4.9 Recommendations 217
4.10 Notes and References 217
Exercises 218
5 Linear Discriminant Analysis 221
5.1 Introduction 221
5.2 Two-Class Algorithms 222
5.2.1 General Ideas 222
5.2.2 Perceptron Criterion 223
5.2.3 Fisher's Criterion 227
5.2.4 Least Mean-Squared-Error Procedures 228
5.2.5 Further Developments 235
5.2.6 Summary 235
5.3 Multiclass Algorithms 236
5.3.1 General Ideas 236
5.3.2 Error-Correction Procedure 237
5.3.3 Fisher's Criterion - Linear Discriminant Analysis 238
5.3.4 Least Mean-Squared-Error Procedures 241
5.3.5 Regularisation 246
5.3.6 Example Application Study 246
5.3.7 Further Developments 247
5.3.8 Summary 248
5.4 Support Vector Machines 249
5.4.1 Introduction 249
5.4.2 Linearly Separable Two-Class Data 249
5.4.3 Linearly Nonseparable Two-Class Data 253
5.4.4 Multiclass SVMs 256
5.4.5 SVMs for Regression 257
5.4.6 Implementation 259
5.4.7 Example Application Study 262
5.4.8 Summary 263
5.5 Logistic Discrimination 263
5.5.1 Two-Class Case 263
5.5.2 Maximum Likelihood Estimation 264
5.5.3 Multiclass Logistic Discrimination 266
5.5.4 Example Application Study 267
5.5.5 Further Developments 267
5.5.6 Summary 268
5.6 Application Studies 268
5.7 Summary and Discussion 268
5.8 Recommendations 269
5.9 Notes and References 270
Exercises 270
6 Nonlinear Discriminant Analysis - Kernel and Projection Methods 274
6.1 Introduction 274
6.2 Radial Basis Functions 276
6.2.1 Introduction 276
6.2.2 Specifying the Model 278
6.2.3 Specifying the Functional Form 278
6.2.4 The Positions of the Centres 279
6.2.5 Smoothing Parameters 281
6.2.6 Calculation of the Weights 282
6.2.7 Model Order Selection 284
6.2.8 Simple RBF 285
6.2.9 Motivation 286
6.2.10 RBF Properties 288
6.2.11 Example Application Study 288
6.2.12 Further Developments 289
6.2.13 Summary 290
6.3 Nonlinear Support Vector Machines 291
6.3.1 Introduction 291
6.3.2 Binary Classification 291
6.3.3 Types of Kernel 292
6.3.4 Model Selection 293
6.3.5 Multiclass SVMs 294
6.3.6 Probability Estimates 294
6.3.7 Nonlinear Regression 296
6.3.8 Example Application Study 296
6.3.9 Further Developments 297
6.3.10 Summary 298
6.4 The Multilayer Perceptron 298
6.4.1 Introduction 298
6.4.2 Specifying the MLP Structure 299
6.4.3 Determining the MLP Weights 300
6.4.4 Modelling Capacity of the MLP 307
6.4.5 Logistic Classification 307
6.4.6 Example Application Study 310
6.4.7 Bayesian MLP Networks 311
6.4.8 Projection Pursuit 313
6.4.9 Summary 313
6.5 Application Studies 314
6.6 Summary and Discussion 316
6.7 Recommendations 317
6.8 Notes and References 318
Exercises 318
7 Rule and Decision Tree Induction 322
7.1 Introduction 322
7.2 Decision Trees 323
7.2.1 Introduction 323
7.2.2 Decision Tree Construction 326
7.2.3 Selection of the Splitting Rule 327
7.2.4 Terminating the Splitting Procedure 330
7.2.5 Assigning Class Labels to Terminal Nodes 332
7.2.6 Decision Tree Pruning - Worked Example 332
7.2.7 Decision Tree Construction Methods 337
7.2.8 Other Issues 339
7.2.9 Example Application Study 340
7.2.10 Further Developments 341
7.2.11 Summary 342
7.3 Rule Induction 342
7.3.1 Introduction 342
7.3.2 Generating Rules from a Decision Tree 345
7.3.3 Rule Induction Using a Sequential Covering Algorithm 345
7.3.4 Example Application Study 350
7.3.5 Further Developments 351
7.3.6 Summary 351
7.4 Multivariate Adaptive Regression Splines 351
7.4.1 Introduction 351
7.4.2 Recursive Partitioning Model 351
7.4.3 Example Application Study 355
7.4.4 Further Developments 355
7.4.5 Summary 356
7.5 Application Studies 356
7.6 Summary and Discussion 358
7.7 Recommendations 358
7.8 Notes and References 359
Exercises 359
8 Ensemble Methods 361
8.1 Introduction 361
8.2 Characterising a Classifier Combination Scheme 362
8.2.1 Feature Space 363
8.2.2 Level 366
8.2.3 Degree of Training 368
8.2.4 Form of Component Classifiers 368
8.2.5 Structure 369
8.2.6 Optimisation 369
8.3 Data Fusion 370
8.3.1 Architectures 370
8.3.2 Bayesian Approaches 371
8.3.3 Neyman-Pearson Formulation 373
8.3.4 Trainable Rules 374
8.3.5 Fixed Rules 375
8.4 Classifier Combination Methods 376
8.4.1 Product Rule 376
8.4.2 Sum Rule 377
8.4.3 Min, Max and Median Combiners 378
8.4.4 Majority Vote 379
8.4.5 Borda Count 379
8.4.6 Combiners Trained on Class Predictions 380
8.4.7 Stacked Generalisation 382
8.4.8 Mixture of Experts 382
8.4.9 Bagging 385
8.4.10 Boosting 387
8.4.11 Random Forests 389
8.4.12 Model Averaging 390
8.4.13 Summary of Methods 396
8.4.14 Example Application Study 398
8.4.15 Further Developments 399
8.5 Application Studies 399
8.6 Summary and Discussion 400
8.7 Recommendations 401
8.8 Notes and References 401
Exercises 402
9 Performance Assessment 404
9.1 Introduction 404
9.2 Performance Assessment 405
9.2.1 Performance Measures 405
9.2.2 Discriminability 406
9.2.3 Reliability 413
9.2.4 ROC Curves for Performance Assessment 415
9.2.5 Population and Sensor Drift 419
9.2.6 Example Application Study 421
9.2.7 Further Developments 422
9.2.8 Summary 423
9.3 Comparing Classifier Performance 424
9.3.1 Which Technique is Best? 424
9.3.2 Statistical Tests 425
9.3.3 Comparing Rules When Misclassification Costs are Uncertain 426
9.3.4 Example Application Study 428
9.3.5 Further Developments 429
9.3.6 Summary 429
9.4 Application Studies 429
9.5 Summary and Discussion 430
9.6 Recommendations 430
9.7 Notes and References 430
Exercises 431
10 Feature Selection and Extraction 433
10.1 Introduction 433
10.2 Feature Selection 435
10.2.1 Introduction 435
10.2.2 Characterisation of Feature Selection Approaches 439
10.2.3 Evaluation Measures 440
10.2.4 Search Algorithms for Feature Subset Selection 449
10.2.5 Complete Search - Branch and Bound 450
10.2.6 Sequential Search 454
10.2.7 Random Search 458
10.2.8 Markov Blanket 459
10.2.9 Stability of Feature Selection 460
10.2.10 Example Application Study 462
10.2.11 Further Developments 462
10.2.12 Summary 463
10.3 Linear Feature Extraction 463
10.3.1 Principal Components Analysis 464
10.3.2 Karhunen-Loève Transformation 475
10.3.3 Example Application Study 481
10.3.4 Further Developments 482
10.3.5 Summary 483
10.4 Multidimensional Scaling 484
10.4.1 Classical Scaling 484
10.4.2 Metric MDS 486
10.4.3 Ordinal Scaling 487
10.4.4 Algorithms 490
10.4.5 MDS for Feature Extraction 491
10.4.6 Example Application Study 492
10.4.7 Further Developments 493
10.4.8 Summary 493
10.5 Application Studies 493
10.6 Summary and Discussion 495
10.7 Recommendations 495
10.8 Notes and References 496
Exercises 497
11 Clustering 501
11.1 Introduction 501
11.2 Hierarchical Methods 502
11.2.1 Single-Link Method 503
11.2.2 Complete-Link Method 506
11.2.3 Sum-of-Squares Method 507
11.2.4 General Agglomerative Algorithm 508
11.2.5 Properties of a Hierarchical Classification 508
11.2.6 Example Application Study 509
11.2.7 Summary 509
11.3 Quick Partitions 510
11.4 Mixture Models 511
11.4.1 Model Description 511
11.4.2 Example Application Study 512
11.5 Sum-of-Squares Methods 513
11.5.1 Clustering Criteria 514
11.5.2 Clustering Algorithms 515
11.5.3 Vector Quantisation 520
11.5.4 Example Application Study 530
11.5.5 Further Developments 530
11.5.6 Summary 531
11.6 Spectral Clustering 531
11.6.1 Elementary Graph Theory 531
11.6.2 Similarity Matrices 534
11.6.3 Application to Clustering 534
11.6.4 Spectral Clustering Algorithm 535
11.6.5 Forms of Graph Laplacian 535
11.6.6 Example Application Study 536
11.6.7 Further Developments 538
11.6.8 Summary 538
11.7 Cluster Validity 538
11.7.1 Introduction 538
11.7.2 Statistical Tests 539
11.7.3 Absence of Class Structure 540
11.7.4 Validity of Individual Clusters 541
11.7.5 Hierarchical Clustering 542
11.7.6 Validation of Individual Clusterings 542
11.7.7 Partitions 543
11.7.8 Relative Criteria 543
11.7.9 Choosing the Number of Clusters 545
11.8 Application Studies 546
11.9 Summary and Discussion 549
11.10 Recommendations 551
11.11 Notes and References 552
Exercises 553
12 Complex Networks 555
12.1 Introduction 555
12.1.1 Characteristics 557
12.1.2 Properties 557
12.1.3 Questions to Address 559
12.1.4 Descriptive Features 560
12.1.5 Outline 560
12.2 Mathematics of Networks 561
12.2.1 Graph Matrices 561
12.2.2 Connectivity 562
12.2.3 Distance Measures 562
12.2.4 Weighted Networks 563
12.2.5 Centrality Measures 563
12.2.6 Random Graphs 564
12.3 Community Detection 565
12.3.1 Clustering Methods 565
12.3.2 Girvan-Newman Algorithm 568
12.3.3 Modularity Approaches 570
12.3.4 Local Modularity 571
12.3.5 Clique Percolation 573
12.3.6 Example Application Study 574
12.3.7 Further Developments 575
12.3.8 Summary 575
12.4 Link Prediction 575
12.4.1 Approaches to Link Prediction 576
12.4.2 Example Application Study 578
12.4.3 Further Developments 578
12.5 Application Studies 579
12.6 Summary and Discussion 579
12.7 Recommendations 580
12.8 Notes and References 580
Exercises 580
13 Additional Topics 581
13.1 Model Selection 581
13.1.1 Separate Training and Test Sets 582
13.1.2 Cross-Validation 582
13.1.3 The Bayesian Viewpoint 583
13.1.4 Akaike's Information Criterion 583
13.1.5 Minimum Description Length 584
13.2 Missing Data 585
13.3 Outlier Detection and Robust Procedures 586
13.4 Mixed Continuous and Discrete Variables 587
13.5 Structural Risk Minimisation and the Vapnik-Chervonenkis Dimension 588
13.5.1 Bounds on the Expected Risk 588
13.5.2 The VC Dimension 589
References 591
Index 637
2 Density estimation – parametric
A discrimination rule may be constructed through explicit estimation of the class-conditional density functions and the use of Bayes’ rule. One approach is to assume a simple parametric model for the density functions and to estimate the parameters of the model using an available training set. The Gaussian classifier and its variants are introduced. The more powerful approach of mixture models is then presented.
2.1 Introduction
In Chapter 1 we considered the basic theory of pattern classification. All the information regarding the density functions was assumed known. In practice, this information is often unavailable or only partially available. Therefore, the next question that we must address is the estimation of the density functions themselves. If we can assume some parametric form for the distribution, perhaps obtained from theoretical considerations or an assessment of the problem domain, then the problem reduces to one of estimating a finite number of parameters. Often the parametric form is chosen for convenience. In this chapter, special consideration is given to the normal (also referred to as Gaussian) distribution, which leads to algorithms for the Gaussian classifier. The second major focus is on mixture models, which provide more general modelling capabilities.
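We described in Chapter 1 how the minimum error decision is based on the probability of class membership $p(\omega_j \mid \mathbf{x})$, which using Bayes’ theorem may be written
$$p(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j)\, p(\omega_j)}{p(\mathbf{x})} \qquad (2.1)$$
The probability density function $p(\mathbf{x})$ in the denominator is the same for all classes; in fact
$$p(\mathbf{x}) = \sum_{j=1}^{C} p(\mathbf{x} \mid \omega_j)\, p(\omega_j) \qquad (2.2)$$
where $C$ is the number of classes. Thus, assuming that the prior probabilities $p(\omega_j)$ are known, in order to make a decision we need to estimate the class-conditional densities $p(\mathbf{x} \mid \omega_j)$ for all classes.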
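As a minimal sketch of Equations (2.1) and (2.2), the Python function below turns class-conditional density values and priors into posterior probabilities. It assumes the densities $p(\mathbf{x} \mid \omega_j)$ have already been evaluated at a single pattern $\mathbf{x}$; the function name and the numbers are hypothetical.

```python
def posterior_probabilities(likelihoods, priors):
    """Posterior p(omega_j | x) for each class via Bayes' theorem.

    likelihoods[j] is the class-conditional density p(x | omega_j) evaluated
    at one pattern x; priors[j] is the prior probability p(omega_j).
    """
    if len(likelihoods) != len(priors):
        raise ValueError("need one likelihood and one prior per class")
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_j p(x | omega_j) p(omega_j)
    if evidence <= 0.0:
        raise ValueError("p(x) must be positive")
    return [value / evidence for value in joint]


# Hypothetical three-class example with densities already evaluated at x.
post = posterior_probabilities([0.05, 0.20, 0.10], [0.5, 0.3, 0.2])
decision = max(range(len(post)), key=post.__getitem__)  # minimum-error class index
```

Because $p(\mathbf{x})$ divides every class equally, the chosen class is the one with the largest product $p(\mathbf{x} \mid \omega_j)\,p(\omega_j)$; the normalisation only matters when calibrated probabilities are needed.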
Estimation of the density is based on a sample of observations $\mathcal{D}_j = \{\mathbf{x}_1^j, \ldots, \mathbf{x}_{n_j}^j\}$ ($\mathbf{x}_i^j \in \mathbb{R}^d$) from class $\omega_j$. In this chapter and the next we consider parametric approaches to density estimation. In the parametric approach, we assume that the class-conditional density for class $\omega_j$ is of a known form but has an unknown parameter, or set of parameters, $\boldsymbol{\theta}_j$, and we write this as $p(\mathbf{x} \mid \boldsymbol{\theta}_j)$. The alternative nonparametric approach to density estimation that we consider in Chapter 4 does not assume a simple functional form for the density.
2.2 Estimating the parameters of the distributions
We now introduce two approaches to estimating the parameters of a parametric class-conditional density, namely the estimative approach and the predictive or Bayesian approach. The focus in this chapter is on the estimative approach, with the Bayesian approach covered in Chapter 3.
2.2.1 Estimative approach
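In the estimative approach to parametric density estimation we use an estimate of the parameter $\boldsymbol{\theta}_j$ in the parametric density. Thus we take
$$p(\mathbf{x} \mid \omega_j) = p(\mathbf{x} \mid \hat{\boldsymbol{\theta}}_j)$$
where $\hat{\boldsymbol{\theta}}_j$ is an estimate of the parameter $\boldsymbol{\theta}_j$ based on the data sample $\mathcal{D}_j$. A different data sample would give rise to a different estimate $\hat{\boldsymbol{\theta}}_j$, but the estimative approach does not take into account this sampling variability.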
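The techniques and classifiers considered in this chapter use maximum likelihood estimation procedures to obtain the parameter estimates. In maximum likelihood estimation we seek the parameters that maximise the likelihood function defined using the data sample $\mathcal{D}_j$, i.e. we seek $\hat{\boldsymbol{\theta}}_j$ such that
$$\hat{\boldsymbol{\theta}}_j = \arg\max_{\boldsymbol{\theta}_j} L(\boldsymbol{\theta}_j)$$
where
$$L(\boldsymbol{\theta}_j) = p(\mathcal{D}_j \mid \boldsymbol{\theta}_j) \qquad (2.3)$$
is the likelihood function (i.e. the probability density of the data measurements given the specific value $\boldsymbol{\theta}_j$ for the distribution parameters).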
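If the data measurement vectors making up the data sample $\mathcal{D}_j$ are independent, then the likelihood function (2.3) can be written as a product of the known class-conditional densities for $\omega_j$
$$L(\boldsymbol{\theta}_j) = \prod_{i=1}^{n_j} p(\mathbf{x}_i^j \mid \boldsymbol{\theta}_j) \qquad (2.4)$$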
The validity of the assumption of independence may depend on the manner in which the data sample was collected. If the data sample consists of sensor measurements (e.g. camera imagery), then noise correlations may occur between successive data vectors if the sampling rate is high, and therefore the data would not be independent. However, it is common to proceed with an independence assumption even when correlations between data vectors are expected, primarily because such correlations are difficult to estimate.
Under the independence assumption the maximum likelihood estimation problem becomes one of optimising a known function [Equation (2.4)]. Typically, logarithms are taken, so that we seek to maximise the log-likelihood function
$$\log L(\boldsymbol{\theta}_j) = \sum_{i=1}^{n_j} \log p(\mathbf{x}_i^j \mid \boldsymbol{\theta}_j)$$
which, because the logarithm is strictly increasing, is equivalent to maximising the likelihood function.
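To make the product-to-sum step concrete, here is a small Python sketch for the univariate normal density, assuming a hypothetical sample. It uses the familiar maximum likelihood estimates for this case (the sample mean, and the variance computed with divisor $n$) and checks that moving either parameter away from them lowers the log-likelihood. A practical side benefit of working with the sum of logarithms is that it avoids the numerical underflow that a product of many small densities can suffer.

```python
import math

def normal_log_likelihood(data, mu, var):
    """log L(mu, var) = sum_i log p(x_i | mu, var) for independent univariate normal data."""
    if var <= 0.0:
        raise ValueError("variance must be positive")
    n = len(data)
    sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * sq / var

data = [2.1, 1.7, 2.9, 2.4, 1.5, 2.6]                        # hypothetical sample
mu_hat = sum(data) / len(data)                                # ML estimate of the mean
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)   # ML variance (divides by n)
best = normal_log_likelihood(data, mu_hat, var_hat)

# Perturbing either parameter away from the ML estimates lowers the log-likelihood.
nearby = [normal_log_likelihood(data, mu_hat + dm, var_hat + dv)
          for dm, dv in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]]
```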
For some class-conditional densities (e.g. normal, as we shall see in the next section) an analytic solution for the optimal parameters is available. If an analytic solution is not available we can use numerical techniques such as gradient ascent or the Nelder–Mead method to maximise the likelihood function (details of such algorithms are available in Press et al., 1992). An iterative optimisation scheme, the expectation-maximisation (EM) algorithm, is used for optimising the parameters of mixture model class-conditional densities (Section 2.5).
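The following is a minimal sketch of numerical maximum likelihood by fixed-step gradient ascent. As an illustrative assumption not taken from the text, it estimates the location parameter $\theta$ of a Cauchy density with unit scale, $p(x \mid \theta) = 1/\{\pi(1 + (x-\theta)^2)\}$, for which the likelihood equation has no closed-form solution. The step size, tolerance, starting point and data are arbitrary choices. Gradient ascent only finds a local maximum, and the Cauchy likelihood can have several, so the sketch starts from a median of the sample.

```python
import math

def cauchy_log_likelihood(data, theta):
    """Log-likelihood of a Cauchy location parameter theta (scale fixed at 1)."""
    return sum(-math.log(math.pi) - math.log1p((x - theta) ** 2) for x in data)

def cauchy_score(data, theta):
    """Derivative of the log-likelihood with respect to theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in data)

def gradient_ascent(data, theta0, step=0.05, tol=1e-10, max_iter=100_000):
    """Fixed-step gradient ascent; stops when |score| < tol or after max_iter steps."""
    theta = theta0
    for _ in range(max_iter):
        grad = cauchy_score(data, theta)
        if abs(grad) < tol:
            break
        theta += step * grad  # move uphill on the log-likelihood
    return theta

data = [-1.2, 0.3, 0.8, 1.1, 2.0, 9.5]   # hypothetical sample with one outlying value
start = sorted(data)[len(data) // 2]     # upper median as the starting point
theta_hat = gradient_ascent(data, start)
```

A library optimiser with line searches or second-derivative information would normally converge faster and more robustly than this fixed step.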
2.2.2 Predictive approach
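An alternative approach to parametric density estimation is the predictive or Bayesian approach, which is covered in Chapter 3. We write
$$p(\mathbf{x} \mid \omega_j) = \int p(\mathbf{x} \mid \boldsymbol{\theta}_j)\, p(\boldsymbol{\theta}_j \mid \mathcal{D}_j)\, d\boldsymbol{\theta}_j \qquad (2.5)$$
where $p(\boldsymbol{\theta}_j \mid \mathcal{D}_j)$ is the Bayesian posterior density function for $\boldsymbol{\theta}_j$ based on a prior $p(\boldsymbol{\theta}_j)$ and the data $\mathcal{D}_j$. Thus, we admit that we do not know the true value of $\boldsymbol{\theta}_j$, and instead of taking a single estimate, we take a weighted sum of the densities $p(\mathbf{x} \mid \boldsymbol{\theta}_j)$, weighted by the distribution $p(\boldsymbol{\theta}_j \mid \mathcal{D}_j)$ (Aitchison et al., 1977). This predictive approach is usually more complicated than the estimative approach (both in classifier design and in application) and may be regarded as making allowance for the sampling variability of the estimate of $\boldsymbol{\theta}_j$.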
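For one simple case the predictive integral (2.5) can be checked numerically. The sketch below assumes a univariate normal density with known variance $\sigma^2$ and a normal prior $N(\mu_0, \tau^2)$ on the unknown mean (the setting listed in the contents as Section 3.2.2). The posterior for the mean is then normal, $N(\mu_n, \sigma_n^2)$, and the predictive density is normal with mean $\mu_n$ and variance $\sigma^2 + \sigma_n^2$. The extra $\sigma_n^2$ is the allowance for sampling variability: the estimative plug-in density would use variance $\sigma^2$ alone. Function names, the quadrature grid and the data are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_for_mean(data, sigma2, mu0, tau2):
    """Conjugate update: prior N(mu0, tau2) on the mean, known data variance sigma2."""
    n = len(data)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + sum(data) / sigma2)
    return post_mean, post_var

def predictive_by_quadrature(x, data, sigma2, mu0, tau2, half_width=10.0, steps=4000):
    """Approximate the integral of p(x | theta) p(theta | D) d theta by the midpoint rule,
    over +/- half_width posterior standard deviations around the posterior mean."""
    m, v = posterior_for_mean(data, sigma2, mu0, tau2)
    sd = math.sqrt(v)
    lo, h = m - half_width * sd, 2.0 * half_width * sd / steps
    total = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        total += normal_pdf(x, theta, sigma2) * normal_pdf(theta, m, v)
    return total * h

data, sigma2, mu0, tau2 = [1.8, 2.3, 2.0], 0.5, 0.0, 10.0   # hypothetical values
m, v = posterior_for_mean(data, sigma2, mu0, tau2)
exact = normal_pdf(3.0, m, sigma2 + v)                       # closed-form predictive
numeric = predictive_by_quadrature(3.0, data, sigma2, mu0, tau2)
plug_in = normal_pdf(3.0, sum(data) / len(data), sigma2)     # estimative density, for contrast
```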
2.3 The Gaussian classifier
2.3.1 Specification
Perhaps the most widely used classifier is that in which the class-conditional densities are modelled using the normal (Gaussian) distribution.
Normal (Gaussian) distribution
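The probability density function for a normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$ is
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \right\} \qquad (2.6)$$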
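The probability density function for a multivariate normal (Gaussian) distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ (a symmetric positive definite matrix, so that $\boldsymbol{\Sigma}^{-1}$ exists and $|\boldsymbol{\Sigma}| > 0$) is
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\} \qquad (2.7)$$
where $d$ is the dimensionality of the data.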
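Evaluating (2.7) requires $\boldsymbol{\Sigma}^{-1}$ and $|\boldsymbol{\Sigma}|$. One common, numerically stable route, sketched below in pure Python as an illustration rather than a prescribed method, is a Cholesky factorisation $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$: the quadratic form becomes $\mathbf{z}^T\mathbf{z}$ with $\mathbf{L}\mathbf{z} = \mathbf{x} - \boldsymbol{\mu}$, and $\log|\boldsymbol{\Sigma}| = 2\sum_i \log L_{ii}$. The factorisation fails for a matrix that is not positive definite, which matches the requirement on $\boldsymbol{\Sigma}$. The function names are hypothetical; in practice a linear algebra library would be used.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T; raises ValueError if a is not positive definite."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def mvn_log_pdf(x, mean, cov):
    """Logarithm of the d-dimensional normal density, computed via Cholesky."""
    d = len(x)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L z = (x - mean); then
    # (x - mean)^T cov^{-1} (x - mean) = z^T z.
    z = [0.0] * d
    for i in range(d):
        z[i] = (diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    maha = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))  # log |cov|
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)

lp = mvn_log_pdf([1.0, 2.0], [0.0, 0.0], [[2.0, 0.3], [0.3, 1.0]])  # hypothetical values
```

Working on the log scale keeps the value usable even far from the mean, where the density itself would underflow to zero.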
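We model a data vector from class $\omega_j$ as being drawn from a normal distribution with mean vector $\boldsymbol{\mu}_j$ and covariance matrix $\boldsymbol{\Sigma}_j$. The class-conditional density is then given by
$$p(\mathbf{x} \mid \omega_j) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_j|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^{T} \boldsymbol{\Sigma}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \right\}$$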
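Classification is achieved by assigning a pattern $\mathbf{x}$ to the class for which the posterior class probability, $p(\omega_j \mid \mathbf{x})$, is the greatest, or equivalently for which $\log p(\omega_j \mid \mathbf{x})$ is the greatest. Using Equations (2.1) and (2.2), and the normal modelling of the class-conditional densities above, we have
$$\log p(\omega_j \mid \mathbf{x}) = -\frac{d}{2}\log(2\pi) - \frac{1}{2}\log|\boldsymbol{\Sigma}_j| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_j)^{T} \boldsymbol{\Sigma}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) + \log p(\omega_j) - \log p(\mathbf{x})$$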
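Since $p(\mathbf{x})$ and the constant term $\frac{d}{2}\log(2\pi)$ are the same for all classes, the discriminant rule is: assign $\mathbf{x}$ to $\omega_i$ if $g_i > g_j$ for all $j \neq i$, where
$$g_j(\mathbf{x}) = \log p(\omega_j) - \frac{1}{2}\log|\boldsymbol{\Sigma}_j| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_j)^{T} \boldsymbol{\Sigma}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \qquad (2.8)$$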
Classifying a pattern $\mathbf{x}$ on the basis of the values of $g_j(\mathbf{x})$, $j = 1, \ldots, C$, gives the normal-based quadratic discriminant function (McLachlan, 1992a).
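Dropping $\log p(\mathbf{x})$ and the $\frac{d}{2}\log(2\pi)$ term shifts every class score by the same amount, so the decision cannot change. The Python sketch below checks this in the univariate case ($d = 1$, chosen for brevity) by comparing the full log posterior with the reduced discriminant. The class means, variances and priors are hypothetical.

```python
import math

def quadratic_discriminant(x, means, variances, priors):
    """g_j(x) for univariate normal classes (d = 1), omitting the terms
    log p(x) and (d/2) log(2*pi) that are common to every class."""
    return [
        math.log(p) - 0.5 * math.log(v) - 0.5 * (x - m) ** 2 / v
        for m, v, p in zip(means, variances, priors)
    ]

def log_posterior(x, means, variances, priors):
    """Full log p(omega_j | x) from Bayes' theorem, for comparison."""
    joint = [
        math.log(p) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
        for m, v, p in zip(means, variances, priors)
    ]
    top = max(joint)  # log-sum-exp trick gives log p(x) without underflow
    log_px = top + math.log(sum(math.exp(value - top) for value in joint))
    return [value - log_px for value in joint]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical three-class problem.
means, variances, priors = [0.0, 2.0, 5.0], [1.0, 0.25, 4.0], [0.3, 0.3, 0.4]
points = [-1.0, 0.9, 1.6, 3.0, 8.0]
decisions = [argmax(quadratic_discriminant(x, means, variances, priors)) for x in points]
```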
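In the estimative approach, the quantities $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ in the above are replaced by estimates based on a training set. The estimates are obtained using maximum likelihood estimation given a set of (assumed independent) data samples from each class. Suppose that we have a set of $n_j$ samples $\{\mathbf{x}_1^j, \ldots, \mathbf{x}_{n_j}^j\}$ from class $\omega_j$. Then the maximum likelihood estimate of the mean for class $\omega_j$ is
$$\hat{\boldsymbol{\mu}}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} \mathbf{x}_i^j$$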
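the sample mean vector, and the maximum likelihood estimate for the covariance matrix is
$$\hat{\boldsymbol{\Sigma}}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} (\mathbf{x}_i^j - \hat{\boldsymbol{\mu}}_j)(\mathbf{x}_i^j - \hat{\boldsymbol{\mu}}_j)^{T}$$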
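the (biased) sample covariance matrix (we show how these estimates can be derived in the next subsection). Since the maximum likelihood estimate of the covariance matrix is biased [$E[\hat{\boldsymbol{\Sigma}}_j] = \frac{n_j - 1}{n_j}\boldsymbol{\Sigma}_j$, see the exercises at the end of the chapter], it is common practice to replace it with the unbiased estimate
$$\hat{\boldsymbol{\Sigma}}_j = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (\mathbf{x}_i^j - \hat{\boldsymbol{\mu}}_j)(\mathbf{x}_i^j - \hat{\boldsymbol{\mu}}_j)^{T}$$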
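The two covariance estimates differ only in their divisor, so the unbiased one is the maximum likelihood one scaled by $n_j/(n_j - 1)$. A minimal pure-Python sketch, with hypothetical function names, a hypothetical `unbiased` flag and made-up class data:

```python
def mean_vector(samples):
    """Sample mean of a list of d-dimensional vectors (the ML estimate of the mean)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    return [sum(col) / n for col in zip(*samples)]

def covariance(samples, unbiased=True):
    """Sample covariance: divides by n - 1 if unbiased, otherwise by n (the ML estimate)."""
    n = len(samples)
    if unbiased and n < 2:
        raise ValueError("the unbiased estimate needs at least two samples")
    mu = mean_vector(samples)
    d = len(mu)
    denom = n - 1 if unbiased else n
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += diff[r] * diff[c]  # accumulate the outer product
    return [[value / denom for value in row] for row in cov]

samples = [[1.0, 2.0], [2.0, 3.5], [3.0, 3.0], [4.0, 5.5]]  # hypothetical class data
mu_hat = mean_vector(samples)
sigma_ml = covariance(samples, unbiased=False)
sigma_unbiased = covariance(samples, unbiased=True)
```

For small $n_j$ the difference is noticeable (here a factor of 4/3); it vanishes as $n_j$ grows.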
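Substituting the estimates of the means and the covariance matrices (termed the ‘plug-in estimates’) of each class into (2.8) gives the Gaussian classifier or quadratic discrimination rule: assign $\mathbf{x}$ to $\omega_i$ if $g_i > g_j$ for all $j \neq i$, where
$$g_j(\mathbf{x}) = \log p(\omega_j) - \frac{1}{2}\log|\hat{\boldsymbol{\Sigma}}_j| - \frac{1}{2}(\mathbf{x} - \hat{\boldsymbol{\mu}}_j)^{T} \hat{\boldsymbol{\Sigma}}_j^{-1} (\mathbf{x} - \hat{\boldsymbol{\mu}}_j)$$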
If the training data have been gathered by collecting measurements in the operational (deployment) environment, then a plug-in estimate for the prior probability $p(\omega_j)$ is $n_j / \sum_i n_i$, where $n_j$ is the number of patterns in class $\omega_j$. Other common choices are to use uniform prior probabilities, $p(\omega_j) = 1/C$, or expert-specified prior probabilities.
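Putting these pieces together, the following is a minimal pure-Python sketch of the plug-in Gaussian classifier: per-class sample means, unbiased covariance estimates and priors $n_j / \sum_i n_i$, followed by assignment to the class with the largest $g_j(\mathbf{x})$. The function names and training data are hypothetical. For brevity it requires $n_j \geq 2$ and simply raises an error if an estimated covariance matrix is not positive definite, rather than attempting to handle that case.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T; raises ValueError if a is not positive definite."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance estimate is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def fit_gaussian_classifier(data_by_class):
    """Per class: sample mean, Cholesky factor of the unbiased covariance, prior n_j / sum_i n_i."""
    total = sum(len(xs) for xs in data_by_class.values())
    model = {}
    for label, xs in data_by_class.items():
        n, d = len(xs), len(xs[0])
        if n < 2:
            raise ValueError("each class needs at least two samples")
        mu = [sum(col) / n for col in zip(*xs)]
        cov = [[sum((x[r] - mu[r]) * (x[c] - mu[c]) for x in xs) / (n - 1)
                for c in range(d)] for r in range(d)]
        model[label] = (mu, cholesky(cov), n / total)
    return model

def discriminant(x, mu, low, prior):
    """Plug-in g_j(x) = log prior - 0.5 log|S_j| - 0.5 (x - m_j)^T S_j^{-1} (x - m_j)."""
    d = len(x)
    z = [0.0] * d
    for i in range(d):  # forward substitution solves L z = x - mu
        z[i] = (x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return math.log(prior) - 0.5 * log_det - 0.5 * sum(v * v for v in z)

def classify(model, x):
    """Assign x to the class label with the largest plug-in discriminant."""
    return max(model, key=lambda label: discriminant(x, *model[label]))

# Hypothetical two-class training data in two dimensions.
training = {
    "a": [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [-0.5, 0.2], [0.2, -0.6]],
    "b": [[4.0, 4.0], [5.0, 4.5], [4.5, 5.5], [3.5, 4.2], [4.8, 3.6], [5.2, 5.0]],
}
model = fit_gaussian_classifier(training)
```

Storing the Cholesky factor at training time means each classification needs only a triangular solve per class; swapping in uniform or expert-specified priors only changes the third element of each model entry.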
We may apply the Gaussian classifier (quadratic discrimination rule) to classify data vectors (e.g. members of a separate test set, if available). However, problems will occur in the Gaussian classifier if any...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).