AN INTRODUCTION TO MACHINE LEARNING THAT INCLUDES THE FUNDAMENTAL TECHNIQUES, METHODS, AND APPLICATIONS
PROSE Award Finalist 2019
Association of American Publishers Award for Professional and Scholarly Excellence
Machine Learning: a Concise Introduction offers a comprehensive introduction to the core concepts, approaches, and applications of machine learning. The author, an expert in the field, presents fundamental ideas, terminology, and techniques for solving applied problems in classification, regression, clustering, density estimation, and dimension reduction. The design principles behind the techniques are emphasized, including the bias-variance trade-off and its influence on the design of ensemble methods. Understanding these principles leads to more flexible and successful applications. The book also covers methods for optimization, risk estimation, and model selection, essential elements of most applied projects. This important resource:
* Illustrates many classification methods with a single, running example, highlighting similarities and differences between methods
* Presents R source code that shows how to apply and interpret many of the techniques covered (a short illustrative sketch in this spirit follows the description below)
* Includes many thoughtful exercises as an integral part of the text, with an appendix of selected solutions
* Contains useful information for effectively communicating with clients
A volume in the popular Wiley Series in Probability and Statistics, Machine Learning: a Concise Introduction offers the practical information needed for an understanding of the methods and application of machine learning.
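To give a feel for the kind of R workflow the bullets above describe, here is a minimal sketch, not taken from the book: it fits two of the classifiers surveyed in Chapter 4 (linear discriminant analysis and k-nearest-neighbor) to one running dataset and compares their estimated risk under 0-1 loss on a held-out test set. The dataset (iris), the packages (MASS, class), and the choice of these two classifiers are assumptions made for illustration only.

    library(MASS)    # provides lda()
    library(class)   # provides knn()

    set.seed(1)
    n     <- nrow(iris)
    train <- sample(n, round(0.7 * n))   # 70/30 train/test split

    # Linear discriminant analysis (Section 4.4.2 of the book)
    fit.lda  <- lda(Species ~ ., data = iris[train, ])
    pred.lda <- predict(fit.lda, iris[-train, ])$class

    # k-nearest-neighbor with k = 5 (Section 4.5.1 of the book)
    pred.knn <- knn(train = iris[train, 1:4], test = iris[-train, 1:4],
                    cl = iris$Species[train], k = 5)

    # Estimated risk under 0-1 loss on the held-out test data
    mean(pred.lda != iris$Species[-train])
    mean(pred.knn != iris$Species[-train])

Each mean() call returns the test-set misclassification rate of the corresponding classifier. The book's Chapter 14 presents the author's own R code for these and the other techniques covered.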
STEVEN W. KNOX holds a Ph.D. in Mathematics from the University of Illinois and an M.S. in Statistics from Carnegie Mellon University. He has over twenty years of experience using machine learning, statistics, and mathematics to solve real-world problems. He currently serves as Technical Director of Mathematics Research and Senior Advocate for Data Science at the National Security Agency.
ISBN-13: 978-1-119-43907-3 (9781119439073)
Machine Learning: a Concise Introduction [page 3]
Contents [page 7]
Preface [page 13]
Organization-How to Use This Book [page 15]
Acknowledgments [page 19]
About the Companion Website [page 21]
1 Introduction-Examples from Real Life [page 23]
2 The Problem of Learning [page 25]
2.1 Domain [page 26]
2.2 Range [page 26]
2.3 Data [page 26]
2.4 Loss [page 28]
2.5 Risk [page 30]
2.6 The Reality of the Unknown Function [page 34]
2.7 Training and Selection of Models, and Purposes of Learning [page 34]
2.8 Notation [page 35]
3 Regression [page 37]
3.1 General Framework [page 38]
3.2 Loss [page 39]
3.3 Estimating the Model Parameters [page 39]
3.4 Properties of Fitted Values [page 41]
3.5 Estimating the Variance [page 44]
3.6 A Normality Assumption [page 45]
3.7 Computation [page 46]
3.8 Categorical Features [page 47]
3.9 Feature Transformations, Expansions, and Interactions [page 49]
3.10 Variations in Linear Regression [page 50]
3.11 Nonparametric Regression [page 54]
4 Survey of Classification Techniques [page 55]
4.1 The Bayes Classifier [page 56]
4.2 Introduction to Classifiers [page 59]
4.3 A Running Example [page 60]
4.4 Likelihood Methods [page 62]
4.4.1 Quadratic Discriminant Analysis [page 63]
4.4.2 Linear Discriminant Analysis [page 65]
4.4.3 Gaussian Mixture Models [page 67]
4.4.4 Kernel Density Estimation [page 69]
4.4.5 Histograms [page 73]
4.4.6 The Naive Bayes Classifier [page 76]
4.5 Prototype Methods [page 76]
4.5.1 k-Nearest-Neighbor [page 77]
4.5.2 Condensed k-Nearest-Neighbor [page 78]
4.5.3 Nearest-Cluster [page 78]
4.5.4 Learning Vector Quantization [page 80]
4.6 Logistic Regression [page 81]
4.7 Neural Networks [page 84]
4.7.1 Activation Functions [page 84]
4.7.2 Neurons [page 86]
4.7.3 Neural Networks [page 87]
4.7.4 Logistic Regression and Neural Networks [page 95]
4.8 Classification Trees [page 96]
4.8.1 Classification of Data by Leaves (Terminal Nodes) [page 96]
4.8.2 Impurity of Nodes and Trees [page 97]
4.8.3 Growing Trees [page 98]
4.8.4 Pruning Trees [page 101]
4.8.5 Regression Trees [page 103]
4.9 Support Vector Machines [page 103]
4.9.1 Support Vector Machine Classifiers [page 103]
4.9.2 Kernelization [page 110]
4.9.3 Proximal Support Vector Machine Classifiers [page 114]
4.10 Postscript: Example Problem Revisited [page 115]
5 Bias-Variance Trade-off [page 119]
5.1 Squared-Error Loss [page 120]
5.2 Arbitrary Loss [page 123]
6 Combining Classifiers [page 129]
6.1 Ensembles [page 129]
6.2 Ensemble Design [page 132]
6.3 Bootstrap Aggregation (Bagging) [page 134]
6.4 Bumping [page 137]
6.5 Random Forests [page 138]
6.6 Boosting [page 140]
6.7 Arcing [page 143]
6.8 Stacking and Mixture of Experts [page 143]
7 Risk Estimation and Model Selection [page 149]
7.1 Risk Estimation via Training Data [page 150]
7.2 Risk Estimation via Validation or Test Data [page 150]
7.2.1 Training, Validation, and Test Data [page 150]
7.2.2 Risk Estimation [page 151]
7.2.3 Size of Training, Validation, and Test Sets [page 152]
7.2.4 Testing Hypotheses About Risk [page 153]
7.2.5 Example of Use of Training, Validation, and Test Sets [page 154]
7.3 Cross-Validation [page 155]
7.4 Improvements on Cross-Validation [page 157]
7.5 Out-of-Bag Risk Estimation [page 159]
7.6 Akaike's Information Criterion [page 160]
7.7 Schwartz's Bayesian Information Criterion [page 160]
7.8 Rissanen's Minimum Description Length Criterion [page 161]
7.9 R² and Adjusted R² [page 162]
7.10 Stepwise Model Selection [page 163]
7.11 Occam's Razor [page 164]
8 Consistency [page 165]
8.1 Convergence of Sequences of Random Variables [page 166]
8.2 Consistency for Parameter Estimation [page 166]
8.3 Consistency for Prediction [page 167]
8.4 There Are Consistent and Universally Consistent Classifiers [page 167]
8.5 Convergence to Asymptopia Is Not Uniform and May Be Slow [page 169]
9 Clustering [page 171]
9.1 Gaussian Mixture Models [page 172]
9.2 k-Means [page 172]
9.3 Clustering by Mode-Hunting in a Density Estimate [page 173]
9.4 Using Classifiers to Cluster [page 174]
9.5 Dissimilarity [page 175]
9.6 k-Medoids [page 175]
9.7 Agglomerative Hierarchical Clustering [page 176]
9.8 Divisive Hierarchical Clustering [page 177]
9.9 How Many Clusters Are There? Interpretation of Clustering [page 177]
9.10 An Impossibility Theorem [page 179]
10 Optimization [page 181]
10.1 Quasi-Newton Methods [page 182]
10.1.1 Newton's Method for Finding Zeros [page 182]
10.1.2 Newton's Method for Optimization [page 183]
10.1.3 Gradient Descent [page 183]
10.1.4 The BFGS Algorithm [page 184]
10.1.5 Modifications to Quasi-Newton Methods [page 184]
10.1.6 Gradients for Logistic Regression and Neural Networks [page 185]
10.2 The Nelder-Mead Algorithm [page 188]
10.3 Simulated Annealing [page 190]
10.4 Genetic Algorithms [page 190]
10.5 Particle Swarm Optimization [page 191]
10.6 General Remarks on Optimization [page 192]
10.6.1 Imperfectly Known Objective Functions [page 192]
10.6.2 Objective Functions Which Are Sums [page 193]
10.6.3 Optimization from Multiple Starting Points [page 194]
10.7 The Expectation-Maximization Algorithm [page 195]
10.7.1 The General Algorithm [page 195]
10.7.2 EM Climbs the Marginal Likelihood of the Observations [page 195]
10.7.3 Example-Fitting a Gaussian Mixture Model Via EM [page 198]
10.7.4 Example-The Expectation Step [page 199]
10.7.5 Example-The Maximization Step [page 200]
11 High-Dimensional Data [page 201]
11.1 The Curse of Dimensionality [page 202]
11.2 Two Running Examples [page 209]
11.2.1 Example 1: Equilateral Simplex [page 209]
11.2.2 Example 2: Text [page 209]
11.3 Reducing Dimension While Preserving Information [page 212]
11.3.1 The Geometry of Means and Covariances of Real Features [page 212]
11.3.2 Principal Component Analysis [page 214]
11.3.3 Working in "Dissimilarity Space" [page 215]
11.3.4 Linear Multidimensional Scaling [page 217]
11.3.5 The Singular Value Decomposition and Low-Rank Approximation [page 219]
11.3.6 Stress-Minimizing Multidimensional Scaling [page 221]
11.3.7 Projection Pursuit [page 221]
11.3.8 Feature Selection [page 223]
11.3.9 Clustering [page 224]
11.3.10 Manifold Learning [page 224]
11.3.11 Autoencoders [page 227]
11.4 Model Regularization [page 231]
11.4.1 Duality and the Geometry of Parameter Penalization [page 234]
11.4.2 Parameter Penalization as Prior Information [page 235]
12 Communication with Clients [page 239]
12.1 Binary Classification and Hypothesis Testing [page 240]
12.2 Terminology for Binary Decisions [page 241]
12.3 ROC Curves [page 241]
12.4 One-Dimensional Measures of Performance [page 246]
12.5 Confusion Matrices [page 247]
12.6 Multiple Testing [page 248]
12.6.1 Control the Familywise Error [page 248]
12.6.2 Control the False Discovery Rate [page 249]
12.7 Expert Systems [page 250]
13 Current Challenges in Machine Learning [page 253]
13.1 Streaming Data [page 253]
13.2 Distributed Data [page 253]
13.3 Semi-supervised Learning [page 254]
13.4 Active Learning [page 254]
13.5 Feature Construction via Deep Neural Networks [page 255]
13.6 Transfer Learning [page 255]
13.7 Interpretability of Complex Models [page 255]
14 R Source Code [page 257]
14.1 Author's Biases [page 258]
14.2 Libraries [page 258]
14.3 The Running Example (Section 4.3) [page 259]
14.4 The Bayes Classifier (Section 4.1) [page 263]
14.5 Quadratic Discriminant Analysis (Section 4.4.1) [page 265]
14.6 Linear Discriminant Analysis (Section 4.4.2) [page 265]
14.7 Gaussian Mixture Models (Section 4.4.3) [page 266]
14.8 Kernel Density Estimation (Section 4.4.4) [page 267]
14.9 Histograms (Section 4.4.5) [page 270]
14.10 The Naive Bayes Classifier (Section 4.4.6) [page 275]
14.11 k-Nearest-Neighbor (Section 4.5.1) [page 277]
14.12 Learning Vector Quantization (Section 4.5.4) [page 279]
14.13 Logistic Regression (Section 4.6) [page 281]
14.14 Neural Networks (Section 4.7) [page 282]
14.15 Classification Trees (Section 4.8) [page 285]
14.16 Support Vector Machines (Section 4.9) [page 289]
14.17 Bootstrap Aggregation (Section 6.3) [page 294]
14.18 Boosting (Section 6.6) [page 296]
14.19 Arcing (Section 6.7) [page 297]
14.20 Random Forests (Section 6.5) [page 297]
Appendix A List of Symbols [page 299]
Appendix B Solutions to Selected Exercises [page 301]
Appendix C Converting Between Normal Parameters and Level-Curve Ellipsoids [page 321]
C.1 Parameters to Axes [page 322]
C.2 Axes to Parameters [page 322]
Appendix D Training Data and Fitted Parameters [page 323]
D.1 Training Data [page 323]
D.2 Fitted Model Parameters [page 324]
D.2.1 Quadratic and Linear Discriminant Analysis [page 324]
D.2.2 Logistic Regression [page 325]
D.2.3 Neural Network [page 325]
D.2.4 Classification Tree [page 325]
References [page 327]
Index [page 337]
EULA [page 343]