Data Mining

Practical Machine Learning Tools and Techniques
 
 
Morgan Kaufmann (publisher)
  • 4th edition
  • Published October 1, 2016
  • 654 pages
 
E-book | ePUB with Adobe DRM | System requirements
E-book | PDF with Adobe DRM | System requirements
978-0-12-804357-8 (ISBN)
 

Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get started: preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining approaches.

Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal cover today's established techniques together with methods at the leading edge of contemporary research.

Please visit the book companion website at http://www.cs.waikato.ac.nz/ml/weka/book.html

It contains:

  • PowerPoint slides for Chapters 1-12. This is a very comprehensive teaching resource, with many slides covering each chapter of the book
  • Online Appendix on the Weka workbench; again a very comprehensive learning aid for the open source software that goes with the book
  • Table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.
  • Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projects
  • Presents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks, in an easy-to-use interactive interface (a brief usage sketch follows this list)
  • Includes open-access online courses that introduce practical applications of the material in the book
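
For readers who want a feel for how WEKA can be driven from code rather than through its interactive interface, here is a minimal Java sketch (not taken from the book; it assumes weka.jar is on the classpath and that WEKA's bundled weather.nominal.arff sample file sits in the working directory). It loads an ARFF dataset, builds a J48 decision tree, and estimates its accuracy by cross-validation:

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    import java.util.Random;

    public class WekaQuickStart {
        public static void main(String[] args) throws Exception {
            // Load an ARFF file; the nominal weather data ships with WEKA's sample datasets.
            Instances data = new DataSource("weather.nominal.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);  // last attribute is the class

            // Build a C4.5-style decision tree (J48) and run 10-fold cross-validation.
            J48 tree = new J48();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }

The same workflow can be carried out point-and-click in the Explorer interface described in Appendix B (see "Building a decision tree" and "Examining the output").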


Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography. He has written several books, the latest being Managing Gigabytes (1999) and Data Mining (2000), both from Morgan Kaufmann.

  • English
  • San Francisco, USA
Elsevier Science
  • 13.95 MB
978-0-12-804357-8 (9780128043578)
0128043571 (0128043571)
  • Front Cover
  • Data Mining
  • Copyright Page
  • Contents
  • List of Figures
  • List of Tables
  • Preface
  • Updated and Revised Content
  • Second Edition
  • Third Edition
  • Fourth Edition
  • Acknowledgments
  • I. Introduction to data mining
  • 1 What's it all about?
  • 1.1 Data Mining and Machine Learning
  • Describing Structural Patterns
  • Machine Learning
  • Data Mining
  • 1.2 Simple Examples: The Weather Problem and Others
  • The Weather Problem
  • Contact Lenses: An Idealized Problem
  • Irises: A Classic Numeric Dataset
  • CPU Performance: Introducing Numeric Prediction
  • Labor Negotiations: A More Realistic Example
  • Soybean Classification: A Classic Machine Learning Success
  • 1.3 Fielded Applications
  • Web Mining
  • Decisions Involving Judgment
  • Screening Images
  • Load Forecasting
  • Diagnosis
  • Marketing and Sales
  • Other Applications
  • 1.4 The Data Mining Process
  • 1.5 Machine Learning and Statistics
  • 1.6 Generalization as Search
  • Enumerating the Concept Space
  • Bias
  • Language bias
  • Search bias
  • Overfitting-avoidance bias
  • 1.7 Data Mining and Ethics
  • Reidentification
  • Using Personal Information
  • Wider Issues
  • 1.8 Further Reading and Bibliographic Notes
  • 2 Input: concepts, instances, attributes
  • 2.1 What's a Concept?
  • 2.2 What's in an Example?
  • Relations
  • Other Example Types
  • 2.3 What's in an Attribute?
  • 2.4 Preparing the Input
  • Gathering the Data Together
  • ARFF Format
  • Sparse Data
  • Attribute Types
  • Missing Values
  • Inaccurate Values
  • Unbalanced Data
  • Getting to Know Your Data
  • 2.5 Further Reading and Bibliographic Notes
  • 3 Output: knowledge representation
  • 3.1 Tables
  • 3.2 Linear Models
  • 3.3 Trees
  • 3.4 Rules
  • Classification Rules
  • Association Rules
  • Rules With Exceptions
  • More Expressive Rules
  • 3.5 Instance-Based Representation
  • 3.6 Clusters
  • 3.7 Further Reading and Bibliographic Notes
  • 4 Algorithms: the basic methods
  • 4.1 Inferring Rudimentary Rules
  • Missing Values and Numeric Attributes
  • 4.2 Simple Probabilistic Modeling
  • Missing Values and Numeric Attributes
  • Naïve Bayes for Document Classification
  • Remarks
  • 4.3 Divide-and-Conquer: Constructing Decision Trees
  • Calculating Information
  • Highly Branching Attributes
  • 4.4 Covering Algorithms: Constructing Rules
  • Rules Versus Trees
  • A Simple Covering Algorithm
  • Rules Versus Decision Lists
  • 4.5 Mining Association Rules
  • Item Sets
  • Association Rules
  • Generating Rules Efficiently
  • 4.6 Linear Models
  • Numeric Prediction: Linear Regression
  • Linear Classification: Logistic Regression
  • Linear Classification Using the Perceptron
  • Linear Classification Using Winnow
  • 4.7 Instance-Based Learning
  • The Distance Function
  • Finding Nearest Neighbors Efficiently
  • Remarks
  • 4.8 Clustering
  • Iterative Distance-Based Clustering
  • Faster Distance Calculations
  • Choosing the Number of Clusters
  • Hierarchical Clustering
  • Example of Hierarchical Clustering
  • Incremental Clustering
  • Category Utility
  • Remarks
  • 4.9 Multi-instance Learning
  • Aggregating the Input
  • Aggregating the Output
  • 4.10 Further Reading and Bibliographic Notes
  • 4.11 Weka Implementations
  • 5 Credibility: evaluating what's been learned
  • 5.1 Training and Testing
  • 5.2 Predicting Performance
  • 5.3 Cross-Validation
  • 5.4 Other Estimates
  • Leave-One-Out
  • The Bootstrap
  • 5.5 Hyperparameter Selection
  • 5.6 Comparing Data Mining Schemes
  • 5.7 Predicting Probabilities
  • Quadratic Loss Function
  • Informational Loss Function
  • Remarks
  • 5.8 Counting the Cost
  • Cost-Sensitive Classification
  • Cost-Sensitive Learning
  • Lift Charts
  • ROC Curves
  • Recall-Precision Curves
  • Remarks
  • Cost Curves
  • 5.9 Evaluating Numeric Prediction
  • 5.10 The MDL Principle
  • 5.11 Applying the MDL Principle to Clustering
  • 5.12 Using a Validation Set for Model Selection
  • 5.13 Further Reading and Bibliographic Notes
  • II. More advanced machine learning schemes
  • 6 Trees and rules
  • 6.1 Decision Trees
  • Numeric Attributes
  • Missing Values
  • Pruning
  • Estimating Error Rates
  • Complexity of Decision Tree Induction
  • From Trees to Rules
  • C4.5: Choices and Options
  • Cost-Complexity Pruning
  • Discussion
  • 6.2 Classification Rules
  • Criteria for Choosing Tests
  • Missing Values, Numeric Attributes
  • Generating Good Rules
  • Using Global Optimization
  • Obtaining Rules From Partial Decision Trees
  • Rules With Exceptions
  • Discussion
  • 6.3 Association Rules
  • Building a Frequent Pattern Tree
  • Finding Large Item Sets
  • Discussion
  • 6.4 Weka Implementations
  • 7 Extending instance-based and linear models
  • 7.1 Instance-Based Learning
  • Reducing the Number of Exemplars
  • Pruning Noisy Exemplars
  • Weighting Attributes
  • Generalizing Exemplars
  • Distance Functions for Generalized Exemplars
  • Generalized Distance Functions
  • Discussion
  • 7.2 Extending Linear Models
  • The Maximum Margin Hyperplane
  • Nonlinear Class Boundaries
  • Support Vector Regression
  • Kernel Ridge Regression
  • The Kernel Perceptron
  • Multilayer Perceptrons
  • Backpropagation
  • Radial Basis Function Networks
  • Stochastic Gradient Descent
  • Discussion
  • 7.3 Numeric Prediction With Local Linear Models
  • Model Trees
  • Building the Tree
  • Pruning the Tree
  • Nominal Attributes
  • Missing Values
  • Pseudocode for Model Tree Induction
  • Rules From Model Trees
  • Locally Weighted Linear Regression
  • Discussion
  • 7.4 Weka Implementations
  • 8 Data transformations
  • 8.1 Attribute Selection
  • Scheme-Independent Selection
  • Searching the Attribute Space
  • Scheme-Specific Selection
  • 8.2 Discretizing Numeric Attributes
  • Unsupervised Discretization
  • Entropy-Based Discretization
  • Other Discretization Methods
  • Entropy-Based Versus Error-Based Discretization
  • Converting Discrete to Numeric Attributes
  • 8.3 Projections
  • Principal Component Analysis
  • Random Projections
  • Partial Least Squares Regression
  • Independent Component Analysis
  • Linear Discriminant Analysis
  • Quadratic Discriminant Analysis
  • Fisher's Linear Discriminant Analysis
  • Text to Attribute Vectors
  • Time Series
  • 8.4 Sampling
  • Reservoir Sampling
  • 8.5 Cleansing
  • Improving Decision Trees
  • Robust Regression
  • Detecting Anomalies
  • One-Class Learning
  • Outlier Detection
  • Generating Artificial Data
  • 8.6 Transforming Multiple Classes to Binary Ones
  • Simple Methods
  • Error-Correcting Output Codes
  • Ensembles of Nested Dichotomies
  • 8.7 Calibrating Class Probabilities
  • 8.8 Further Reading and Bibliographic Notes
  • 8.9 Weka Implementations
  • 9 Probabilistic methods
  • 9.1 Foundations
  • Maximum Likelihood Estimation
  • Maximum a Posteriori Parameter Estimation
  • 9.2 Bayesian Networks
  • Making Predictions
  • Learning Bayesian Networks
  • Specific Algorithms
  • Data Structures for Fast Learning
  • 9.3 Clustering and Probability Density Estimation
  • The Expectation Maximization Algorithm for a Mixture of Gaussians
  • Extending the Mixture Model
  • Clustering Using Prior Distributions
  • Clustering With Correlated Attributes
  • Kernel Density Estimation
  • Comparing Parametric, Semiparametric and Nonparametric Density Models for Classification
  • 9.4 Hidden Variable Models
  • Expected Log-Likelihoods and Expected Gradients
  • The Expectation Maximization Algorithm
  • Applying the Expectation Maximization Algorithm to Bayesian Networks
  • 9.5 Bayesian Estimation and Prediction
  • Probabilistic Inference Methods
  • Probability propagation
  • Sampling, simulated annealing, and iterated conditional modes
  • Variational inference
  • 9.6 Graphical Models and Factor Graphs
  • Graphical Models and Plate Notation
  • Probabilistic Principal Component Analysis
  • Inference with PPCA
  • Marginal log-likelihood for PPCA
  • Expected log-likelihood for PPCA
  • Expected gradient for PPCA
  • EM for PPCA
  • Latent Semantic Analysis
  • Using Principal Component Analysis for Dimensionality Reduction
  • Probabilistic LSA
  • Latent Dirichlet Allocation
  • Factor Graphs
  • Factor graphs, Bayesian networks, and the logistic regression model
  • Markov Random Fields
  • Computing Using the Sum-Product and Max-Product Algorithms
  • Marginal probabilities
  • The sum-product algorithm
  • Sum-product algorithm example
  • Most probable explanation example
  • The max-product or max-sum algorithm
  • 9.7 Conditional Probability Models
  • Linear and Polynomial Regression as Probability Models
  • Using Priors on Parameters
  • Matrix vector formulations of linear and polynomial regression
  • Multiclass Logistic Regression
  • Matrix vector formulation of multiclass logistic regression
  • Priors on parameters, and the regularized loss function
  • Gradient Descent and Second-Order Methods
  • Generalized Linear Models
  • Making Predictions for Ordered Classes
  • Conditional Probabilistic Models Using Kernels
  • 9.8 Sequential and Temporal Models
  • Markov Models and N-gram Methods
  • Hidden Markov Models
  • Conditional Random Fields
  • From Markov random fields to conditional random fields
  • Linear chain conditional random fields
  • Learning for chain-structured conditional random fields
  • Using conditional random fields for text mining
  • 9.9 Further Reading and Bibliographic Notes
  • Software Packages and Implementations
  • 9.10 Weka Implementations
  • 10 Deep learning
  • 10.1 Deep Feedforward Networks
  • The MNIST Evaluation
  • Losses and Regularization
  • Deep Layered Network Architecture
  • Activation Functions
  • Backpropagation Revisited
  • Computation Graphs and Complex Network Structures
  • Checking Backpropagation Implementations
  • 10.2 Training and Evaluating Deep Networks
  • Early Stopping
  • Validation, Cross-Validation, and Hyperparameter Tuning
  • Mini-Batch-Based Stochastic Gradient Descent
  • Pseudocode for Mini-Batch Based Stochastic Gradient Descent
  • Learning Rates and Schedules
  • Regularization With Priors on Parameters
  • Dropout
  • Batch Normalization
  • Parameter Initialization
  • Unsupervised Pretraining
  • Data Augmentation and Synthetic Transformations
  • 10.3 Convolutional Neural Networks
  • The ImageNet Evaluation and Very Deep Convolutional Networks
  • From Image Filtering to Learnable Convolutional Layers
  • Convolutional Layers and Gradients
  • Pooling and Subsampling Layers and Gradients
  • Implementation
  • 10.4 Autoencoders
  • Pretraining Deep Autoencoders With RBMs
  • Denoising Autoencoders and Layerwise Training
  • Combining Reconstructive and Discriminative Learning
  • 10.5 Stochastic Deep Networks
  • Boltzmann Machines
  • Restricted Boltzmann Machines
  • Contrastive Divergence
  • Categorical and Continuous Variables
  • Deep Boltzmann Machines
  • Deep Belief Networks
  • 10.6 Recurrent Neural Networks
  • Exploding and Vanishing Gradients
  • Other Recurrent Network Architectures
  • 10.7 Further Reading and Bibliographic Notes
  • 10.8 Deep Learning Software and Network Implementations
  • Theano
  • TensorFlow
  • Torch
  • Computational Network Toolkit
  • Caffe
  • Deeplearning4j
  • Other Packages: Lasagne, Keras, and cuDNN
  • 10.9 WEKA Implementations
  • 11 Beyond supervised and unsupervised learning
  • 11.1 Semisupervised Learning
  • Clustering for Classification
  • Cotraining
  • EM and Cotraining
  • Neural Network Approaches
  • 11.2 Multi-instance Learning
  • Converting to Single-Instance Learning
  • Upgrading Learning Algorithms
  • Dedicated Multi-instance Methods
  • 11.3 Further Reading and Bibliographic Notes
  • 11.4 WEKA Implementations
  • 12 Ensemble learning
  • 12.1 Combining Multiple Models
  • 12.2 Bagging
  • Bias-Variance Decomposition
  • Bagging With Costs
  • 12.3 Randomization
  • Randomization Versus Bagging
  • Rotation Forests
  • 12.4 Boosting
  • AdaBoost
  • The Power of Boosting
  • 12.5 Additive Regression
  • Numeric Prediction
  • Additive Logistic Regression
  • 12.6 Interpretable Ensembles
  • Option Trees
  • Logistic Model Trees
  • 12.7 Stacking
  • 12.8 Further Reading and Bibliographic Notes
  • 12.9 WEKA Implementations
  • 13 Moving on: applications and beyond
  • 13.1 Applying Machine Learning
  • 13.2 Learning From Massive Datasets
  • 13.3 Data Stream Learning
  • 13.4 Incorporating Domain Knowledge
  • 13.5 Text Mining
  • Document Classification and Clustering
  • Information Extraction
  • Natural Language Processing
  • 13.6 Web Mining
  • Wrapper Induction
  • PageRank
  • 13.7 Images and Speech
  • Images
  • Speech
  • 13.8 Adversarial Situations
  • 13.9 Ubiquitous Data Mining
  • 13.10 Further Reading and Bibliographic Notes
  • 13.11 WEKA Implementations
  • Appendix A: Theoretical foundations
  • A.1 Matrix Algebra
  • Basic Manipulations and Properties
  • Derivatives of Vector and Scalar Functions
  • The Chain Rule
  • Computation Graphs and Backpropagation
  • Derivatives of Functions of Vectors and Matrices
  • Vector Taylor Series Expansion, Second-Order Methods, and Learning Rates
  • Eigenvectors, Eigenvalues, and Covariance Matrices
  • The Singular Value Decomposition
  • A.2 Fundamental Elements of Probabilistic Methods
  • Expectations
  • Conjugate Priors
  • Bernoulli, Binomial, and Beta Distributions
  • Categorical, Multinomial, and Dirichlet Distributions
  • Estimating the Parameters of a Discrete Distribution
  • The Gaussian Distribution
  • Useful Properties of Linear Gaussian Models
  • Probabilistic PCA and the Eigenvectors of a Covariance Matrix
  • The Exponential Family of Distributions
  • Variational Methods and the EM Algorithm
  • Appendix B: The WEKA workbench
  • B.1 What's in WEKA?
  • How do you use it?
  • What else can you do?
  • B.2 The package management system
  • B.3 The Explorer
  • Loading the data into the Explorer
  • Building a decision tree
  • Examining the output
  • Working with models
  • Exploring the Explorer
  • Loading and filtering files
  • Clustering and association rules
  • Attribute selection
  • Visualization
  • Filtering algorithms
  • Learning algorithms
  • Attribute selection
  • B.4 The Knowledge Flow Interface
  • Getting started
  • Knowledge Flow components
  • Configuring and connecting the components
  • Incremental learning
  • B.5 The Experimenter
  • Getting started
  • Running an experiment
  • Analyzing the results
  • Advanced setup
  • The Analyze panel
  • References
  • Index
  • Back Cover

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; macOS; Linux): Install the free Adobe Digital Editions software before downloading (see e-book help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see e-book help).

E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (not Kindle)

The EPUB file format is well suited to novels and nonfiction, i.e., "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM applies "hard" copy protection here: if the necessary requirements are not met, you will not be able to open the e-book, so prepare your reading hardware before downloading.

Further information can be found in our e-book help.


File format: PDF
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; macOS; Linux): Install the free Adobe Digital Editions software before downloading (see e-book help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see e-book help).

E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (not Kindle)

The PDF file format displays every book page identically on any hardware, which also makes PDF suitable for complex layouts such as those used in textbooks and reference works (images, tables, columns, footnotes). On the small displays of e-readers and smartphones, PDFs are rather awkward because of the amount of scrolling required. Adobe DRM applies "hard" copy protection here: if the necessary requirements are not met, you will not be able to open the e-book, so prepare your reading hardware before downloading.

Further information can be found in our e-book help.


Download (available immediately)

€60.63
incl. 19% VAT
Download / single license
ePUB with Adobe DRM
see system requirements
PDF with Adobe DRM
see system requirements
Note: You select the desired file format and copy protection in the e-book vendor's system.
Order e-book
