
Machine Learning Pocket Reference
Description
With detailed notes, tables, and examples, this handy reference will help you navigate the basics of structured machine learning. Author Matt Harrison delivers a valuable guide that you can use for additional support during training and as a convenient resource when you dive into your next machine learning project.
Ideal for programmers, data scientists, and AI engineers, this book includes an overview of the machine learning process and walks you through classification with structured data. You'll also learn methods for clustering, predicting a continuous value (regression), and reducing dimensionality, among other topics.
This pocket reference includes sections that cover:
- Classification, using the Titanic dataset
- Cleaning data and dealing with missing data
- Exploratory data analysis
- Common preprocessing steps using sample data
- Selecting features useful to the model
- Model selection
- Metrics and classification evaluation
- Regression examples using k-nearest neighbor, decision trees, boosting, and more
- Metrics for regression evaluation
- Clustering
- Dimensionality reduction
- Scikit-learn pipelines

Content
- Intro
- Copyright
- Table of Contents
- Preface
  - What to Expect
  - Who This Book Is For
  - Conventions Used in This Book
  - Using Code Examples
  - O'Reilly Online Learning
  - How to Contact Us
  - Acknowledgments
- Chapter 1. Introduction
  - Libraries Used
  - Installation with Pip
  - Installation with Conda
- Chapter 2. Overview of the Machine Learning Process
- Chapter 3. Classification Walkthrough: Titanic Dataset
  - Project Layout Suggestion
  - Imports
  - Ask a Question
  - Terms for Data
  - Gather Data
  - Clean Data
  - Create Features
  - Sample Data
  - Impute Data
  - Normalize Data
  - Refactor
  - Baseline Model
  - Various Families
  - Stacking
  - Create Model
  - Evaluate Model
  - Optimize Model
  - Confusion Matrix
  - ROC Curve
  - Learning Curve
  - Deploy Model
- Chapter 4. Missing Data
  - Examining Missing Data
  - Dropping Missing Data
  - Imputing Data
  - Adding Indicator Columns
- Chapter 5. Cleaning Data
  - Column Names
  - Replacing Missing Values
- Chapter 6. Exploring
  - Data Size
  - Summary Stats
  - Histogram
  - Scatter Plot
  - Joint Plot
  - Pair Grid
  - Box and Violin Plots
  - Comparing Two Ordinal Values
  - Correlation
  - RadViz
  - Parallel Coordinates
- Chapter 7. Preprocess Data
  - Standardize
  - Scale to Range
  - Dummy Variables
  - Label Encoder
  - Frequency Encoding
  - Pulling Categories from Strings
  - Other Categorical Encoding
  - Date Feature Engineering
  - Add col_na Feature
  - Manual Feature Engineering
- Chapter 8. Feature Selection
  - Collinear Columns
  - Lasso Regression
  - Recursive Feature Elimination
  - Mutual Information
  - Principal Component Analysis
  - Feature Importance
- Chapter 9. Imbalanced Classes
  - Use a Different Metric
  - Tree-based Algorithms and Ensembles
  - Penalize Models
  - Upsampling Minority
  - Generate Minority Data
  - Downsampling Majority
  - Upsampling Then Downsampling
- Chapter 10. Classification
  - Logistic Regression
  - Naive Bayes
  - Support Vector Machine
  - K-Nearest Neighbor
  - Decision Tree
  - Random Forest
  - XGBoost
  - Gradient Boosted with LightGBM
  - TPOT
- Chapter 11. Model Selection
  - Validation Curve
  - Learning Curve
- Chapter 12. Metrics and Classification Evaluation
  - Confusion Matrix
  - Metrics
  - Accuracy
  - Recall
  - Precision
  - F1
  - Classification Report
  - ROC
  - Precision-Recall Curve
  - Cumulative Gains Plot
  - Lift Curve
  - Class Balance
  - Class Prediction Error
  - Discrimination Threshold
- Chapter 13. Explaining Models
  - Regression Coefficients
  - Feature Importance
  - LIME
  - Tree Interpretation
  - Partial Dependence Plots
  - Surrogate Models
  - Shapley
- Chapter 14. Regression
  - Baseline Model
  - Linear Regression
  - SVMs
  - K-Nearest Neighbor
  - Decision Tree
  - Random Forest
  - XGBoost Regression
  - LightGBM Regression
- Chapter 15. Metrics and Regression Evaluation
  - Metrics
  - Residuals Plot
  - Heteroscedasticity
  - Normal Residuals
  - Prediction Error Plot
- Chapter 16. Explaining Regression Models
  - Shapley
- Chapter 17. Dimensionality Reduction
  - PCA
  - UMAP
  - t-SNE
  - PHATE
- Chapter 18. Clustering
  - K-Means
  - Agglomerative (Hierarchical) Clustering
  - Understanding Clusters
- Chapter 19. Pipelines
  - Classification Pipeline
  - Regression Pipeline
  - PCA Pipeline
- Index
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; macOS; Linux): Install the free reader Adobe Digital Editions before downloading (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle: limited support only).
The PDF format displays every book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" copy protection. If the requirements above are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.