
Machine Learning with Python Cookbook
Description
This practical guide provides more than 200 self-contained recipes to help you solve machine learning challenges you may encounter in your work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems, from loading data to training models and leveraging neural networks.
Each recipe in this updated edition includes code that you can copy, paste, and run with a toy dataset to ensure that it works. From there, you can adapt these recipes according to your use case or application. Recipes include a discussion that explains the solution and provides meaningful context.
Go beyond theory and concepts by learning the nuts and bolts you need to construct working machine learning applications. You'll find recipes for:
- Vectors, matrices, and arrays
- Working with data from CSV, JSON, SQL, databases, cloud storage, and other sources
- Handling numerical and categorical data, text, images, and dates and times
- Dimensionality reduction using feature extraction or feature selection
- Model evaluation and selection
- Linear and logistic regression, trees and forests, and k-nearest neighbors
- Support vector machines (SVM), naive Bayes, clustering, and tree-based models
- Saving, loading, and serving trained models from multiple frameworks
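As a rough illustration of the copy, paste, and run style described above, the sketch below summarizes a small toy matrix with NumPy. It is not taken from the book: the matrix values and the `describe` helper are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; not a recipe from the book.
import numpy as np

# Toy dataset: a 3x3 matrix with made-up values
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])


def describe(m):
    """Return basic summary statistics for a 2-D NumPy array.

    `describe` is a hypothetical helper written for this sketch.
    """
    return {
        "shape": m.shape,                              # (rows, columns)
        "mean": float(np.mean(m)),                     # mean over all elements
        "variance": float(np.var(m)),                  # population variance
        "std": float(np.std(m)),                       # population standard deviation
        "column_means": np.mean(m, axis=0).tolist(),   # one mean per column
    }


summary = describe(matrix)
print(summary)
```

Swapping in your own array for `matrix` is the kind of adaptation the description refers to.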

Contents
- Cover
- Copyright
- Table of Contents
- Preface
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Chapter 1. Working with Vectors, Matrices, and Arrays in NumPy
- 1.0 Introduction
- 1.1 Creating a Vector
- Problem
- Solution
- Discussion
- See Also
- 1.2 Creating a Matrix
- Problem
- Solution
- Discussion
- See Also
- 1.3 Creating a Sparse Matrix
- Problem
- Solution
- Discussion
- See Also
- 1.4 Preallocating NumPy Arrays
- Problem
- Solution
- Discussion
- 1.5 Selecting Elements
- Problem
- Solution
- Discussion
- 1.6 Describing a Matrix
- Problem
- Solution
- Discussion
- 1.7 Applying Functions over Each Element
- Problem
- Solution
- Discussion
- 1.8 Finding the Maximum and Minimum Values
- Problem
- Solution
- Discussion
- 1.9 Calculating the Average, Variance, and Standard Deviation
- Problem
- Solution
- Discussion
- 1.10 Reshaping Arrays
- Problem
- Solution
- Discussion
- 1.11 Transposing a Vector or Matrix
- Problem
- Solution
- Discussion
- 1.12 Flattening a Matrix
- Problem
- Solution
- Discussion
- 1.13 Finding the Rank of a Matrix
- Problem
- Solution
- Discussion
- See Also
- 1.14 Getting the Diagonal of a Matrix
- Problem
- Solution
- Discussion
- 1.15 Calculating the Trace of a Matrix
- Problem
- Solution
- Discussion
- See Also
- 1.16 Calculating Dot Products
- Problem
- Solution
- Discussion
- See Also
- 1.17 Adding and Subtracting Matrices
- Problem
- Solution
- Discussion
- 1.18 Multiplying Matrices
- Problem
- Solution
- Discussion
- See Also
- 1.19 Inverting a Matrix
- Problem
- Solution
- Discussion
- See Also
- 1.20 Generating Random Values
- Problem
- Solution
- Discussion
- Chapter 2. Loading Data
- 2.0 Introduction
- 2.1 Loading a Sample Dataset
- Problem
- Solution
- Discussion
- See Also
- 2.2 Creating a Simulated Dataset
- Problem
- Solution
- Discussion
- See Also
- 2.3 Loading a CSV File
- Problem
- Solution
- Discussion
- 2.4 Loading an Excel File
- Problem
- Solution
- Discussion
- 2.5 Loading a JSON File
- Problem
- Solution
- Discussion
- See Also
- 2.6 Loading a Parquet File
- Problem
- Solution
- Discussion
- See Also
- 2.7 Loading an Avro File
- Problem
- Solution
- Discussion
- See Also
- 2.8 Querying a SQLite Database
- Problem
- Solution
- Discussion
- See Also
- 2.9 Querying a Remote SQL Database
- Problem
- Solution
- Discussion
- See Also
- 2.10 Loading Data from a Google Sheet
- Problem
- Solution
- Discussion
- See Also
- 2.11 Loading Data from an S3 Bucket
- Problem
- Solution
- Discussion
- See Also
- 2.12 Loading Unstructured Data
- Problem
- Solution
- Discussion
- See Also
- Chapter 3. Data Wrangling
- 3.0 Introduction
- 3.1 Creating a DataFrame
- Problem
- Solution
- Discussion
- 3.2 Getting Information about the Data
- Problem
- Solution
- Discussion
- 3.3 Slicing DataFrames
- Problem
- Solution
- Discussion
- 3.4 Selecting Rows Based on Conditionals
- Problem
- Solution
- Discussion
- 3.5 Sorting Values
- Problem
- Solution
- Discussion
- 3.6 Replacing Values
- Problem
- Solution
- Discussion
- 3.7 Renaming Columns
- Problem
- Solution
- Discussion
- 3.8 Finding the Minimum, Maximum, Sum, Average, and Count
- Problem
- Solution
- Discussion
- 3.9 Finding Unique Values
- Problem
- Solution
- Discussion
- 3.10 Handling Missing Values
- Problem
- Solution
- Discussion
- 3.11 Deleting a Column
- Problem
- Solution
- Discussion
- 3.12 Deleting a Row
- Problem
- Solution
- Discussion
- 3.13 Dropping Duplicate Rows
- Problem
- Solution
- Discussion
- 3.14 Grouping Rows by Values
- Problem
- Solution
- Discussion
- 3.15 Grouping Rows by Time
- Problem
- Solution
- Discussion
- See Also
- 3.16 Aggregating Operations and Statistics
- Problem
- Solution
- Discussion
- See Also
- 3.17 Looping over a Column
- Problem
- Solution
- Discussion
- 3.18 Applying a Function over All Elements in a Column
- Problem
- Solution
- Discussion
- 3.19 Applying a Function to Groups
- Problem
- Solution
- Discussion
- 3.20 Concatenating DataFrames
- Problem
- Solution
- Discussion
- 3.21 Merging DataFrames
- Problem
- Solution
- Discussion
- See Also
- Chapter 4. Handling Numerical Data
- 4.0 Introduction
- 4.1 Rescaling a Feature
- Problem
- Solution
- Discussion
- See Also
- 4.2 Standardizing a Feature
- Problem
- Solution
- Discussion
- 4.3 Normalizing Observations
- Problem
- Solution
- Discussion
- 4.4 Generating Polynomial and Interaction Features
- Problem
- Solution
- Discussion
- 4.5 Transforming Features
- Problem
- Solution
- Discussion
- 4.6 Detecting Outliers
- Problem
- Solution
- Discussion
- See Also
- 4.7 Handling Outliers
- Problem
- Solution
- Discussion
- See Also
- 4.8 Discretizing Features
- Problem
- Solution
- Discussion
- See Also
- 4.9 Grouping Observations Using Clustering
- Problem
- Solution
- Discussion
- 4.10 Deleting Observations with Missing Values
- Problem
- Solution
- Discussion
- See Also
- 4.11 Imputing Missing Values
- Problem
- Solution
- Discussion
- See Also
- Chapter 5. Handling Categorical Data
- 5.0 Introduction
- 5.1 Encoding Nominal Categorical Features
- Problem
- Solution
- Discussion
- See Also
- 5.2 Encoding Ordinal Categorical Features
- Problem
- Solution
- Discussion
- 5.3 Encoding Dictionaries of Features
- Problem
- Solution
- Discussion
- See Also
- 5.4 Imputing Missing Class Values
- Problem
- Solution
- Discussion
- See Also
- 5.5 Handling Imbalanced Classes
- Problem
- Solution
- Discussion
- Chapter 6. Handling Text
- 6.0 Introduction
- 6.1 Cleaning Text
- Problem
- Solution
- Discussion
- See Also
- 6.2 Parsing and Cleaning HTML
- Problem
- Solution
- Discussion
- See Also
- 6.3 Removing Punctuation
- Problem
- Solution
- Discussion
- 6.4 Tokenizing Text
- Problem
- Solution
- Discussion
- 6.5 Removing Stop Words
- Problem
- Solution
- Discussion
- 6.6 Stemming Words
- Problem
- Solution
- Discussion
- See Also
- 6.7 Tagging Parts of Speech
- Problem
- Solution
- Discussion
- See Also
- 6.8 Performing Named-Entity Recognition
- Problem
- Solution
- Discussion
- See Also
- 6.9 Encoding Text as a Bag of Words
- Problem
- Solution
- Discussion
- See Also
- 6.10 Weighting Word Importance
- Problem
- Solution
- Discussion
- See Also
- 6.11 Using Text Vectors to Calculate Text Similarity in a Search Query
- Problem
- Solution
- Discussion
- See Also
- 6.12 Using a Sentiment Analysis Classifier
- Problem
- Solution
- Discussion
- See Also
- Chapter 7. Handling Dates and Times
- 7.0 Introduction
- 7.1 Converting Strings to Dates
- Problem
- Solution
- Discussion
- See Also
- 7.2 Handling Time Zones
- Problem
- Solution
- Discussion
- 7.3 Selecting Dates and Times
- Problem
- Solution
- Discussion
- 7.4 Breaking Up Date Data into Multiple Features
- Problem
- Solution
- Discussion
- 7.5 Calculating the Difference Between Dates
- Problem
- Solution
- Discussion
- See Also
- 7.6 Encoding Days of the Week
- Problem
- Solution
- Discussion
- See Also
- 7.7 Creating a Lagged Feature
- Problem
- Solution
- Discussion
- 7.8 Using Rolling Time Windows
- Problem
- Solution
- Discussion
- See Also
- 7.9 Handling Missing Data in Time Series
- Problem
- Solution
- Discussion
- Chapter 8. Handling Images
- 8.0 Introduction
- 8.1 Loading Images
- Problem
- Solution
- Discussion
- See Also
- 8.2 Saving Images
- Problem
- Solution
- Discussion
- 8.3 Resizing Images
- Problem
- Solution
- Discussion
- 8.4 Cropping Images
- Problem
- Solution
- Discussion
- See Also
- 8.5 Blurring Images
- Problem
- Solution
- Discussion
- See Also
- 8.6 Sharpening Images
- Problem
- Solution
- Discussion
- 8.7 Enhancing Contrast
- Problem
- Solution
- Discussion
- 8.8 Isolating Colors
- Problem
- Solution
- Discussion
- 8.9 Binarizing Images
- Problem
- Solution
- Discussion
- 8.10 Removing Backgrounds
- Problem
- Solution
- Discussion
- 8.11 Detecting Edges
- Problem
- Solution
- Discussion
- See Also
- 8.12 Detecting Corners
- Problem
- Solution
- Discussion
- See Also
- 8.13 Creating Features for Machine Learning
- Problem
- Solution
- Discussion
- 8.14 Encoding Color Histograms as Features
- Problem
- Solution
- Discussion
- See Also
- 8.15 Using Pretrained Embeddings as Features
- Problem
- Solution
- Discussion
- See Also
- 8.16 Detecting Objects with OpenCV
- Problem
- Solution
- Discussion
- See Also
- 8.17 Classifying Images with PyTorch
- Problem
- Solution
- Discussion
- See Also
- Chapter 9. Dimensionality Reduction Using Feature Extraction
- 9.0 Introduction
- 9.1 Reducing Features Using Principal Components
- Problem
- Solution
- Discussion
- See Also
- 9.2 Reducing Features When Data Is Linearly Inseparable
- Problem
- Solution
- Discussion
- See Also
- 9.3 Reducing Features by Maximizing Class Separability
- Problem
- Solution
- Discussion
- See Also
- 9.4 Reducing Features Using Matrix Factorization
- Problem
- Solution
- Discussion
- See Also
- 9.5 Reducing Features on Sparse Data
- Problem
- Solution
- Discussion
- See Also
- Chapter 10. Dimensionality Reduction Using Feature Selection
- 10.0 Introduction
- 10.1 Thresholding Numerical Feature Variance
- Problem
- Solution
- Discussion
- 10.2 Thresholding Binary Feature Variance
- Problem
- Solution
- Discussion
- 10.3 Handling Highly Correlated Features
- Problem
- Solution
- Discussion
- 10.4 Removing Irrelevant Features for Classification
- Problem
- Solution
- Discussion
- 10.5 Recursively Eliminating Features
- Problem
- Solution
- Discussion
- See Also
- Chapter 11. Model Evaluation
- 11.0 Introduction
- 11.1 Cross-Validating Models
- Problem
- Solution
- Discussion
- See Also
- 11.2 Creating a Baseline Regression Model
- Problem
- Solution
- Discussion
- 11.3 Creating a Baseline Classification Model
- Problem
- Solution
- Discussion
- See Also
- 11.4 Evaluating Binary Classifier Predictions
- Problem
- Solution
- Discussion
- See Also
- 11.5 Evaluating Binary Classifier Thresholds
- Problem
- Solution
- Discussion
- See Also
- 11.6 Evaluating Multiclass Classifier Predictions
- Problem
- Solution
- Discussion
- 11.7 Visualizing a Classifier's Performance
- Problem
- Solution
- Discussion
- See Also
- 11.8 Evaluating Regression Models
- Problem
- Solution
- Discussion
- See Also
- 11.9 Evaluating Clustering Models
- Problem
- Solution
- Discussion
- See Also
- 11.10 Creating a Custom Evaluation Metric
- Problem
- Solution
- Discussion
- See Also
- 11.11 Visualizing the Effect of Training Set Size
- Problem
- Solution
- Discussion
- See Also
- 11.12 Creating a Text Report of Evaluation Metrics
- Problem
- Solution
- Discussion
- See Also
- 11.13 Visualizing the Effect of Hyperparameter Values
- Problem
- Solution
- Discussion
- See Also
- Chapter 12. Model Selection
- 12.0 Introduction
- 12.1 Selecting the Best Models Using Exhaustive Search
- Problem
- Solution
- Discussion
- See Also
- 12.2 Selecting the Best Models Using Randomized Search
- Problem
- Solution
- Discussion
- See Also
- 12.3 Selecting the Best Models from Multiple Learning Algorithms
- Problem
- Solution
- Discussion
- 12.4 Selecting the Best Models When Preprocessing
- Problem
- Solution
- Discussion
- 12.5 Speeding Up Model Selection with Parallelization
- Problem
- Solution
- Discussion
- 12.6 Speeding Up Model Selection Using Algorithm-Specific Methods
- Problem
- Solution
- Discussion
- See Also
- 12.7 Evaluating Performance After Model Selection
- Problem
- Solution
- Discussion
- Chapter 13. Linear Regression
- 13.0 Introduction
- 13.1 Fitting a Line
- Problem
- Solution
- Discussion
- 13.2 Handling Interactive Effects
- Problem
- Solution
- Discussion
- 13.3 Fitting a Nonlinear Relationship
- Problem
- Solution
- Discussion
- 13.4 Reducing Variance with Regularization
- Problem
- Solution
- Discussion
- 13.5 Reducing Features with Lasso Regression
- Problem
- Solution
- Discussion
- Chapter 14. Trees and Forests
- 14.0 Introduction
- 14.1 Training a Decision Tree Classifier
- Problem
- Solution
- Discussion
- See Also
- 14.2 Training a Decision Tree Regressor
- Problem
- Solution
- Discussion
- See Also
- 14.3 Visualizing a Decision Tree Model
- Problem
- Solution
- Discussion
- See Also
- 14.4 Training a Random Forest Classifier
- Problem
- Solution
- Discussion
- See Also
- 14.5 Training a Random Forest Regressor
- Problem
- Solution
- Discussion
- See Also
- 14.6 Evaluating Random Forests with Out-of-Bag Errors
- Problem
- Solution
- Discussion
- 14.7 Identifying Important Features in Random Forests
- Problem
- Solution
- Discussion
- 14.8 Selecting Important Features in Random Forests
- Problem
- Solution
- Discussion
- See Also
- 14.9 Handling Imbalanced Classes
- Problem
- Solution
- Discussion
- 14.10 Controlling Tree Size
- Problem
- Solution
- Discussion
- 14.11 Improving Performance Through Boosting
- Problem
- Solution
- Discussion
- See Also
- 14.12 Training an XGBoost Model
- Problem
- Solution
- Discussion
- See Also
- 14.13 Improving Real-Time Performance with LightGBM
- Problem
- Solution
- Discussion
- See Also
- Chapter 15. K-Nearest Neighbors
- 15.0 Introduction
- 15.1 Finding an Observation's Nearest Neighbors
- Problem
- Solution
- Discussion
- 15.2 Creating a K-Nearest Neighbors Classifier
- Problem
- Solution
- Discussion
- 15.3 Identifying the Best Neighborhood Size
- Problem
- Solution
- Discussion
- 15.4 Creating a Radius-Based Nearest Neighbors Classifier
- Problem
- Solution
- Discussion
- 15.5 Finding Approximate Nearest Neighbors
- Problem
- Solution
- Discussion
- See Also
- 15.6 Evaluating Approximate Nearest Neighbors
- Problem
- Solution
- Discussion
- See Also
- Chapter 16. Logistic Regression
- 16.0 Introduction
- 16.1 Training a Binary Classifier
- Problem
- Solution
- Discussion
- 16.2 Training a Multiclass Classifier
- Problem
- Solution
- Discussion
- 16.3 Reducing Variance Through Regularization
- Problem
- Solution
- Discussion
- 16.4 Training a Classifier on Very Large Data
- Problem
- Solution
- Discussion
- See Also
- 16.5 Handling Imbalanced Classes
- Problem
- Solution
- Discussion
- Chapter 17. Support Vector Machines
- 17.0 Introduction
- 17.1 Training a Linear Classifier
- Problem
- Solution
- Discussion
- 17.2 Handling Linearly Inseparable Classes Using Kernels
- Problem
- Solution
- Discussion
- 17.3 Creating Predicted Probabilities
- Problem
- Solution
- Discussion
- 17.4 Identifying Support Vectors
- Problem
- Solution
- Discussion
- 17.5 Handling Imbalanced Classes
- Problem
- Solution
- Discussion
- Chapter 18. Naive Bayes
- 18.0 Introduction
- 18.1 Training a Classifier for Continuous Features
- Problem
- Solution
- Discussion
- See Also
- 18.2 Training a Classifier for Discrete and Count Features
- Problem
- Solution
- Discussion
- 18.3 Training a Naive Bayes Classifier for Binary Features
- Problem
- Solution
- Discussion
- 18.4 Calibrating Predicted Probabilities
- Problem
- Solution
- Discussion
- Chapter 19. Clustering
- 19.0 Introduction
- 19.1 Clustering Using K-Means
- Problem
- Solution
- Discussion
- See Also
- 19.2 Speeding Up K-Means Clustering
- Problem
- Solution
- Discussion
- 19.3 Clustering Using Mean Shift
- Problem
- Solution
- Discussion
- See Also
- 19.4 Clustering Using DBSCAN
- Problem
- Solution
- Discussion
- See Also
- 19.5 Clustering Using Hierarchical Merging
- Problem
- Solution
- Discussion
- Chapter 20. Tensors with PyTorch
- 20.0 Introduction
- 20.1 Creating a Tensor
- Problem
- Solution
- Discussion
- See Also
- 20.2 Creating a Tensor from NumPy
- Problem
- Solution
- Discussion
- See Also
- 20.3 Creating a Sparse Tensor
- Problem
- Solution
- Discussion
- See Also
- 20.4 Selecting Elements in a Tensor
- Problem
- Solution
- Discussion
- See Also
- 20.5 Describing a Tensor
- Problem
- Solution
- Discussion
- 20.6 Applying Operations to Elements
- Problem
- Solution
- Discussion
- See Also
- 20.7 Finding the Maximum and Minimum Values
- Problem
- Solution
- Discussion
- 20.8 Reshaping Tensors
- Problem
- Solution
- Discussion
- 20.9 Transposing a Tensor
- Problem
- Solution
- Discussion
- 20.10 Flattening a Tensor
- Problem
- Solution
- Discussion
- 20.11 Calculating Dot Products
- Problem
- Solution
- Discussion
- See Also
- 20.12 Multiplying Tensors
- Problem
- Solution
- Discussion
- Chapter 21. Neural Networks
- 21.0 Introduction
- 21.1 Using Autograd with PyTorch
- Problem
- Solution
- Discussion
- See Also
- 21.2 Preprocessing Data for Neural Networks
- Problem
- Solution
- Discussion
- 21.3 Designing a Neural Network
- Problem
- Solution
- Discussion
- See Also
- 21.4 Training a Binary Classifier
- Problem
- Solution
- Discussion
- 21.5 Training a Multiclass Classifier
- Problem
- Solution
- Discussion
- 21.6 Training a Regressor
- Problem
- Solution
- Discussion
- 21.7 Making Predictions
- Problem
- Solution
- Discussion
- 21.8 Visualize Training History
- Problem
- Solution
- Discussion
- 21.9 Reducing Overfitting with Weight Regularization
- Problem
- Solution
- Discussion
- 21.10 Reducing Overfitting with Early Stopping
- Problem
- Solution
- Discussion
- 21.11 Reducing Overfitting with Dropout
- Problem
- Solution
- Discussion
- 21.12 Saving Model Training Progress
- Problem
- Solution
- Discussion
- 21.13 Tuning Neural Networks
- Problem
- Solution
- Discussion
- 21.14 Visualizing Neural Networks
- Problem
- Solution
- Discussion
- Chapter 22. Neural Networks for Unstructured Data
- 22.0 Introduction
- 22.1 Training a Neural Network for Image Classification
- Problem
- Solution
- Discussion
- See Also
- 22.2 Training a Neural Network for Text Classification
- Problem
- Solution
- Discussion
- 22.3 Fine-Tuning a Pretrained Model for Image Classification
- Problem
- Solution
- Discussion
- See Also
- 22.4 Fine-Tuning a Pretrained Model for Text Classification
- Problem
- Solution
- Discussion
- See Also
- Chapter 23. Saving, Loading, and Serving Trained Models
- 23.0 Introduction
- 23.1 Saving and Loading a scikit-learn Model
- Problem
- Solution
- Discussion
- 23.2 Saving and Loading a TensorFlow Model
- Problem
- Solution
- Discussion
- See Also
- 23.3 Saving and Loading a PyTorch Model
- Problem
- Solution
- Discussion
- See Also
- 23.4 Serving scikit-learn Models
- Problem
- Solution
- Discussion
- 23.5 Serving TensorFlow Models
- Problem
- Solution
- Discussion
- See Also
- 23.6 Serving PyTorch Models in Seldon
- Problem
- Solution
- Discussion
- See Also
- Index
- About the Authors
System Requirements
File format: PDF
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-Book Help).
- Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app or the PocketBook app before downloading (see E-Book Help).
- E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many others (not Kindle)
The PDF file format always displays a book page identically on any hardware. PDF is therefore also suitable for complex layouts such as those used in textbooks and technical books (images, tables, columns, footnotes). On the small displays of e-readers or smartphones, PDFs are unfortunately rather tedious because they require too much scrolling.
Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the e-book. You must therefore prepare your reading hardware before downloading.
Please note: We strongly recommend authorizing the reading software with your personal Adobe ID after installing it.
Further information is available in our E-Book Help.