
Python Machine Learning By Example
Description
- Includes a dedicated chapter on best practices and additional best practice tips throughout the book to improve your ML solutions
- Implement ML models, such as neural networks and linear and logistic regression, from scratch
Book Description
The fourth edition of Python Machine Learning By Example is a comprehensive guide for beginners and experienced machine learning practitioners who want to learn more advanced techniques, such as multimodal modeling. Written by experienced machine learning author and ex-Google machine learning engineer Yuxi (Hayden) Liu, this edition emphasizes best practices, providing invaluable insights for machine learning engineers, data scientists, and analysts. Explore advanced techniques, including two new chapters on natural language processing transformers with BERT and GPT, and multimodal computer vision models with PyTorch and Hugging Face. You'll learn key modeling techniques using practical examples, such as predicting stock prices and creating an image search engine. This hands-on machine learning book navigates through complex challenges, bridging the gap between theoretical understanding and practical application. Elevate your machine learning and deep learning expertise, tackle intricate problems, and unlock the potential of advanced techniques in machine learning with this authoritative guide.
*Email sign-up and proof of purchase required
What you will learn
- Follow machine learning best practices throughout data preparation and model development
- Build and improve image classifiers using convolutional neural networks (CNNs) and transfer learning
- Develop and fine-tune neural networks using TensorFlow and PyTorch
- Analyze sequence data and make predictions using recurrent neural networks (RNNs), transformers, and CLIP
- Build classifiers using support vector machines (SVMs) and boost performance with PCA
- Avoid overfitting using regularization, feature selection, and more
Who this book is for
This expanded fourth edition is ideal for data scientists, ML engineers, analysts, and students with Python programming knowledge. The real-world examples, best practices, and code prepare anyone undertaking their first serious ML project.
Content
- Cover
- Title Page
- Copyright
- Contributors
- Table of Contents
- Preface
- Making the Most Out of This Book - Get to Know Your Free Benefits
- Chapter 1: Getting Started with Machine Learning and Python
- An introduction to machine learning
- Understanding why we need machine learning
- Differentiating between machine learning and automation
- Machine learning applications
- Knowing the prerequisites
- Getting started with three types of machine learning
- A brief history of the development of machine learning algorithms
- Digging into the core of machine learning
- Generalizing with data
- Overfitting, underfitting, and the bias-variance trade-off
- Overfitting
- Underfitting
- The bias-variance trade-off
- Avoiding overfitting with cross-validation
- Avoiding overfitting with regularization
- Avoiding overfitting with feature selection and dimensionality reduction
- Data preprocessing and feature engineering
- Preprocessing and exploration
- Dealing with missing values
- Label encoding
- One-hot encoding
- Dense embedding
- Scaling
- Feature engineering
- Polynomial transformation
- Binning
- Combining models
- Voting and averaging
- Bagging
- Boosting
- Stacking
- Installing software and setting up
- Setting up Python and environments
- Installing the main Python packages
- NumPy
- SciPy
- pandas
- scikit-learn
- TensorFlow
- PyTorch
- Summary
- Exercises
- Chapter 2: Building a Movie Recommendation Engine with Naïve Bayes
- Getting started with classification
- Binary classification
- Multiclass classification
- Multi-label classification
- Exploring Naïve Bayes
- Bayes' theorem by example
- The mechanics of Naïve Bayes
- Implementing Naïve Bayes
- Implementing Naïve Bayes from scratch
- Implementing Naïve Bayes with scikit-learn
- Building a movie recommender with Naïve Bayes
- Preparing the data
- Training a Naïve Bayes model
- Evaluating classification performance
- Tuning models with cross-validation
- Summary
- Exercises
- References
- Chapter 3: Predicting Online Ad Click-Through with Tree-Based Algorithms
- A brief overview of ad click-through prediction
- Getting started with two types of data - numerical and categorical
- Exploring a decision tree from the root to the leaves
- Constructing a decision tree
- The metrics for measuring a split
- Gini Impurity
- Information Gain
- Implementing a decision tree from scratch
- Implementing a decision tree with scikit-learn
- Predicting ad click-through with a decision tree
- Ensembling decision trees - random forests
- Ensembling decision trees - gradient-boosted trees
- Summary
- Exercises
- Chapter 4: Predicting Online Ad Click-Through with Logistic Regression
- Converting categorical features to numerical - one-hot encoding and ordinal encoding
- Classifying data with logistic regression
- Getting started with the logistic function
- Jumping from the logistic function to logistic regression
- Training a logistic regression model
- Training a logistic regression model using gradient descent
- Predicting ad click-through with logistic regression using gradient descent
- Training a logistic regression model using stochastic gradient descent (SGD)
- Training a logistic regression model with regularization
- Feature selection using L1 regularization
- Feature selection using random forest
- Training on large datasets with online learning
- Handling multiclass classification
- Implementing logistic regression using TensorFlow
- Summary
- Exercises
- Chapter 5: Predicting Stock Prices with Regression Algorithms
- What is regression?
- Mining stock price data
- A brief overview of the stock market and stock prices
- Getting started with feature engineering
- Acquiring data and generating features
- Estimating with linear regression
- How does linear regression work?
- Implementing linear regression from scratch
- Implementing linear regression with scikit-learn
- Implementing linear regression with TensorFlow
- Estimating with decision tree regression
- Transitioning from classification trees to regression trees
- Implementing decision tree regression
- Implementing a regression forest
- Evaluating regression performance
- Predicting stock prices with the three regression algorithms
- Summary
- Exercises
- Chapter 6: Predicting Stock Prices with Artificial Neural Networks
- Demystifying neural networks
- Starting with a single-layer neural network
- Layers in neural networks
- Activation functions
- Backpropagation
- Adding more layers to a neural network: DL
- Building neural networks
- Implementing neural networks from scratch
- Implementing neural networks with scikit-learn
- Implementing neural networks with TensorFlow
- Implementing neural networks with PyTorch
- Picking the right activation functions
- Preventing overfitting in neural networks
- Dropout
- Early stopping
- Predicting stock prices with neural networks
- Training a simple neural network
- Fine-tuning the neural network
- Summary
- Exercises
- Chapter 7: Mining the 20 Newsgroups Dataset with Text Analysis Techniques
- How computers understand language - NLP
- What is NLP?
- The history of NLP
- NLP applications
- Touring popular NLP libraries and picking up NLP basics
- Installing famous NLP libraries
- Corpora
- Tokenization
- PoS tagging
- NER
- Stemming and lemmatization
- Semantics and topic modeling
- Getting the newsgroups data
- Exploring the newsgroups data
- Thinking about features for text data
- Counting the occurrence of each word token
- Text preprocessing
- Dropping stop words
- Reducing inflectional and derivational forms of words
- Visualizing the newsgroups data with t-SNE
- What is dimensionality reduction?
- t-SNE for dimensionality reduction
- Representing words with dense vectors - word embedding
- Building embedding models using shallow neural networks
- Utilizing pre-trained embedding models
- Summary
- Exercises
- Chapter 8: Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
- Learning without guidance - unsupervised learning
- Getting started with k-means clustering
- How does k-means clustering work?
- Implementing k-means from scratch
- Implementing k-means with scikit-learn
- Choosing the value of k
- Clustering newsgroups dataset
- Clustering newsgroups data using k-means
- Describing the clusters using GPT
- Discovering underlying topics in newsgroups
- Topic modeling using NMF
- Topic modeling using LDA
- Summary
- Exercises
- Chapter 9: Recognizing Faces with Support Vector Machine
- Finding the separating boundary with SVM
- Scenario 1 - identifying a separating hyperplane
- Scenario 2 - determining the optimal hyperplane
- Scenario 3 - handling outliers
- Implementing SVM
- Scenario 4 - dealing with more than two classes
- One-vs-rest
- One-vs-one
- Multiclass cases in scikit-learn
- Scenario 5 - solving linearly non-separable problems with kernels
- Choosing between linear and RBF kernels
- Classifying face images with SVM
- Exploring the face image dataset
- Building an SVM-based image classifier
- Boosting image classification performance with PCA
- Estimating with support vector regression
- Implementing SVR
- Summary
- Exercises
- Chapter 10: Machine Learning Best Practices
- Machine learning solution workflow
- Best practices in the data preparation stage
- Best practice 1 - Completely understanding the project goal
- Best practice 2 - Collecting all fields that are relevant
- Best practice 3 - Maintaining the consistency and normalization of field values
- Best practice 4 - Dealing with missing data
- Best practice 5 - Storing large-scale data
- Best practices in the training set generation stage
- Best practice 6 - Identifying categorical features with numerical values
- Best practice 7 - Deciding whether to encode categorical features
- Best practice 8 - Deciding whether to select features and, if so, how to do so
- Best practice 9 - Deciding whether to reduce dimensionality and, if so, how to do so
- Best practice 10 - Deciding whether to rescale features
- Best practice 11 - Performing feature engineering with domain expertise
- Best practice 12 - Performing feature engineering without domain expertise
- Binarization and discretization
- Interaction
- Polynomial transformation
- Best practice 13 - Documenting how each feature is generated
- Best practice 14 - Extracting features from text data
- tf and tf-idf
- Word embedding
- Word2Vec embedding
- Best practices in the model training, evaluation, and selection stage
- Best practice 15 - Choosing the right algorithm(s) to start with
- Naïve Bayes
- Logistic regression
- SVM
- Random forest (or decision tree)
- Neural networks
- Best practice 16 - Reducing overfitting
- Best practice 17 - Diagnosing overfitting and underfitting
- Best practice 18 - Modeling on large-scale datasets
- Best practices in the deployment and monitoring stage
- Best practice 19 - Saving, loading, and reusing models
- Saving and restoring models using pickle
- Saving and restoring models in TensorFlow
- Saving and restoring models in PyTorch
- Best practice 20 - Monitoring model performance
- Best practice 21 - Updating models regularly
- Summary
- Exercises
- Chapter 11: Categorizing Images of Clothing with Convolutional Neural Networks
- Getting started with CNN building blocks
- The convolutional layer
- The non-linear layer
- The pooling layer
- Architecting a CNN for classification
- Exploring the clothing image dataset
- Classifying clothing images with CNNs
- Architecting the CNN model
- Fitting the CNN model
- Visualizing the convolutional filters
- Boosting the CNN classifier with data augmentation
- Flipping for data augmentation
- Rotation for data augmentation
- Cropping for data augmentation
- Improving the clothing image classifier with data augmentation
- Advancing the CNN classifier with transfer learning
- Development of CNN architectures and pretrained models
- Improving the clothing image classifier by fine-tuning ResNets
- Summary
- Exercises
- Chapter 12: Making Predictions with Sequences Using Recurrent Neural Networks
- Introducing sequential learning
- Learning the RNN architecture by example
- Recurrent mechanism
- Many-to-one RNNs
- One-to-many RNNs
- Many-to-many (synced) RNNs
- Many-to-many (unsynced) RNNs
- Training an RNN model
- Overcoming long-term dependencies with LSTM
- Analyzing movie review sentiment with RNNs
- Analyzing and preprocessing the data
- Building a simple LSTM network
- Stacking multiple LSTM layers
- Revisiting stock price forecasting with LSTM
- Writing your own War and Peace with RNNs
- Acquiring and analyzing the training data
- Constructing the training set for the RNN text generator
- Building and training an RNN text generator
- Summary
- Exercises
- Chapter 13: Advancing Language Understanding and Generation with the Transformer Models
- Understanding self-attention
- Key, value, and query representations
- Attention score calculation and embedding vector generation
- Multi-head attention
- Exploring the Transformer's architecture
- The encoder-decoder structure
- Positional encoding
- Layer normalization
- Improving sentiment analysis with BERT and Transformers
- Pre-training BERT
- MLM
- NSP
- Fine-tuning of BERT
- Fine-tuning a pre-trained BERT model for sentiment analysis
- Using the Trainer API to train Transformer models
- Generating text using GPT
- Pre-training of GPT and autoregressive generation
- Writing your own version of War and Peace with GPT
- Summary
- Exercises
- Chapter 14: Building an Image Search Engine Using CLIP: a Multimodal Approach
- Introducing the CLIP model
- Understanding the mechanism of the CLIP model
- Vision encoder
- Text encoder
- Contrastive learning
- Exploring applications of the CLIP model
- Zero-shot image classification
- Zero-shot text classification
- Image and text retrieval
- Image and text generation
- Transfer learning
- Getting started with the dataset
- Obtaining the Flickr8k dataset
- Loading the Flickr8k dataset
- Architecting the CLIP model
- Vision encoder
- Text encoder
- Projection head for contrastive learning
- CLIP model
- Finding images with words
- Training a CLIP model
- Obtaining embeddings for images and text to identify matches
- Image search using the pre-trained CLIP model
- Zero-shot classification
- Summary
- Exercises
- References
- Chapter 15: Making Decisions in Complex Environments with Reinforcement Learning
- Setting up the working environment
- Introducing OpenAI Gym and Gymnasium
- Installing Gymnasium
- Introducing reinforcement learning with examples
- Elements of reinforcement learning
- Cumulative rewards
- Approaches to reinforcement learning
- Policy-based approach
- Value-based approach
- Solving the FrozenLake environment with dynamic programming
- Simulating the FrozenLake environment
- Solving FrozenLake with the value iteration algorithm
- Solving FrozenLake with the policy iteration algorithm
- Performing Monte Carlo learning
- Simulating the Blackjack environment
- Performing Monte Carlo policy evaluation
- Performing on-policy Monte Carlo control
- Solving the Blackjack problem with the Q-learning algorithm
- Introducing the Q-learning algorithm
- Developing the Q-learning algorithm
- Summary
- Exercises
- Other Books You May Enjoy
- Index
Preface
The fourth edition of Python Machine Learning by Example is a comprehensive guide for beginners and experienced machine learning (ML) practitioners who want to learn more advanced techniques, such as multimodal modeling. This edition emphasizes best practices, providing invaluable insights for ML engineers, data scientists, and analysts.
Explore advanced techniques, including two new chapters on NLP transformers with BERT and GPT and multimodal computer vision models with PyTorch and Hugging Face. You'll learn key modeling techniques using practical examples, such as predicting stock prices and creating an image search engine.
This book navigates through complex challenges, bridging the gap between theoretical understanding and practical application. Elevate your ML expertise, tackle intricate problems, and unlock the potential of advanced techniques in machine learning with this authoritative guide.
Who this book is for
If you're a machine learning enthusiast, data analyst, or data engineer who's highly passionate about machine learning and you want to begin working on ML assignments, this book is for you. Prior knowledge of Python coding is assumed and basic familiarity with statistical concepts will be beneficial, although this is not necessary.
What this book covers
Chapter 1, Getting Started with Machine Learning and Python, will kick off your Python machine learning journey. It starts with what machine learning is, why we need it, and its evolution over the last few decades. It then discusses typical machine learning tasks and explores several essential techniques of working with data and working with models, in a practical and fun way. You will also set up the software and tools needed for examples and projects in the upcoming chapters.
Chapter 2, Building a Movie Recommendation Engine with Naïve Bayes, focuses on classification, specifically binary classification and Naïve Bayes. The goal of the chapter is to build a movie recommendation system. You will learn the fundamental concepts of classification, and about Naïve Bayes, a simple yet powerful algorithm. The chapter also demonstrates how to fine-tune a model, an important skill for every data science or machine learning practitioner.
Chapter 3, Predicting Online Ad Click-Through with Tree-Based Algorithms, introduces and explains in depth tree-based algorithms (including decision trees, random forests, and boosted trees) in the course of solving the advertising click-through rate problem. You will explore decision trees from the root to the leaves, and work on implementations of tree models from scratch, using scikit-learn and XGBoost. Feature importance, feature selection, and ensembling are covered along the way.
Chapter 4, Predicting Online Ad Click-Through with Logistic Regression, is a continuation of the ad click-through prediction project, with a focus on a very scalable classification model: logistic regression. You will explore how logistic regression works, and how to work with large datasets. The chapter also covers categorical variable encoding, L1 and L2 regularization, feature selection, online learning, and stochastic gradient descent.
Chapter 5, Predicting Stock Prices with Regression Algorithms, focuses on several popular regression algorithms, including linear regression, regression trees, and regression forests. It will encourage you to use them to tackle a billion (or trillion) dollar problem: stock price prediction. You will practice solving regression problems using scikit-learn and TensorFlow.
Chapter 6, Predicting Stock Prices with Artificial Neural Networks, introduces and explains in depth neural network models. It covers the building blocks of neural networks, and important concepts such as activation functions, feedforward, and backpropagation. You will start by building the simplest neural network and go deeper by adding more layers to it. We will implement neural networks from scratch, use TensorFlow and PyTorch, and train a neural network to predict stock prices.
Chapter 7, Mining the 20 Newsgroups Dataset with Text Analysis Techniques, will start the second step of your learning journey: unsupervised learning. It tackles a natural language processing problem: exploring newsgroups data. You will gain hands-on experience in working with text data, especially how to convert words and phrases into machine-readable values and how to clean up words with little meaning. You will also visualize text data using a dimensionality reduction technique called t-SNE. Finally, you will learn how to represent words with embedding vectors.
Chapter 8, Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling, talks about identifying different groups of observations from data in an unsupervised manner. You will cluster the newsgroups data using the k-means algorithm, and detect topics using non-negative matrix factorization and latent Dirichlet allocation. You will be amused by how many interesting themes you are able to mine from the 20 newsgroups dataset!
Chapter 9, Recognizing Faces with Support Vector Machine, continues the journey of supervised learning and classification. Specifically, it focuses on multiclass classification and support vector machine classifiers. It discusses how the support vector machine algorithm searches for a decision boundary in order to separate data from different classes. You will implement the algorithm with scikit-learn, and apply it to solve various real-life problems including face recognition.
Chapter 10, Machine Learning Best Practices, aims to consolidate your learning and get you ready for real-world projects. It includes 21 best practices to follow throughout the entire machine learning workflow.
Chapter 11, Categorizing Images of Clothing with Convolutional Neural Networks, is about using Convolutional Neural Networks (CNNs), a very powerful modern machine learning model, to classify images of clothing. It covers the building blocks and architecture of CNNs, and their implementation using PyTorch. After exploring the data of clothing images, you will develop CNN models to categorize the images into ten classes, and utilize data augmentation and transfer learning techniques to boost the classifier.
Chapter 12, Making Predictions with Sequences Using Recurrent Neural Networks, starts by defining sequential learning and exploring why Recurrent Neural Networks (RNNs) are well suited for it. You will learn about various types of RNNs and their common applications. You will implement RNNs with PyTorch, and apply them to solve three interesting sequential learning problems: sentiment analysis on IMDb movie reviews, stock price forecasting, and text auto-generation.
Chapter 13, Advancing Language Understanding and Generation with the Transformer Models, dives into the Transformer neural network, designed for sequential learning. It focuses on crucial parts of the input sequence and captures long-range relationships better than RNNs. You will explore two cutting-edge Transformer models, BERT and GPT, and use them for sentiment analysis and text generation, surpassing the performance achieved in the previous chapter.
Chapter 14, Building an Image Search Engine Using CLIP: A Multimodal Approach, explores a multimodal model, CLIP, that merges visual and textual data. This powerful model can understand connections between images and text. You will dive into its architecture and how it learns, then build an image search engine. Finally, you will cap it all off with a zero-shot image classification project, pushing the boundaries of what this model can do.
Chapter 15, Making Decisions in Complex Environments with Reinforcement Learning, is about learning from experience and interacting with the environment. After exploring the fundamentals of reinforcement learning, you will solve the FrozenLake environment with simple dynamic programming algorithms. You will learn about Monte Carlo learning and use it for value approximation and control. You will also develop temporal difference algorithms and use Q-learning to solve the Blackjack problem.
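To give a flavor of the from-scratch style that the chapter summaries describe, here is a minimal sketch of logistic regression trained with batch gradient descent, the technique named in the Chapter 4 summary. It is not taken from the book: the function names, the toy dataset, the learning rate, and the iteration count are all illustrative assumptions, and it uses only the Python standard library.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_logistic_regression(X, y, learning_rate=0.1, n_iter=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # For the log loss, the per-sample gradient is (prediction - label) * feature.
            error = predict_proba(xi, weights, bias) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical toy data: one feature, label 1 for the larger values.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
print("P(y=1 | x=0.5):", round(predict_proba([0.5], weights, bias), 3))
print("P(y=1 | x=4.0):", round(predict_proba([4.0], weights, bias), 3))
```

On this toy data the fitted model should assign a low probability to small feature values and a high probability to large ones. A library implementation such as scikit-learn's would normally be preferred in practice; the sketch only shows the update rule.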
To get the most out of this book
A basic foundation in Python, familiarity with basic machine learning algorithms, and some experience with core Python libraries, such as NumPy and pandas, are assumed.
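One quick way to check whether the libraries mentioned above are available in your environment is sketched below. This is an illustrative helper, not part of the book's code bundle: the function name `check_packages` is hypothetical, and it relies only on `importlib.util.find_spec` from the standard library, so it runs even when the packages are missing.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


# Package names taken from the prerequisites above.
status = check_packages(["numpy", "pandas"])
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can then be installed with your usual package manager before working through the examples.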
Download the example code files
The code bundle for the book is hosted on GitHub at https://github.com/packtjaniceg/Python-Machine-Learning-by-Example-Fourth-Edition/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781835085622.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter (X) handles. Here is an example:...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a 'hard' copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management.
For more information, see our eBook Help page.