Python Machine Learning By Example

Name: Python Machine Learning By Example | Unlock machine learning best practices with real-world use cases
Brand: Packt Publishing Limited
Price: 33.59 EUR
Availability: OnlineOnly

Unlock machine learning best practices with real-world use cases

Yuxi (Hayden) Liu (Author)

Packt Publishing Limited

4th Edition

Published on 31 July 2024

526 pages

E-Book (ePUB with Adobe DRM)

E-Book (ePUB without DRM)

978-1-83508-222-5 (ISBN)

Description

Author Yuxi (Hayden) Liu teaches machine learning from the fundamentals to building NLP transformers and multimodal models with best practice tips and real-world examples using PyTorch, TensorFlow, scikit-learn, and pandas. Free with your book: DRM-free PDF version + access to Packt's next-gen Reader*

Key Features

- Discover new and updated content on NLP transformers, PyTorch, and computer vision modeling
- Includes a dedicated chapter on best practices and additional best practice tips throughout the book to improve your ML solutions
- Implement ML models, such as neural networks and linear and logistic regression, from scratch

Book Description

The fourth edition of Python Machine Learning By Example is a comprehensive guide for beginners and experienced machine learning practitioners who want to learn more advanced techniques, such as multimodal modeling. Written by experienced machine learning author and ex-Google machine learning engineer Yuxi (Hayden) Liu, this edition emphasizes best practices, providing invaluable insights for machine learning engineers, data scientists, and analysts. Explore advanced techniques, including two new chapters on natural language processing transformers with BERT and GPT, and multimodal computer vision models with PyTorch and Hugging Face. You'll learn key modeling techniques using practical examples, such as predicting stock prices and creating an image search engine. This hands-on machine learning book navigates through complex challenges, bridging the gap between theoretical understanding and practical application. Elevate your machine learning and deep learning expertise, tackle intricate problems, and unlock the potential of advanced techniques in machine learning with this authoritative guide.

*Email sign-up and proof of purchase required

What you will learn

- Follow machine learning best practices throughout data preparation and model development
- Build and improve image classifiers using convolutional neural networks (CNNs) and transfer learning
- Develop and fine-tune neural networks using TensorFlow and PyTorch
- Analyze sequence data and make predictions using recurrent neural networks (RNNs), transformers, and CLIP
- Build classifiers using support vector machines (SVMs) and boost performance with PCA
- Avoid overfitting using regularization, feature selection, and more

Who this book is for

This expanded fourth edition is ideal for data scientists, ML engineers, analysts, and students with Python programming knowledge. The real-world examples, best practices, and code prepare anyone undertaking their first serious ML project.

Content

Cover
Title Page
Copyright
Contributors
Table of Contents
Preface
Making the Most Out of This Book - Get to Know Your Free Benefits
Chapter 1: Getting Started with Machine Learning and Python
An introduction to machine learning
Understanding why we need machine learning
Differentiating between machine learning and automation
Machine learning applications
Knowing the prerequisites
Getting started with three types of machine learning
A brief history of the development of machine learning algorithms
Digging into the core of machine learning
Generalizing with data
Overfitting, underfitting, and the bias-variance trade-off
Overfitting
Underfitting
The bias-variance trade-off
Avoiding overfitting with cross-validation
Avoiding overfitting with regularization
Avoiding overfitting with feature selection and dimensionality reduction
Data preprocessing and feature engineering
Preprocessing and exploration
Dealing with missing values
Label encoding
One-hot encoding
Dense embedding
Scaling
Feature engineering
Polynomial transformation
Binning
Combining models
Voting and averaging
Bagging
Boosting
Stacking
Installing software and setting up
Setting up Python and environments
Installing the main Python packages
NumPy
SciPy
pandas
scikit-learn
TensorFlow
PyTorch
Summary
Exercises
Chapter 2: Building a Movie Recommendation Engine with Naïve Bayes
Getting started with classification
Binary classification
Multiclass classification
Multi-label classification
Exploring Naïve Bayes
Bayes' theorem by example
The mechanics of Naïve Bayes
Implementing Naïve Bayes
Implementing Naïve Bayes from scratch
Implementing Naïve Bayes with scikit-learn
Building a movie recommender with Naïve Bayes
Preparing the data
Training a Naïve Bayes model
Evaluating classification performance
Tuning models with cross-validation
Summary
Exercises
References
Chapter 3: Predicting Online Ad Click-Through with Tree-Based Algorithms
A brief overview of ad click-through prediction
Getting started with two types of data - numerical and categorical
Exploring a decision tree from the root to the leaves
Constructing a decision tree
The metrics for measuring a split
Gini Impurity
Information Gain
Implementing a decision tree from scratch
Implementing a decision tree with scikit-learn
Predicting ad click-through with a decision tree
Ensembling decision trees - random forests
Ensembling decision trees - gradient-boosted trees
Summary
Exercises
Chapter 4: Predicting Online Ad Click-Through with Logistic Regression
Converting categorical features to numerical - one-hot encoding and ordinal encoding
Classifying data with logistic regression
Getting started with the logistic function
Jumping from the logistic function to logistic regression
Training a logistic regression model
Training a logistic regression model using gradient descent
Predicting ad click-through with logistic regression using gradient descent
Training a logistic regression model using stochastic gradient descent (SGD)
Training a logistic regression model with regularization
Feature selection using L1 regularization
Feature selection using random forest
Training on large datasets with online learning
Handling multiclass classification
Implementing logistic regression using TensorFlow
Summary
Exercises
Chapter 5: Predicting Stock Prices with Regression Algorithms
What is regression?
Mining stock price data
A brief overview of the stock market and stock prices
Getting started with feature engineering
Acquiring data and generating features
Estimating with linear regression
How does linear regression work?
Implementing linear regression from scratch
Implementing linear regression with scikit-learn
Implementing linear regression with TensorFlow
Estimating with decision tree regression
Transitioning from classification trees to regression trees
Implementing decision tree regression
Implementing a regression forest
Evaluating regression performance
Predicting stock prices with the three regression algorithms
Summary
Exercises
Chapter 6: Predicting Stock Prices with Artificial Neural Networks
Demystifying neural networks
Starting with a single-layer neural network
Layers in neural networks
Activation functions
Backpropagation
Adding more layers to a neural network: DL
Building neural networks
Implementing neural networks from scratch
Implementing neural networks with scikit-learn
Implementing neural networks with TensorFlow
Implementing neural networks with PyTorch
Picking the right activation functions
Preventing overfitting in neural networks
Dropout
Early stopping
Predicting stock prices with neural networks
Training a simple neural network
Fine-tuning the neural network
Summary
Exercises
Chapter 7: Mining the 20 Newsgroups Dataset with Text Analysis Techniques
How computers understand language - NLP
What is NLP?
The history of NLP
NLP applications
Touring popular NLP libraries and picking up NLP basics
Installing famous NLP libraries
Corpora
Tokenization
PoS tagging
NER
Stemming and lemmatization
Semantics and topic modeling
Getting the newsgroups data
Exploring the newsgroups data
Thinking about features for text data
Counting the occurrence of each word token
Text preprocessing
Dropping stop words
Reducing inflectional and derivational forms of words
Visualizing the newsgroups data with t-SNE
What is dimensionality reduction?
t-SNE for dimensionality reduction
Representing words with dense vectors - word embedding
Building embedding models using shallow neural networks
Utilizing pre-trained embedding models
Summary
Exercises
Chapter 8: Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
Learning without guidance - unsupervised learning
Getting started with k-means clustering
How does k-means clustering work?
Implementing k-means from scratch
Implementing k-means with scikit-learn
Choosing the value of k
Clustering newsgroups dataset
Clustering newsgroups data using k-means
Describing the clusters using GPT
Discovering underlying topics in newsgroups
Topic modeling using NMF
Topic modeling using LDA
Summary
Exercises
Chapter 9: Recognizing Faces with Support Vector Machine
Finding the separating boundary with SVM
Scenario 1 - identifying a separating hyperplane
Scenario 2 - determining the optimal hyperplane
Scenario 3 - handling outliers
Implementing SVM
Scenario 4 - dealing with more than two classes
One-vs-rest
One-vs-one
Multiclass cases in scikit-learn
Scenario 5 - solving linearly non-separable problems with kernels
Choosing between linear and RBF kernels
Classifying face images with SVM
Exploring the face image dataset
Building an SVM-based image classifier
Boosting image classification performance with PCA
Estimating with support vector regression
Implementing SVR
Summary
Exercises
Chapter 10: Machine Learning Best Practices
Machine learning solution workflow
Best practices in the data preparation stage
Best practice 1 - Completely understanding the project goal
Best practice 2 - Collecting all fields that are relevant
Best practice 3 - Maintaining the consistency and normalization of field values
Best practice 4 - Dealing with missing data
Best practice 5 - Storing large-scale data
Best practices in the training set generation stage
Best practice 6 - Identifying categorical features with numerical values
Best practice 7 - Deciding whether to encode categorical features
Best practice 8 - Deciding whether to select features and, if so, how to do so
Best practice 9 - Deciding whether to reduce dimensionality and, if so, how to do so
Best practice 10 - Deciding whether to rescale features
Best practice 11 - Performing feature engineering with domain expertise
Best practice 12 - Performing feature engineering without domain expertise
Binarization and discretization
Interaction
Polynomial transformation
Best practice 13 - Documenting how each feature is generated
Best practice 14 - Extracting features from text data
tf and tf-idf
Word embedding
Word2Vec embedding
Best practices in the model training, evaluation, and selection stage
Best practice 15 - Choosing the right algorithm(s) to start with
Naïve Bayes
Logistic regression
SVM
Random forest (or decision tree)
Neural networks
Best practice 16 - Reducing overfitting
Best practice 17 - Diagnosing overfitting and underfitting
Best practice 18 - Modeling on large-scale datasets
Best practices in the deployment and monitoring stage
Best practice 19 - Saving, loading, and reusing models
Saving and restoring models using pickle
Saving and restoring models in TensorFlow
Saving and restoring models in PyTorch
Best practice 20 - Monitoring model performance
Best practice 21 - Updating models regularly
Summary
Exercises
Chapter 11: Categorizing Images of Clothing with Convolutional Neural Networks
Getting started with CNN building blocks
The convolutional layer
The non-linear layer
The pooling layer
Architecting a CNN for classification
Exploring the clothing image dataset
Classifying clothing images with CNNs
Architecting the CNN model
Fitting the CNN model
Visualizing the convolutional filters
Boosting the CNN classifier with data augmentation
Flipping for data augmentation
Rotation for data augmentation
Cropping for data augmentation
Improving the clothing image classifier with data augmentation
Advancing the CNN classifier with transfer learning
Development of CNN architectures and pretrained models
Improving the clothing image classifier by fine-tuning ResNets
Summary
Exercises
Chapter 12: Making Predictions with Sequences Using Recurrent Neural Networks
Introducing sequential learning
Learning the RNN architecture by example
Recurrent mechanism
Many-to-one RNNs
One-to-many RNNs
Many-to-many (synced) RNNs
Many-to-many (unsynced) RNNs
Training an RNN model
Overcoming long-term dependencies with LSTM
Analyzing movie review sentiment with RNNs
Analyzing and preprocessing the data
Building a simple LSTM network
Stacking multiple LSTM layers
Revisiting stock price forecasting with LSTM
Writing your own War and Peace with RNNs
Acquiring and analyzing the training data
Constructing the training set for the RNN text generator
Building and training an RNN text generator
Summary
Exercises
Chapter 13: Advancing Language Understanding and Generation with the Transformer Models
Understanding self-attention
Key, value, and query representations
Attention score calculation and embedding vector generation
Multi-head attention
Exploring the Transformer's architecture
The encoder-decoder structure
Positional encoding
Layer normalization
Improving sentiment analysis with BERT and Transformers
Pre-training BERT
MLM
NSP
Fine-tuning of BERT
Fine-tuning a pre-trained BERT model for sentiment analysis
Using the Trainer API to train Transformer models
Generating text using GPT
Pre-training of GPT and autoregressive generation
Writing your own version of War and Peace with GPT
Summary
Exercises
Chapter 14: Building an Image Search Engine Using CLIP: a Multimodal Approach
Introducing the CLIP model
Understanding the mechanism of the CLIP model
Vision encoder
Text encoder
Contrastive learning
Exploring applications of the CLIP model
Zero-shot image classification
Zero-shot text classification
Image and text retrieval
Image and text generation
Transfer learning
Getting started with the dataset
Obtaining the Flickr8k dataset
Loading the Flickr8k dataset
Architecting the CLIP model
Vision encoder
Text encoder
Projection head for contrastive learning
CLIP model
Finding images with words
Training a CLIP model
Obtaining embeddings for images and text to identify matches
Image search using the pre-trained CLIP model
Zero-shot classification
Summary
Exercises
References
Chapter 15: Making Decisions in Complex Environments with Reinforcement Learning
Setting up the working environment
Introducing OpenAI Gym and Gymnasium
Installing Gymnasium
Introducing reinforcement learning with examples
Elements of reinforcement learning
Cumulative rewards
Approaches to reinforcement learning
Policy-based approach
Value-based approach
Solving the FrozenLake environment with dynamic programming
Simulating the FrozenLake environment
Solving FrozenLake with the value iteration algorithm
Solving FrozenLake with the policy iteration algorithm
Performing Monte Carlo learning
Simulating the Blackjack environment
Performing Monte Carlo policy evaluation
Performing on-policy Monte Carlo control
Solving the Blackjack problem with the Q-learning algorithm
Introducing the Q-learning algorithm
Developing the Q-learning algorithm
Summary
Exercises
Other Books You May Enjoy
Index

Preface

The fourth edition of Python Machine Learning by Example is a comprehensive guide for beginners and experienced machine learning (ML) practitioners who want to learn more advanced techniques, such as multimodal modeling. This edition emphasizes best practices, providing invaluable insights for ML engineers, data scientists, and analysts.

Explore advanced techniques, including two new chapters on NLP transformers with BERT and GPT, and multimodal computer vision models with PyTorch and Hugging Face. You'll learn key modeling techniques using practical examples, such as predicting stock prices and creating an image search engine.

This book navigates through complex challenges, bridging the gap between theoretical understanding and practical application. Elevate your ML expertise, tackle intricate problems, and unlock the potential of advanced techniques in machine learning with this authoritative guide.

Who this book is for

If you're a machine learning enthusiast, data analyst, or data engineer who's highly passionate about machine learning and you want to begin working on ML assignments, this book is for you. Prior knowledge of Python coding is assumed and basic familiarity with statistical concepts will be beneficial, although this is not necessary.

What this book covers

Chapter 1, Getting Started with Machine Learning and Python, will kick off your Python machine learning journey. It starts with what machine learning is, why we need it, and its evolution over the last few decades. It then discusses typical machine learning tasks and explores several essential techniques of working with data and working with models, in a practical and fun way. You will also set up the software and tools needed for examples and projects in the upcoming chapters.

Chapter 2, Building a Movie Recommendation Engine with Naïve Bayes, focuses on classification, specifically binary classification and Naïve Bayes. The goal of the chapter is to build a movie recommendation system. You will learn the fundamental concepts of classification, and about Naïve Bayes, a simple yet powerful algorithm. It also demonstrates how to fine-tune a model, which is an important skill for every data science or machine learning practitioner to learn.
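
As a rough illustration of the Naïve Bayes idea mentioned above, the following sketch fits a Bernoulli Naïve Bayes model from scratch on binary "liked this movie" features. It is not the book's code: the function names, the Laplace smoothing value `alpha`, and the toy data are assumptions made for this example.

```python
import math


def train_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors and per-feature likelihoods from binary features.

    alpha is an assumed Laplace smoothing constant that avoids zero probabilities.
    """
    n_features = len(X[0])
    prior, likelihood = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        # Smoothed estimate of P(feature_j = 1 | class c)
        likelihood[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return prior, likelihood


def predict_proba(x, prior, likelihood):
    """Return the posterior probability of each class for one sample."""
    log_scores = {}
    for c in prior:
        score = math.log(prior[c])
        for xj, p in zip(x, likelihood[c]):
            score += math.log(p if xj == 1 else 1.0 - p)
        log_scores[c] = score
    # Normalize in a numerically stable way
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Hypothetical data: each row records whether a user liked three other movies;
# the label records whether they liked the target movie (1) or not (0).
X_train = [[0, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
y_train = [1, 0, 1, 1]
prior, likelihood = train_naive_bayes(X_train, y_train)
print(predict_proba([1, 1, 0], prior, likelihood))
```

Working in log space and normalizing at the end keeps the products of many small probabilities from underflowing, which matters once the number of features grows.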

Chapter 3, Predicting Online Ad Click-Through with Tree-Based Algorithms, introduces tree-based algorithms (including decision trees, random forests, and boosted trees) and explains them in depth while solving the advertising click-through rate problem. You will explore decision trees from the root to the leaves, and work on implementations of tree models from scratch, using scikit-learn and XGBoost. Feature importance, feature selection, and ensembling are covered along the way.
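
The table of contents lists Gini impurity and information gain as the metrics for measuring a split. A minimal sketch of both, under the assumption that a split is represented as a list of label groups (the function names are illustrative, not taken from the book):

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; information gain is the drop in entropy after a split."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(groups, criterion=gini_impurity):
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * criterion(g) for g in groups)


# Hypothetical click labels (1 = click, 0 = no click) before and after a split
parent = [1, 1, 0, 0]
children = [[1, 1], [0, 0]]
print(gini_impurity(parent), weighted_impurity(children))
print(entropy(parent) - weighted_impurity(children, criterion=entropy))  # information gain
```

A tree builder would evaluate `weighted_impurity` for every candidate split and keep the one with the lowest value.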

Chapter 4, Predicting Online Ad Click-Through with Logistic Regression, continues the ad click-through prediction project, focusing on a very scalable classification model: logistic regression. You will explore how logistic regression works and how to work with large datasets. The chapter also covers categorical variable encoding, L1 and L2 regularization, feature selection, online learning, and stochastic gradient descent.
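
To make the combination of logistic regression, stochastic gradient descent, and L2 regularization more concrete, here is a small from-scratch sketch. The hyperparameters (`lr`, `epochs`, `l2`), the function names, and the one-feature toy data are assumptions for illustration only.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_sgd(X, y, lr=0.1, epochs=200, l2=0.0, seed=0):
    """Fit weights and bias by updating on one sample at a time (SGD)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss with respect to z
            # The l2 * wj term is the gradient of the L2 penalty (l2 / 2) * ||w||^2
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_click_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical one-feature data: larger values are more likely to be clicks
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
w, b = train_logistic_sgd(X_train, y_train)
print(predict_click_proba([0.0], w, b), predict_click_proba([3.0], w, b))
```

Because each update touches a single sample, the same loop can consume data in chunks, which is the idea behind online learning on large datasets.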

Chapter 5, Predicting Stock Prices with Regression Algorithms, focuses on several popular regression algorithms, including linear regression, regression trees, and regression forests. It encourages you to use them to tackle a billion-dollar (or trillion-dollar) problem: stock price prediction. You will practice solving regression problems using scikit-learn and TensorFlow.
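
As a small illustration of linear regression and its evaluation, the sketch below fits a single-feature model with the closed-form least-squares solution and scores it with R². This is a simplification chosen for brevity: the book's own from-scratch implementation may use a different fitting procedure, and the numbers are made up.

```python
def fit_simple_linear_regression(x, y):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical feature (for example, the previous day's close) and target
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(x_train, y_train)
predictions = [slope * xi + intercept for xi in x_train]
print(slope, intercept, r_squared(y_train, predictions))
```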

Chapter 6, Predicting Stock Prices with Artificial Neural Networks, introduces neural network models and explains them in depth. It covers the building blocks of neural networks and important concepts such as activation functions, feedforward, and backpropagation. You will start by building the simplest neural network and go deeper by adding more layers to it. We will implement neural networks from scratch, use TensorFlow and PyTorch, and train a neural network to predict stock prices.
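
The feedforward step mentioned above can be sketched in a few lines. The example below runs one hidden layer with a ReLU activation followed by a linear output layer; the parameter layout (a dict with `W1`, `b1`, `W2`, `b2`) and the weight values are assumptions for illustration, and training via backpropagation is deliberately left out.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), one row of W per unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, params):
    """Feedforward pass through a hidden ReLU layer and a linear output layer."""
    hidden = dense(x, params["W1"], params["b1"], relu)
    output = dense(hidden, params["W2"], params["b2"], identity)
    return hidden, output


# Hypothetical hand-picked weights: two inputs, two hidden units, one output
params = {
    "W1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 1.0]],
    "b2": [0.0],
}
hidden, output = forward([2.0, 1.0], params)
print(hidden, output)
```

Adding more layers amounts to chaining further `dense` calls, which is the sense in which a network becomes "deep".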

Chapter 7, Mining the 20 Newsgroups Dataset with Text Analysis Techniques, starts the second step of your learning journey: unsupervised learning. It tackles a natural language processing problem: exploring the newsgroups data. You will gain hands-on experience in working with text data, especially how to convert words and phrases into machine-readable values and how to clean up words with little meaning. You will also visualize text data using a dimensionality reduction technique called t-SNE. Finally, you will learn how to represent words with embedding vectors.
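
Converting text into machine-readable values and dropping words with little meaning can be illustrated with a tiny bag-of-words sketch. The stop-word list, the regular-expression tokenizer, and the sample sentences are all assumptions for this example; real projects typically rely on a library vectorizer and a fuller stop-word list.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list (an assumption, not a standard one)
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs, stop_words=STOP_WORDS):
    """Return a sorted vocabulary and one row of token counts per document."""
    counts = [Counter(t for t in tokenize(d) if t not in stop_words) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat", "The cat and the dog"]
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['cat', 'dog', 'sat']
print(matrix)  # [[1, 0, 1], [1, 1, 0]]
```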

Chapter 8, Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling, talks about identifying different groups of observations from data in an unsupervised manner. You will cluster the newsgroups data using the k-means algorithm, and detect topics using non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA). You will be amused by how many interesting themes you are able to mine from the 20 newsgroups dataset!
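
The k-means procedure alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. A minimal from-scratch sketch follows; the random initialization from the data points, the stopping rule, and the two-dimensional toy data are assumptions for illustration.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns centroids and labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the initial centroids in general, which is one reason choosing `k` and rerunning with several seeds is common practice.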

Chapter 9, Recognizing Faces with Support Vector Machine, continues the journey of supervised learning and classification. Specifically, it focuses on multiclass classification and support vector machine classifiers. It discusses how the support vector machine algorithm searches for a decision boundary in order to separate data from different classes. You will implement the algorithm with scikit-learn, and apply it to solve various real-life problems including face recognition.

Chapter 10, Machine Learning Best Practices, aims to reinforce your learning and get you ready for real-world projects. It includes 21 best practices to follow throughout the entire machine learning workflow.

Chapter 11, Categorizing Images of Clothing with Convolutional Neural Networks, is about using Convolutional Neural Networks (CNNs), a very powerful modern machine learning model, to classify images of clothing. It covers the building blocks and architecture of CNNs, and their implementation using PyTorch. After exploring the data of clothing images, you will develop CNN models to categorize the images into ten classes, and utilize data augmentation and transfer learning techniques to boost the classifier.
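
The two core CNN building blocks named in the table of contents, the convolutional layer and the pooling layer, can be sketched without any framework. The example below assumes a single-channel image stored as a list of rows, no padding, and a stride of 1 for the convolution; the kernel values are arbitrary.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    As in most deep learning libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 3x3 "image" and a 2x2 kernel that responds to diagonal differences
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
feature_map = conv2d_valid(image, kernel)
print(feature_map)              # [[-4, -4], [-4, -4]]
print(max_pool2d(feature_map))  # [[-4]]
```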

Chapter 12, Making Predictions with Sequences using Recurrent Neural Networks, starts by defining sequential learning, and exploring how Recurrent Neural Networks (RNNs) are well suited for it. You will learn about various types of RNNs and their common applications. You will implement RNNs with PyTorch, and apply them to solve three interesting sequential learning problems: sentiment analysis on IMDb movie reviews, stock price forecasting, and text auto-generation.

Chapter 13, Advancing Language Understanding and Generation with the Transformer Models, dives into the Transformer neural network, designed for sequential learning. The Transformer focuses on crucial parts of the input sequence and captures long-range relationships better than RNNs. You will explore two cutting-edge Transformer models, BERT and GPT, and use them for sentiment analysis and text generation, surpassing the performance achieved in the previous chapter.
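
The way a Transformer "focuses on crucial parts of the input sequence" comes from self-attention. A minimal sketch of scaled dot-product attention is shown below, assuming the query, key, and value vectors have already been computed; it omits the learned projections, masking, and multiple heads, and the numbers are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, average the values weighted by softmax(q . k / sqrt(d_k))."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical vectors for a two-token sequence
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
print(weights)
print(outputs)
```

Each row of `weights` shows how strongly one token attends to every token in the sequence, including itself.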

Chapter 14, Building an Image Search Engine Using CLIP: A Multimodal Approach, explores a multimodal model, CLIP, that merges visual and textual data. This powerful model can understand connections between images and text. You will dive into its architecture and how it learns, then build an image search engine. Finally, you will cap it all off with a zero-shot image classification project, pushing the boundaries of what this model can do.
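
Once a model such as CLIP has mapped images and text into a shared embedding space, image search reduces to ranking images by similarity to the query embedding. The sketch below shows only that ranking step with cosine similarity; the file names and the two-dimensional embeddings are hypothetical stand-ins for what trained vision and text encoders would produce.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_images(text_embedding, image_embeddings, top_k=3):
    """Rank images by cosine similarity to the text query; best matches first."""
    scored = [
        (name, cosine_similarity(text_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real ones come from the trained encoders
image_embeddings = {
    "dog.jpg": [0.9, 0.1],
    "car.jpg": [0.1, 0.9],
}
query_embedding = [1.0, 0.0]  # stands in for the embedding of a text query about a dog
print(search_images(query_embedding, image_embeddings, top_k=1))
```

Zero-shot classification follows the same pattern in reverse: one image embedding is compared against the embeddings of several candidate label texts.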

Chapter 15, Making Decisions in Complex Environments with Reinforcement Learning, is about learning from experience and interacting with the environment. After exploring the fundamentals of reinforcement learning, you will solve the FrozenLake environment with dynamic programming algorithms. You will learn about Monte Carlo learning and use it for value approximation and control. You will also develop temporal difference algorithms and use Q-learning to solve the Blackjack problem.
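
Value iteration, one of the dynamic programming algorithms listed for this chapter, can be sketched on a much simpler problem than FrozenLake. The example below assumes a deterministic four-state corridor in which reaching the last state yields a reward of 1; the environment, the `step` interface, and the discount factor are all assumptions for illustration (FrozenLake itself is stochastic and accessed through Gymnasium).

```python
def value_iteration(n_states, actions, step, gamma=0.9, tol=1e-8):
    """Repeatedly apply the Bellman optimality update until values stop changing.

    step(state, action) must return (next_state, reward, done) deterministically.
    """
    values = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            candidates = []
            for a in actions:
                next_state, reward, done = step(s, a)
                future = 0.0 if done else gamma * values[next_state]
                candidates.append(reward + future)
            best = max(candidates)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


GOAL = 3  # hypothetical corridor: states 0..3, state 3 is the terminal goal


def corridor_step(state, action):
    """Move left (-1) or right (+1) along the corridor; entering the goal pays 1."""
    if state == GOAL:
        return state, 0.0, True
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, 0.0, False


values = value_iteration(n_states=4, actions=[-1, 1], step=corridor_step)
print(values)
```

States closer to the goal end up with higher values, and acting greedily with respect to those values gives the optimal policy of always moving right.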

To get the most out of this book

This book assumes a basic knowledge of Python, some familiarity with basic machine learning algorithms, and experience with a few core Python libraries, such as NumPy and pandas, so that you can build intelligent features into your own projects.

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/packtjaniceg/Python-Machine-Learning-by-Example-Fourth-Edition/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781835085622.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter (X) handles. Here is an example:...
