
Machine Learning for Imbalanced Data
Description
- Learn cutting-edge deep learning techniques to overcome data imbalance
- Explore different methods for dealing with skewed data in ML and DL applications
- Purchase of the print or Kindle book includes a free eBook in the PDF format
Book Description
As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms assume an equilibrium between majority and minority classes, leading to suboptimal performance on imbalanced data. This comprehensive guide helps you address this class imbalance to significantly improve model performance.
Machine Learning for Imbalanced Data begins by introducing you to the challenges posed by imbalanced datasets and the importance of addressing these issues. It then guides you through techniques that enhance the performance of classical machine learning models when using imbalanced data, including various sampling and cost-sensitive learning methods. As you progress, you'll delve into similar and more advanced techniques for deep learning models, employing PyTorch as the primary framework. Throughout the book, hands-on examples will provide working and reproducible code that'll demonstrate the practical implementation of each technique.
By the end of this book, you'll be adept at identifying and addressing class imbalances and confidently applying various techniques, including sampling, cost-sensitive techniques, and threshold adjustment, while using traditional machine learning or deep learning models.
What you will learn
- Use imbalanced data in your machine learning models effectively
- Explore the metrics used when classes are imbalanced
- Understand how and when to apply various sampling methods such as over-sampling and under-sampling
- Apply data-based, algorithm-based, and hybrid approaches to deal with class imbalance
- Combine and choose from various options for data balancing while avoiding common pitfalls
- Understand the concepts of model calibration and threshold adjustment in the context of dealing with imbalanced datasets
Who this book is for
This book is for machine learning practitioners who want to effectively address the challenges of imbalanced datasets in their projects. Machine learning engineers/scientists, research scientists/engineers, and data scientists/engineers will find this book helpful. Though complete beginners are welcome to read this book, some familiarity with core machine learning concepts will help readers maximize the benefits and insights gained from this comprehensive resource.

Content
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Chapter 1: Introduction to Data Imbalance in Machine Learning
- Technical requirements
- Introduction to imbalanced datasets
- Machine learning 101
- What happens during model training?
- Types of dataset and splits
- Cross-validation
- Common evaluation metrics
- Confusion matrix
- ROC
- Precision-Recall curve
- Relation between the ROC curve and PR curve
- Challenges and considerations when dealing with imbalanced data
- When can we have an imbalance in datasets?
- Why can imbalanced data be a challenge?
- When to not worry about data imbalance
- Introduction to the imbalanced-learn library
- General rules to follow
- Summary
- Questions
- References
- Chapter 2: Oversampling Methods
- Technical requirements
- What is oversampling?
- Random oversampling
- Problems with random oversampling
- SMOTE
- How SMOTE works
- Problems with SMOTE
- SMOTE variants
- Borderline-SMOTE
- ADASYN
- Working of ADASYN
- Categorical features and SMOTE variants (SMOTE-NC and SMOTEN)
- Model performance comparison of various oversampling methods
- Guidance for using various oversampling techniques
- When to avoid oversampling
- Oversampling in multi-class classification
- Summary
- Exercises
- References
- Chapter 3: Undersampling Methods
- Technical requirements
- Introducing undersampling
- When to avoid undersampling the majority class
- Fixed versus cleaning undersampling
- Undersampling approaches
- Removing examples uniformly
- Random UnderSampling
- ClusterCentroids
- Strategies for removing noisy observations
- ENN, RENN, and AllKNN
- Tomek links
- Neighborhood Cleaning Rule
- Instance hardness threshold
- Strategies for removing easy observations
- Condensed Nearest Neighbors
- One-sided selection
- Combining undersampling and oversampling
- Model performance comparison
- Summary
- Exercises
- References
- Chapter 4: Ensemble Methods
- Technical requirements
- Bagging techniques for imbalanced data
- UnderBagging
- OverBagging
- SMOTEBagging
- Comparative performance of bagging methods
- Boosting techniques for imbalanced data
- AdaBoost
- RUSBoost, SMOTEBoost, and RAMOBoost
- Ensemble of ensembles
- EasyEnsemble
- Comparative performance of boosting methods
- Model performance comparison
- Summary
- Questions
- References
- Chapter 5: Cost-Sensitive Learning
- Technical requirements
- The concept of Cost-Sensitive Learning
- Costs and cost functions
- Types of cost-sensitive learning
- Difference between CSL and resampling
- Problems with rebalancing techniques
- Understanding costs in practice
- Cost-Sensitive Learning for logistic regression
- Cost-Sensitive Learning for decision trees
- Cost-Sensitive Learning using scikit-learn and XGBoost models
- MetaCost - making any classification model cost-sensitive
- Threshold adjustment
- Methods for threshold tuning
- Summary
- Questions
- References
- Chapter 6: Data Imbalance in Deep Learning
- Technical requirements
- A brief introduction to deep learning
- Neural networks
- Perceptron
- Activation functions
- Layers
- Feedforward neural networks
- Training neural networks
- The effect of the learning rate on data imbalance
- Image processing using Convolutional Neural Networks
- Text analysis using Natural Language Processing
- Data imbalance in deep learning
- The impact of data imbalance on deep learning models
- Overview of deep learning techniques to handle data imbalance
- Multi-label classification
- Summary
- Questions
- References
- Chapter 7: Data-Level Deep Learning Methods
- Technical requirements
- Preparing the data
- Creating the training loop
- Sampling techniques for deep learning models
- Random oversampling
- Dynamic sampling
- Data augmentation techniques for vision
- Data-level techniques for text classification
- Dataset and baseline model
- Document-level augmentation
- Character and word-level augmentation
- Discussion of other data-level deep learning methods and their key ideas
- Two-phase learning
- Expansive Over-Sampling
- Using generative models for oversampling
- DeepSMOTE
- Neural style transfer
- Summary
- Questions
- References
- Chapter 8: Algorithm-Level Deep Learning Techniques
- Technical requirements
- Motivation for algorithm-level techniques
- Weighting techniques
- Using PyTorch's weight parameter
- Handling textual data
- Deferred re-weighting - a minor variant of the class weighting technique
- Explicit loss function modification
- Focal loss
- Class-balanced loss
- Class-dependent temperature Loss
- Class-wise difficulty-balanced loss
- Discussing other algorithm-based techniques
- Regularization techniques
- Siamese networks
- Deeper neural networks
- Threshold adjustment
- Summary
- Questions
- References
- Chapter 9: Hybrid Deep Learning Methods
- Technical requirements
- Using graph machine learning for imbalanced data
- Understanding graphs
- Graph machine learning
- Dealing with imbalanced data
- Case study - the performance of XGBoost, MLP, and a GCN on an imbalanced dataset
- Hard example mining
- Online Hard Example Mining
- Minority class incremental rectification
- Utilizing the hard sample mining technique in minority class incremental rectification
- Summary
- Questions
- References
- Chapter 10: Model Calibration
- Technical requirements
- Introduction to model calibration
- Why bother with model calibration
- Models with and without well-calibrated probabilities
- Calibration curves or reliability plot
- Brier score
- Expected Calibration Error
- The influence of data balancing techniques on model calibration
- Plotting calibration curves for a model trained on a real-world dataset
- Model calibration techniques
- The calibration of model scores to account for sampling
- Platt's scaling
- Isotonic regression
- Choosing between Platt's scaling and Isotonic regression
- Temperature scaling
- Label smoothing
- The impact of calibration on a model's performance
- Summary
- Questions
- References
- Appendix: Machine Learning Pipeline in Production
- Machine learning training pipeline
- Inferencing (online or batch)
- Assessments
- Chapter 1 - Introduction to Data Imbalance in Machine Learning
- Chapter 2 - Oversampling Methods
- Chapter 3 - Undersampling Methods
- Chapter 4 - Ensemble Methods
- Chapter 5 - Cost-Sensitive Learning
- Chapter 6 - Data Imbalance in Deep Learning
- Chapter 7 - Data-Level Deep Learning Methods
- Chapter 8 - Algorithm-Level Deep Learning Techniques
- Chapter 9 - Hybrid Deep Learning Methods
- Chapter 10 - Model Calibration
- Index
- Other Books You May Enjoy
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management.
For more information, see our eBook Help page.