Designing Machine Learning Systems

Name: Designing Machine Learning Systems
Brand: O'Reilly
Price: 50.49 EUR
Availability: OnlineOnly

Chip Huyen (Author)

O'Reilly (Publisher)

Published on 17 May 2022

388 pages

E-Book

PDF with Adobe-DRM

System requirements

978-1-0981-0793-2 (ISBN)

€50.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Cover
Copyright
Table of Contents
Preface
Who This Book Is For
What This Book Is Not
Navigating This Book
GitHub Repository and Community
Conventions Used in This Book
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments
Chapter 1. Overview of Machine Learning Systems
When to Use Machine Learning
Machine Learning Use Cases
Understanding Machine Learning Systems
Machine Learning in Research Versus in Production
Machine Learning Systems Versus Traditional Software
Summary
Chapter 2. Introduction to Machine Learning Systems Design
Business and ML Objectives
Requirements for ML Systems
Reliability
Scalability
Maintainability
Adaptability
Iterative Process
Framing ML Problems
Types of ML Tasks
Objective Functions
Mind Versus Data
Summary
Chapter 3. Data Engineering Fundamentals
Data Sources
Data Formats
JSON
Row-Major Versus Column-Major Format
Text Versus Binary Format
Data Models
Relational Model
NoSQL
Structured Versus Unstructured Data
Data Storage Engines and Processing
Transactional and Analytical Processing
ETL: Extract, Transform, and Load
Modes of Dataflow
Data Passing Through Databases
Data Passing Through Services
Data Passing Through Real-Time Transport
Batch Processing Versus Stream Processing
Summary
Chapter 4. Training Data
Sampling
Nonprobability Sampling
Simple Random Sampling
Stratified Sampling
Weighted Sampling
Reservoir Sampling
Importance Sampling
Labeling
Hand Labels
Natural Labels
Handling the Lack of Labels
Class Imbalance
Challenges of Class Imbalance
Handling Class Imbalance
Data Augmentation
Simple Label-Preserving Transformations
Perturbation
Data Synthesis
Summary
Chapter 5. Feature Engineering
Learned Features Versus Engineered Features
Common Feature Engineering Operations
Handling Missing Values
Scaling
Discretization
Encoding Categorical Features
Feature Crossing
Discrete and Continuous Positional Embeddings
Data Leakage
Common Causes for Data Leakage
Detecting Data Leakage
Engineering Good Features
Feature Importance
Feature Generalization
Summary
Chapter 6. Model Development and Offline Evaluation
Model Development and Training
Evaluating ML Models
Ensembles
Experiment Tracking and Versioning
Distributed Training
AutoML
Model Offline Evaluation
Baselines
Evaluation Methods
Summary
Chapter 7. Model Deployment and Prediction Service
Machine Learning Deployment Myths
Myth 1: You Only Deploy One or Two ML Models at a Time
Myth 2: If We Don't Do Anything, Model Performance Remains the Same
Myth 3: You Won't Need to Update Your Models as Much
Myth 4: Most ML Engineers Don't Need to Worry About Scale
Batch Prediction Versus Online Prediction
From Batch Prediction to Online Prediction
Unifying Batch Pipeline and Streaming Pipeline
Model Compression
Low-Rank Factorization
Knowledge Distillation
Pruning
Quantization
ML on the Cloud and on the Edge
Compiling and Optimizing Models for Edge Devices
ML in Browsers
Summary
Chapter 8. Data Distribution Shifts and Monitoring
Causes of ML System Failures
Software System Failures
ML-Specific Failures
Data Distribution Shifts
Types of Data Distribution Shifts
General Data Distribution Shifts
Detecting Data Distribution Shifts
Addressing Data Distribution Shifts
Monitoring and Observability
ML-Specific Metrics
Monitoring Toolbox
Observability
Summary
Chapter 9. Continual Learning and Test in Production
Continual Learning
Stateless Retraining Versus Stateful Training
Why Continual Learning?
Continual Learning Challenges
Four Stages of Continual Learning
How Often to Update Your Models
Test in Production
Shadow Deployment
A/B Testing
Canary Release
Interleaving Experiments
Bandits
Summary
Chapter 10. Infrastructure and Tooling for MLOps
Storage and Compute
Public Cloud Versus Private Data Centers
Development Environment
Dev Environment Setup
Standardizing Dev Environments
From Dev to Prod: Containers
Resource Management
Cron, Schedulers, and Orchestrators
Data Science Workflow Management
ML Platform
Model Deployment
Model Store
Feature Store
Build Versus Buy
Summary
Chapter 11. The Human Side of Machine Learning
User Experience
Ensuring User Experience Consistency
Combatting "Mostly Correct" Predictions
Smooth Failing
Team Structure
Cross-functional Teams Collaboration
End-to-End Data Scientists
Responsible AI
Irresponsible AI: Case Studies
A Framework for Responsible AI
Summary
Epilogue
Index
About the Author
Colophon

Schweitzer Fachinformationen
