
Designing Machine Learning Systems
Description
Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.
Author Chip Huyen, co-founder of Claypot AI, considers each design decision--such as how to process and create training data, which features to use, how often to retrain models, and what to monitor--in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.
This book will help you tackle scenarios such as:
- Engineering data and choosing the right metrics to solve a business problem
- Automating the process for continually developing, evaluating, deploying, and updating models
- Developing a monitoring system to quickly detect and address issues your models might encounter in production
- Architecting an ML platform that serves across use cases
- Developing responsible ML systems

Content
- Cover
- Copyright
- Table of Contents
- Preface
- Who This Book Is For
- What This Book Is Not
- Navigating This Book
- GitHub Repository and Community
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Chapter 1. Overview of Machine Learning Systems
- When to Use Machine Learning
- Machine Learning Use Cases
- Understanding Machine Learning Systems
- Machine Learning in Research Versus in Production
- Machine Learning Systems Versus Traditional Software
- Summary
- Chapter 2. Introduction to Machine Learning Systems Design
- Business and ML Objectives
- Requirements for ML Systems
- Reliability
- Scalability
- Maintainability
- Adaptability
- Iterative Process
- Framing ML Problems
- Types of ML Tasks
- Objective Functions
- Mind Versus Data
- Summary
- Chapter 3. Data Engineering Fundamentals
- Data Sources
- Data Formats
- JSON
- Row-Major Versus Column-Major Format
- Text Versus Binary Format
- Data Models
- Relational Model
- NoSQL
- Structured Versus Unstructured Data
- Data Storage Engines and Processing
- Transactional and Analytical Processing
- ETL: Extract, Transform, and Load
- Modes of Dataflow
- Data Passing Through Databases
- Data Passing Through Services
- Data Passing Through Real-Time Transport
- Batch Processing Versus Stream Processing
- Summary
- Chapter 4. Training Data
- Sampling
- Nonprobability Sampling
- Simple Random Sampling
- Stratified Sampling
- Weighted Sampling
- Reservoir Sampling
- Importance Sampling
- Labeling
- Hand Labels
- Natural Labels
- Handling the Lack of Labels
- Class Imbalance
- Challenges of Class Imbalance
- Handling Class Imbalance
- Data Augmentation
- Simple Label-Preserving Transformations
- Perturbation
- Data Synthesis
- Summary
- Chapter 5. Feature Engineering
- Learned Features Versus Engineered Features
- Common Feature Engineering Operations
- Handling Missing Values
- Scaling
- Discretization
- Encoding Categorical Features
- Feature Crossing
- Discrete and Continuous Positional Embeddings
- Data Leakage
- Common Causes for Data Leakage
- Detecting Data Leakage
- Engineering Good Features
- Feature Importance
- Feature Generalization
- Summary
- Chapter 6. Model Development and Offline Evaluation
- Model Development and Training
- Evaluating ML Models
- Ensembles
- Experiment Tracking and Versioning
- Distributed Training
- AutoML
- Model Offline Evaluation
- Baselines
- Evaluation Methods
- Summary
- Chapter 7. Model Deployment and Prediction Service
- Machine Learning Deployment Myths
- Myth 1: You Only Deploy One or Two ML Models at a Time
- Myth 2: If We Don't Do Anything, Model Performance Remains the Same
- Myth 3: You Won't Need to Update Your Models as Much
- Myth 4: Most ML Engineers Don't Need to Worry About Scale
- Batch Prediction Versus Online Prediction
- From Batch Prediction to Online Prediction
- Unifying Batch Pipeline and Streaming Pipeline
- Model Compression
- Low-Rank Factorization
- Knowledge Distillation
- Pruning
- Quantization
- ML on the Cloud and on the Edge
- Compiling and Optimizing Models for Edge Devices
- ML in Browsers
- Summary
- Chapter 8. Data Distribution Shifts and Monitoring
- Causes of ML System Failures
- Software System Failures
- ML-Specific Failures
- Data Distribution Shifts
- Types of Data Distribution Shifts
- General Data Distribution Shifts
- Detecting Data Distribution Shifts
- Addressing Data Distribution Shifts
- Monitoring and Observability
- ML-Specific Metrics
- Monitoring Toolbox
- Observability
- Summary
- Chapter 9. Continual Learning and Test in Production
- Continual Learning
- Stateless Retraining Versus Stateful Training
- Why Continual Learning?
- Continual Learning Challenges
- Four Stages of Continual Learning
- How Often to Update Your Models
- Test in Production
- Shadow Deployment
- A/B Testing
- Canary Release
- Interleaving Experiments
- Bandits
- Summary
- Chapter 10. Infrastructure and Tooling for MLOps
- Storage and Compute
- Public Cloud Versus Private Data Centers
- Development Environment
- Dev Environment Setup
- Standardizing Dev Environments
- From Dev to Prod: Containers
- Resource Management
- Cron, Schedulers, and Orchestrators
- Data Science Workflow Management
- ML Platform
- Model Deployment
- Model Store
- Feature Store
- Build Versus Buy
- Summary
- Chapter 11. The Human Side of Machine Learning
- User Experience
- Ensuring User Experience Consistency
- Combatting "Mostly Correct" Predictions
- Smooth Failing
- Team Structure
- Cross-functional Teams Collaboration
- End-to-End Data Scientists
- Responsible AI
- Irresponsible AI: Case Studies
- A Framework for Responsible AI
- Summary
- Epilogue
- Index
- About the Author
- Colophon
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.