Machine Learning Upgrade: A Data Scientist's Guide to MLOps, LLMs, and ML Infrastructure


Kristen Kehrer, Caleb Kaiser (authors)

Wiley (publisher)

1st edition

Published July 29, 2024

249 pages

E-book

ePUB with Adobe DRM

ISBN 978-1-394-24964-0

25.99 € incl. 7% VAT

Single-user e-book license

Available for download


Contents

Introduction

1 A Gentle Introduction to Modern Machine Learning

Data Science Is Diverging from Business Intelligence

From CRISP-DM to Modern, Multicomponent ML Systems

The Emergence of LLMs Has Increased ML's Power and Complexity

What You Can Expect from This Book

2 An End-to-End Approach

Components of a YouTube Search Agent

Principles of a Production Machine Learning System

Observability

Reproducibility

Interoperability

Scalability

Improvability

A Note on Tools

3 A Data-Centric View

The Emergence of Foundation Models

The Role of Off-the-Shelf Components

The Data-Driven Approach

A Note on Data Ethics

Building the Dataset

Working with Vector Databases

Data Versioning and Management

Getting Started with Data Versioning

Knowing "Just Enough" Engineering

4 Standing Up Your LLM

Selecting Your LLM

What Type of Inference Do I Need to Perform?

How Open-Ended Is This Task?

What Are the Privacy Concerns for This Data?

How Much Will This Model Cost?

Experiment Management with LLMs

LLM Inference

Basics of Prompt Engineering

In-Context Learning

Intermediary Computation

Augmented Generation

Agentic Techniques

Optimizing LLM Inference with Experiment Management

Fine-Tuning LLMs

When to Fine-Tune an LLM

Quantization, QLoRA, and Parameter Efficient Fine-Tuning

Wrapping Things Up

5 Putting Together an Application

Prototyping with Gradio

Creating Graphics with Plotnine

Adding the Author Selector

Adding a Logo

Adding a Tab

Adding a Title and Subtitle

Changing the Color of the Buttons

Click to Download Button

Putting It All Together

Deploying Models as APIs

Implementing an API with FastAPI

Implementing Uvicorn

Monitoring an LLM

Dockerizing Your Service

Deploying Your Own LLM

Wrapping Things Up

6 Rounding Out the ML Life Cycle

Deploying a Simple Random Forest Model

An Introduction to Model Monitoring

Model Monitoring with Evidently AI

Building a Model Monitoring System

Final Thoughts on Monitoring

7 Review of Best Practices

Step 1: Understand the Problem

Step 2: Model Selection and Training

Step 3: Deploy and Maintain

Step 4: Collaborate and Communicate

Emerging Trends in LLMs

Next Steps in Learning

Appendix: Additional LLM Example

Index

Chapter 2
An End-to-End Approach

The focus of this book is on building end-to-end, production machine learning systems. With that in mind, we should begin by defining what these terms mean. We promise this isn't just pedantry. Over the last 20 years, terms like end-to-end and production have been thrown around a lot in the world of data science, and depending on the time period, their definitions may vary wildly.

Imagine working on a business intelligence team at a shoe retailer in 2015, when working with data looked quite different than it does today. Your team is tasked with sales forecasting for the next quarter. What would your end-to-end system look like?

You begin by building your dataset. Because it's 2015, it's very likely that your company's data is a nightmare to access and ingest, but after considerable effort, your team is able to curate a clean dataset. Next, you focus on modeling. Your team will probably experiment with a variety of models, from ARIMA to random forests and maybe even gradient boosting (XGBoost was initially released in 2014, after all). After much tweaking and tuning, and ideally some robust validation, you finally have your model. Now, you can get to the business of predicting next quarter's sales and sharing your results. Sharing your results could mean many things here. You may have produced a dashboard for the chief revenue officer (CRO) or manually generated predictions each day using a tool like Statistical Package for Social Sciences (SPSS). Maybe you scheduled a job to run every day that would create the new day's actuals via a macro. Or you might start your day by inspecting the forecast and writing an email to share the results.

The hypothetical sales forecasting project is, in many ways, straightforward. This is not to say that it is easy. Any project that requires this much manual effort is difficult. Curating a dataset from years of unhygienic legacy data is arduous. Presenting your forecasts to a nontechnical audience without boring them is an art. There are a seemingly infinite number of experiments you might run in the modeling phase, and conducting a reliable validation process is rarely simple. But, if we break down the components of this project, there aren't that many architectural decisions to make.

Data ingestion: How your company has stored its data will ultimately guide this, but you will need to decide how you are going to ingest data and produce a dataset.
ML framework: In all likelihood, you will be using something like scikit-learn (a Python library for modeling) to build your model, but because it's 2015, you could also be using a statistical tool or some internal framework your team built.
Visualization library: If your company uses a particular dashboarding solution, you'll use that. Otherwise, you'll use whatever library you like or Excel to generate charts for your report.

And that's basically it. You don't need to make more architectural decisions. Because it's 2015, you probably aren't using any real experiment management solution outside of a spreadsheet. Data versioning isn't likely to be done in any official sort of way. Your model doesn't need to be "deployed" in any sense, although if you're using a macro, that might technically qualify. You will probably generate predictions by running a notebook or a local script, which might be stored with some kind of version control. But generally speaking, this is all that is encompassed by your end-to-end system, and it works, at least for this particular system.

But what about a more complex machine learning system, like the YouTube search assistant we mentioned in Chapter 1? This system requires multiple models interacting in a pipeline. It involves a database with support for vectors to store embeddings.

A vector database is a database built for storing and querying high-dimensional data. Many popular techniques like RAG (Retrieval Augmented Generation) rely on manipulating text embeddings, which are high-dimensional vectors stored in vector databases.
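To make the idea of "querying high-dimensional data" concrete, here is a minimal sketch of what a vector database does at its core: store (embedding, text) pairs and return the texts whose embeddings are closest to a query embedding. Everything in it is an assumption for illustration: `ToyVectorStore` is a hypothetical in-memory class, the three-dimensional vectors are made up, and cosine similarity is one common choice of distance. Real vector databases typically use approximate nearest-neighbor indexes rather than the exhaustive scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._rows = []  # list of (embedding, text) pairs

    def add(self, embedding, text):
        self._rows.append((list(embedding), text))

    def search(self, query_embedding, k=2):
        """Return the k stored texts most similar to the query, best first."""
        scored = [
            (cosine_similarity(query_embedding, emb), text)
            for emb, text in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = ToyVectorStore()
# Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
store.add([0.9, 0.1, 0.0], "excerpt about running shoes")
store.add([0.0, 0.2, 0.9], "excerpt about quarterly revenue")
store.add([0.8, 0.3, 0.1], "excerpt about trail sneakers")

top_matches = store.search([1.0, 0.2, 0.0], k=2)
print(top_matches)
```

The query vector points in roughly the same direction as the two shoe-related excerpts, so those are returned ahead of the revenue excerpt.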

The application must implement some retrieval logic to get relevant videos and excerpts, a way to generate transcripts from videos, and a way to create embeddings for that text. Your inference pipeline needs to be accessible for real-time generation, and your front end can't simply be a chart in some report; you need a full-blown application. And, of course, your system needs to be able to scale to handle many concurrent users.

Beyond any individual technical difference, the most important difference to note is that this system is not a one-off bit of data analysis. It is an ongoing software project, one that needs to be maintained, monitored, and, ideally, improved.

In this chapter, we are going to introduce a framework for designing such a machine learning system. We will begin by examining our YouTube search assistant in a bit more detail.

Components of a YouTube Search Agent

First, let's describe our system generally. When a user inputs their question, the system searches YouTube for relevant videos and adds their transcripts to our ever-growing database. Then, using an embedding created for the user's search query, the system retrieves the most relevant embeddings and their associated text from our entire database, and that text is passed to our language model. In practice, the end result is shown in Figure 2.1.

Figure 2.1 YouTube search query

The model used here wasn't trained on data from 2023, so this is an example of using retrieval augmented generation (RAG) to share entirely new information with a language model. We'll talk more about RAG in a later chapter.

Let's think through the different components of this project. At a very high level, our components fit into a few discrete categories:

YouTube retrieval: We have a system for running YouTube searches and fetching the relevant videos. We then generate transcripts from these videos.
Embeddings storage: We use an embedding model to convert chunks of our transcripts into embeddings and then store them in a vector database. We then convert our user's initial question into an embedding, using the same embedding model, and perform a similarity search to retrieve the most relevant excerpts from our videos. Finally, we return the text associated with those embeddings as input context for our final LLM inference.
Large language models: Throughout our system, we use LLMs in key places. We use a model to convert our users' questions into relevant YouTube searches, to conduct our final question-answering task, and to generate our embeddings.
User interface: We take user inputs and display outputs inside our application.
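The way these four categories hand data to one another can be sketched in a few lines. This is only an illustration under heavy assumptions, not the book's implementation: transcripts are passed in as plain strings instead of being fetched from YouTube, `fake_embed` is a hypothetical word-counting placeholder for a real embedding model, the fixed-size word chunking is an assumed strategy, and the LLM call is replaced by building the prompt such a model would receive. The step where an LLM rewrites the user's question into YouTube searches is left out.

```python
from dataclasses import dataclass


@dataclass
class Excerpt:
    video_id: str
    text: str


def chunk_transcript(video_id, transcript, words_per_chunk=8):
    """Split a transcript into fixed-size word chunks (an assumed strategy)."""
    words = transcript.split()
    return [
        Excerpt(video_id, " ".join(words[i:i + words_per_chunk]))
        for i in range(0, len(words), words_per_chunk)
    ]


def fake_embed(text):
    """Placeholder embedding: counts of a few hand-picked words.

    A real system would call an embedding model here.
    """
    vocabulary = ["llm", "monitoring", "vector", "database"]
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def answer_question(question, transcripts, top_k=1):
    """Wire the four component groups together for one request."""
    # 1. YouTube retrieval (stubbed): transcripts arrive as {video_id: text}.
    excerpts = []
    for video_id, transcript in transcripts.items():
        excerpts.extend(chunk_transcript(video_id, transcript))

    # 2. Embeddings storage: embed every chunk, then rank the chunks against
    #    the question using the same embedding function.
    index = [(fake_embed(e.text), e) for e in excerpts]
    query_vector = fake_embed(question)
    ranked = sorted(index, key=lambda row: dot(query_vector, row[0]), reverse=True)
    context = [excerpt for _, excerpt in ranked[:top_k]]

    # 3. LLM (stubbed): build the prompt a question-answering model would get.
    context_block = "\n".join(f"[{e.video_id}] {e.text}" for e in context)
    prompt = f"Context:\n{context_block}\n\nQuestion: {question}"

    # 4. User interface: return what the application would display or send on.
    return prompt, context


transcripts = {
    "vid_a": "a vector database stores embeddings for fast similarity search over text",
    "vid_b": "model monitoring tracks drift in production systems over time",
}
prompt, context = answer_question("what is a vector database", transcripts)
print(prompt)
```

Even in this toy form, the interdependence is visible: a poor chunking rule or a weak embedding function changes which excerpt reaches the prompt, and therefore what the final model can say.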

Within each of those categories, we have many individual components that need to be designed and implemented. In Figure 2.2, we've laid out a diagram of the major components.

It's important to understand how interdependent the different components in this system are. Your embeddings are only as good as the text you are embedding, which means your transcription system is essential. At the same time, a perfect transcription system is useless if you are unable to find relevant videos, which means your YouTube retrieval system must be great.

Figure 2.2 Components of a YouTube search

Many of your design decisions will be made for you, based on the needs of your project. For example, if you need to fine-tune your models, then the field of potential LLM architectures you might use will narrow considerably, as many of the most popular hosted APIs don't allow fine-tuning and many popular model architectures would be prohibitively expensive to fine-tune on your own. At the same time, many of the decisions regarding infrastructure in a system like this might not be at all obvious to you, at least not until something breaks.

What does it take to handle many concurrent users in a system like this? If the quality of your system's outputs starts to deteriorate, how would you even know? And once you did know, where would you begin to debug? If you have an entire team of data scientists and engineers working on improving this application, how can you attribute a change in your application's outputs to a particular change in the system?
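One way to start answering the attribution question is to record, with every response, exactly which version of each component produced it. The sketch below assumes a hypothetical set of version tags and a made-up record layout; the component names, version strings, and timings are invented for illustration and do not come from any particular tool.

```python
import hashlib
import json
import time

# Hypothetical version tags for each component; in practice these might be
# git commits, model names, or dataset snapshot identifiers.
COMPONENT_VERSIONS = {
    "retrieval": "search-v3",
    "embedding_model": "embedder-2024-01",
    "llm": "qa-model-v7",
    "prompt_template": "template-12",
}


def fingerprint(versions):
    """Short, stable hash of the full component configuration."""
    canonical = json.dumps(versions, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def trace_request(question, answer, stage_seconds, versions=COMPONENT_VERSIONS):
    """Build one structured log record for a single request."""
    return {
        "timestamp": time.time(),
        "question": question,
        "answer": answer,
        "stage_seconds": stage_seconds,  # latency per pipeline stage
        "versions": dict(versions),      # what produced this answer
        "config_fingerprint": fingerprint(versions),
    }


record = trace_request(
    question="what is a vector database",
    answer="(model output would go here)",
    stage_seconds={"retrieval": 0.41, "embedding": 0.05, "llm": 1.20},  # made-up timings
)

# Changing any single component changes the fingerprint, so two answers can be
# grouped by configuration before anyone starts debugging by hand.
changed = dict(COMPONENT_VERSIONS, llm="qa-model-v8")
print(record["config_fingerprint"], fingerprint(changed))
```

With records like this, a drop in output quality can be lined up against the moment a fingerprint changed, and the `versions` field shows which component differs.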

Principles of a Production Machine Learning System

Architecture is a tricky subject in software engineering, mostly because no one is really sure what it is. Loosely, people tend to say "architecture" when they are discussing the fundamental logic of a system separate from the actual code that implements it. Unsurprisingly, these discussions have a tendency toward pageantry, producing lots of diagrams, taxonomies, and "methodologies" that are promptly ignored by the people who actually build things.

However, this is not to say that architecture is an ignorable concept, just that it needs to be understood through a more practical lens. We quite like one particular definition of architecture from Ralph Johnson, shared by Martin Fowler: "Architecture is about the important stuff. Whatever that is."

In that spirit, we want to focus on what we consider "the important stuff" in designing a machine learning system. Our goal is not to outline a...
