A much-needed guide to implementing new technology in workspaces
From experts in the field comes Machine Learning Upgrade: A Data Scientist's Guide to MLOps, LLMs, and ML Infrastructure, a book that provides data scientists and managers with best practices at the intersection of management, large language models (LLMs), machine learning, and data science. This groundbreaking book will change the way that you view the pipeline of data science. The authors provide an introduction to modern machine learning, showing you how it can be viewed as a holistic, end-to-end system, not just a shiny new gadget in an otherwise unchanged operational structure. By adopting a data-centric view of the world, you can begin to see unstructured data and LLMs as the foundation upon which you can build countless applications and business solutions. This book explores a whole world of decision making that hasn't been codified yet, enabling you to forge the future using emerging best practices.
This book is indispensable for data professionals and business leaders looking to understand LLMs and the entire data science pipeline.
Kristen Kehrer has been providing innovative and practical statistical modeling solutions since 2010. In 2018, she achieved recognition as a LinkedIn Top Voice in Data Science & Analytics. Kristen is also the founder of Data Moves Me, LLC.
Caleb Kaiser is a Full Stack Engineer at Comet. Caleb was previously on the Founding Team at Cortex Labs. Caleb also worked at Scribe Media on the Author Platform Team.
Introduction
1 A Gentle Introduction to Modern Machine Learning
Data Science Is Diverging from Business Intelligence
From CRISP-DM to Modern, Multicomponent ML Systems
The Emergence of LLMs Has Increased ML's Power and Complexity
What You Can Expect from This Book
2 An End-to-End Approach
Components of a YouTube Search Agent
Principles of a Production Machine Learning System
Observability
Reproducibility
Interoperability
Scalability
Improvability
A Note on Tools
3 A Data-Centric View
The Emergence of Foundation Models
The Role of Off-the-Shelf Components
The Data-Driven Approach
A Note on Data Ethics
Building the Dataset
Working with Vector Databases
Data Versioning and Management
Getting Started with Data Versioning
Knowing "Just Enough" Engineering
4 Standing Up Your LLM
Selecting Your LLM
What Type of Inference Do I Need to Perform?
How Open-Ended Is This Task?
What Are the Privacy Concerns for This Data?
How Much Will This Model Cost?
Experiment Management with LLMs
LLM Inference
Basics of Prompt Engineering
In-Context Learning
Intermediary Computation
Augmented Generation
Agentic Techniques
Optimizing LLM Inference with Experiment Management
Fine-Tuning LLMs
When to Fine-Tune an LLM
Quantization, QLoRA, and Parameter Efficient Fine-Tuning
Wrapping Things Up
5 Putting Together an Application
Prototyping with Gradio
Creating Graphics with Plotnine
Adding the Author Selector
Adding a Logo
Adding a Tab
Adding a Title and Subtitle
Changing the Color of the Buttons
Click to Download Button
Putting It All Together
Deploying Models as APIs
Implementing an API with FastAPI
Implementing Uvicorn
Monitoring an LLM
Dockerizing Your Service
Deploying Your Own LLM
Wrapping Things Up
6 Rounding Out the ML Life Cycle
Deploying a Simple Random Forest Model
An Introduction to Model Monitoring
Model Monitoring with Evidently AI
Building a Model Monitoring System
Final Thoughts on Monitoring
7 Review of Best Practices
Step 1: Understand the Problem
Step 2: Model Selection and Training
Step 3: Deploy and Maintain
Step 4: Collaborate and Communicate
Emerging Trends in LLMs
Next Steps in Learning
Appendix: Additional LLM Example
Index
The focus of this book is on building end-to-end, production machine learning systems. With that in mind, we should begin by defining what these terms mean. We promise this isn't just pedantry. Over the last 20 years, terms like end-to-end and production have been thrown around a lot in the world of data science, and depending on the time period, their definitions may vary wildly.
Imagine working on a business intelligence team at a shoe retailer in 2015, when working with data looked quite different than it does today. Your team is tasked with sales forecasting for the next quarter. What would your end-to-end system look like?
You begin by building your dataset. Since this is 2015, it's very likely that your company's data is a nightmare to access and ingest, but after considerable effort, your team is able to curate a clean dataset. Next, you focus on modeling. Your team will probably experiment with a variety of models, from ARIMA to random forests and maybe even gradient boosting (XGBoost was initially released in 2014, after all). After much tweaking and tuning, and ideally some robust validation, you finally have your model. Now, you can get to the business of predicting next quarter's sales and sharing your results. Sharing your results could mean many things here. You may have produced a dashboard for the chief revenue officer (CRO) or manually generated predictions each day using a tool like Statistical Package for Social Sciences (SPSS). Maybe you scheduled a job to run every day that would create the new day's actuals via a macro. Or you might start your day by inspecting the forecast and writing an email to share the results.
The hypothetical sales forecasting project is, in many ways, straightforward. This is not to say that it is easy. Any project that requires this much manual effort is difficult. Curating a dataset from years of unhygienic legacy data is arduous. Presenting your forecasts to a nontechnical audience without boring them is an art. There are a seemingly infinite number of experiments you might run in the modeling phase, and conducting a reliable validation process is rarely simple. But, if we break down the components of this project, there aren't that many architectural decisions to make.
And that's basically it. You don't need to make more architectural decisions. Because it's 2015, you probably aren't using any real experiment management solution outside of a spreadsheet. Data versioning isn't likely to be done in any official sort of way. Your model doesn't need to be "deployed" in any sense, although if you're using a macro, that might technically qualify. You will probably generate predictions by running a notebook or a local script, which might be stored with some kind of version control. But generally speaking, this is all that is encompassed by your end-to-end system, and it works, at least for this particular system.
But what about a more complex machine learning system, like the YouTube search assistant we mentioned in Chapter 1? This system requires multiple models interacting in a pipeline. It involves a database with support for vectors to store embeddings.
A vector database is a database built for storing and querying high-dimensional data. Many popular techniques, such as retrieval augmented generation (RAG), rely on manipulating text embeddings, which are high-dimensional vectors stored in vector databases.
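To make the idea concrete, here is a minimal sketch of what a vector database does at its core: it stores vectors alongside their text and returns the entries closest to a query vector. The `ToyVectorStore` class, the three-dimensional vectors, and the sample texts are all hypothetical and chosen for illustration; a real vector database would use learned embeddings with hundreds or thousands of dimensions and an approximate index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: a list of (vector, text) pairs."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def query(self, vector, k=2):
        """Return the k (score, text) pairs most similar to `vector`."""
        scored = [(cosine_similarity(vector, v), t) for v, t in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up 3-dimensional "embeddings" for illustration only.
store = ToyVectorStore()
store.add([1.0, 0.0, 0.0], "How to lace running shoes")
store.add([0.0, 1.0, 0.0], "Baking sourdough bread")
store.add([0.9, 0.1, 0.0], "Choosing trail running shoes")

results = store.query([1.0, 0.0, 0.0], k=2)
top_texts = [text for _, text in results]
```

Here the query vector points in the same direction as the two shoe-related entries, so those are the ones returned; the bread entry is orthogonal to the query and scores zero.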
The application must implement some retrieval logic to get relevant videos and excerpts, a way to generate transcripts from videos, and a way to create embeddings for that text. Your inference pipeline needs to be accessible for real-time generation, and your front end can't simply be a chart in some report; you need a full-blown application. And, of course, your system needs to be able to scale to handle many concurrent users.
Beyond any individual technical difference, the most important difference to note is that this system is not a one-off bit of data analysis. It is an ongoing software project, one that needs to be maintained, monitored, and, ideally, improved.
In this chapter, we are going to introduce a framework for designing such a machine learning system. We will begin by examining our YouTube search assistant in a bit more detail.
First, let's describe our system generally. When a user inputs their question, the system searches YouTube for relevant videos and adds their transcripts to our ever-growing database. Then, using an embedding created for the user's search query, the system extracts the most relevant embeddings and associated text from our entire database, and that text is passed to our language model. In practice, the end result is shown in Figure 2.1.
Figure 2.1 YouTube search query
The model used here wasn't trained on data from 2023, so this is an example of using retrieval augmented generation (RAG) to share entirely new information with a language model. We'll talk more about RAG in a later chapter.
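The flow described above can be sketched in a few lines, under heavy assumptions. Every name here (`answer_question`, `fake_search`, `fake_transcribe`, `fake_embed`, `fake_generate`) is a hypothetical stand-in rather than the book's implementation: the "embedding" is a simple word count over a tiny vocabulary, the "language model" just echoes its prompt, and the database is a plain list that does not deduplicate videos.

```python
def dot(a, b):
    """Dot product, used here as a simple relevance score."""
    return sum(x * y for x, y in zip(a, b))


def answer_question(question, database, search_videos, transcribe,
                    embed, generate, k=2):
    """Hypothetical end-to-end flow: ingest, retrieve, then generate."""
    # 1. Search for videos and add their transcripts to the database.
    for video_id in search_videos(question):
        text = transcribe(video_id)
        database.append((embed(text), text))

    # 2. Embed the query and rank everything stored so far against it.
    query_vec = embed(question)
    ranked = sorted(database, key=lambda item: dot(query_vec, item[0]),
                    reverse=True)
    context = [text for _, text in ranked[:k]]

    # 3. Pass the retrieved text to the language model with the question.
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    return generate(prompt)


# Stand-in components, all made up for illustration.
VOCAB = ["shoes", "bread", "running"]
FAKE_TRANSCRIPTS = {
    "vid1": "running shoes for beginners",
    "vid2": "bread baking basics",
}


def fake_search(question):
    return list(FAKE_TRANSCRIPTS)


def fake_transcribe(video_id):
    return FAKE_TRANSCRIPTS[video_id]


def fake_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def fake_generate(prompt):
    return prompt  # a real system would call a language model here


database = []
answer = answer_question("which running shoes", database, fake_search,
                         fake_transcribe, fake_embed, fake_generate, k=1)
```

Passing the components in as functions keeps the orchestration separate from any particular search API, transcription model, embedding model, or LLM, which is one way to make each piece replaceable on its own.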
Let's think through the different components of this project. At a very high level, our components fit into a few discrete categories:
Within each of those categories, we have many individual components that need to be designed and implemented. In Figure 2.2, we've laid out a diagram of the major components.
It's important to understand how interdependent the different components in this system are. Your embeddings are only as good as the text you are embedding, which means your transcription system is essential. At the same time, a perfect transcription system is useless if you are unable to find relevant videos, which means your YouTube retrieval system must be great.
Figure 2.2 Components of a YouTube search
Many of your design decisions will be made for you, based on the needs of your project. For example, if you need to fine-tune your models, then the field of potential LLM architectures you might use will narrow considerably, as many of the most popular hosted APIs don't allow fine-tuning and many popular model architectures would be prohibitively expensive to fine-tune on your own. At the same time, many of the decisions regarding infrastructure in a system like this might not be at all obvious to you, at least not until something breaks.
What does it take to handle many concurrent users in a system like this? If the quality of your system's outputs starts to deteriorate, how would you even know? And once you did know, where would you begin to debug? If you have an entire team of data scientists and engineers working on improving this application, how can you attribute a change in your application's outputs to a particular change in the system?
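One possible starting point for the attribution question, offered as a rough sketch rather than a recommended design, is to record the configuration that produced each output so that two outputs can be compared setting by setting. The helper names (`fingerprint`, `log_record`, `changed_settings`) and the configuration keys and values below are hypothetical.

```python
import hashlib
import json


def fingerprint(config):
    """Short, stable hash of a configuration dictionary."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def log_record(question, answer, config):
    """Bundle an output with the configuration that produced it."""
    return {
        "question": question,
        "answer": answer,
        "config": dict(config),
        "config_id": fingerprint(config),
    }


def changed_settings(old, new):
    """Map each setting that differs to its (old, new) values."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}


# Made-up configurations: only the number of retrieved passages differs.
config_v1 = {"embedding_model": "embedder-a", "llm": "model-x", "top_k": 3}
config_v2 = {"embedding_model": "embedder-a", "llm": "model-x", "top_k": 5}

before = log_record("which running shoes", "answer one", config_v1)
after = log_record("which running shoes", "answer two", config_v2)
diff = changed_settings(before["config"], after["config"])
```

With records like these, a change in the answer to the same question can at least be traced to the settings that differ between the two runs, here `top_k`.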
Architecture is a tricky subject in software engineering, mostly because no one is really sure what it is. Loosely, people tend to say "architecture" when they are discussing the fundamental logic of a system separate from the actual code that implements it. Unsurprisingly, these discussions have a tendency toward pageantry, producing lots of diagrams, taxonomies, and "methodologies" that are promptly ignored by the people who actually build things.
However, this is not to say that architecture is an ignorable concept, just that it needs to be understood through a more practical lens. We quite like one particular definition of architecture from Ralph Johnson, shared by Martin Fowler: "Architecture is about the important stuff. Whatever that is."
In that spirit, we want to focus on what we consider "the important stuff" in designing a machine learning system. Our goal is not to outline a...
File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
The ePUB file format is very well suited to novels and nonfiction, that is, to "flowing" text without a complex layout. On e-readers or smartphones, line and page breaks adapt automatically to the small displays. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the e-book, so you need to prepare your reading hardware before downloading. Please note: we strongly recommend authorizing the reading software with your personal Adobe ID after installing it.