
RAG-Driven Generative AI
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
- Deliver accurate generative AI models in pipelines integrating RAG, real-time human feedback improvements, and knowledge graphs
- Balance cost and performance between dynamic retrieval datasets and fine-tuning static data
Book DescriptionRAG-Driven Generative AI provides a roadmap for building effective LLM, computer vision, and generative AI systems that balance performance and costs. This book offers a detailed exploration of RAG and how to design, manage, and control multimodal AI pipelines. By connecting outputs to traceable source documents, RAG improves output accuracy and contextual relevance, offering a dynamic approach to managing large volumes of information. This AI book shows you how to build a RAG framework, providing practical knowledge on vector stores, chunking, indexing, and ranking. You'll discover techniques to optimize your project's performance and better understand your data, including using adaptive RAG and human feedback to refine retrieval accuracy, balancing RAG with fine-tuning, implementing dynamic RAG to enhance real-time decision-making, and visualizing complex data with knowledge graphs. You'll be exposed to a hands-on blend of frameworks like LlamaIndex and Deep Lake, vector databases such as Pinecone and Chroma, and models from Hugging Face and OpenAI. By the end of this book, you will have acquired the skills to implement intelligent solutions, keeping you competitive in fields from production to customer service across any project.What you will learn - Scale RAG pipelines to handle large datasets efficiently
- Employ techniques that minimize hallucinations and ensure accurate responses
- Implement indexing techniques to improve AI accuracy with traceable and transparent outputs
- Customize and scale RAG-driven generative AI systems across domains
- Find out how to use Deep Lake and Pinecone for efficient and fast data retrieval
- Control and build robust generative AI systems grounded in real-world data
- Combine text and image data for richer, more informative AI responses
Who this book is forThis book is ideal for data scientists, AI engineers, machine learning engineers, and MLOps engineers. If you are a solutions architect, software developer, product manager, or project manager looking to enhance the decision-making process of building RAG applications, then you'll find this book useful.
All prices
More details
Other editions
Additional editions

Person
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, and as a student, he wrote and registered a patent for one of the earliest word2vector embeddings and word piece tokenization solutions. He started a company focused on deploying AI and went on to author one of the first AI cognitive NLP chatbots, applied as a language teaching tool for Moët et Chandon (part of LVMH) and more. Denis rapidly became an expert in explainable AI, incorporating interpretable, acceptance-based explanation data and interfaces into solutions implemented for major corporate projects in the aerospace, apparel, and supply chain sectors. His core belief is that you only really know something once you have taught somebody how to do it.
Content
- Cover
- Title Page
- Copyright Page
- Contributors
- Table of Contents
- Preface
- Making the Most Out of This Book - Get to Know Your Free Benefits
- Chapter 1: Why Retrieval Augmented Generation?
- What is RAG?
- Naïve, advanced, and modular RAG configurations
- RAG versus fine-tuning
- The RAG ecosystem
- The retriever (D)
- Collect (D1)
- Process (D2)
- Storage (D3)
- Retrieval query (D4)
- The generator (G)
- Input (G1)
- Augmented input with HF (G2)
- Prompt engineering (G3)
- Generation and output (G4)
- The evaluator (E)
- Metrics (E1)
- Human feedback (E2)
- The trainer (T)
- Naïve, advanced, and modular RAG in code
- Part 1: Foundations and basic implementation
- 1. Environment
- 2. The generator
- 3. The Data
- 4.The query
- Part 2: Advanced techniques and evaluation
- 1. Retrieval metrics
- 2. Naïve RAG
- 3. Advanced RAG
- 4. Modular RAG
- Summary
- Questions
- References
- Further reading
- Chapter 2: RAG Embedding Vector Stores with Deep Lake and OpenAI
- From raw data to embeddings in vector stores
- Organizing RAG in a pipeline
- A RAG-driven generative AI pipeline
- Building a RAG pipeline
- Setting up the environment
- The installation packages and libraries
- The components involved in the installation process
- 1. Data collection and preparation
- Collecting the data
- Preparing the data
- 2. Data embedding and storage
- Retrieving a batch of prepared documents
- Verifying if the vector store exists and creating it if not
- The embedding function
- Adding data to the vector store
- Vector store information
- 3. Augmented input generation
- Input and query retrieval
- Augmented input
- Evaluating the output with cosine similarity
- Summary
- Questions
- References
- Further reading
- Chapter 3: Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI
- Why use index-based RAG?
- Architecture
- Building a semantic search engine and generative agent for drone technology
- Installing the environment
- Pipeline 1: Collecting and preparing the documents
- Pipeline 2: Creating and populating a Deep Lake vector store
- Pipeline 3: Index-based RAG
- User input and query parameters
- Cosine similarity metric
- Vector store index query engine
- Query response and source
- Optimized chunking
- Performance metric
- Tree index query engine
- Performance metric
- List index query engine
- Performance metric
- Keyword index query engine
- Performance metric
- Summary
- Questions
- References
- Further reading
- Chapter 4: Multimodal Modular RAG for Drone Technology
- What is multimodal modular RAG?
- Building a multimodal modular RAG program for drone technology
- Loading the LLM dataset
- Initializing the LLM query engine
- Loading and visualizing the multimodal dataset
- Navigating the multimodal dataset structure
- Selecting and displaying an image
- Adding bounding boxes and saving the image
- Building a multimodal query engine
- Creating a vector index and query engine
- Running a query on the VisDrone multimodal dataset
- Processing the response
- Selecting and processing the image of the source node
- Multimodal modular summary
- Performance metric
- LLM performance metric
- Multimodal performance metric
- Multimodal modular RAG performance metric
- Summary
- Questions
- References
- Further reading
- Chapter 5: Boosting RAG Performance with Expert Human Feedback
- Adaptive RAG
- Building hybrid adaptive RAG in Python
- 1. Retriever
- 1.1. Installing the retriever's environment
- 1.2.1. Preparing the dataset
- 1.2.2. Processing the data
- 1.3. Retrieval process for user input
- 2. Generator
- 2.1. Integrating HF-RAG for augmented document inputs
- 2.2. Input
- 2.3. Mean ranking simulation scenario
- 2.4.-2.5. Installing the generative AI environment
- 2.6. Content generation
- 3. Evaluator
- 3.1. Response time
- 3.2. Cosine similarity score
- 3.3. Human user rating
- 3.4. Human-expert evaluation
- Summary
- Questions
- References
- Further reading
- Chapter 6: Scaling RAG Bank Customer Data with Pinecone
- Scaling with Pinecone
- Architecture
- Pipeline 1: Collecting and preparing the dataset
- 1. Collecting and processing the dataset
- Installing the environment for Kaggle
- Collecting the dataset
- 2. Exploratory data analysis
- 3. Training an ML model
- Data preparation and clustering
- Implementation and evaluation of clustering
- Pipeline 2: Scaling a Pinecone index (vector store)
- The challenges of vector store management
- Installing the environment
- Processing the dataset
- Chunking and embedding the dataset
- Chunking
- Embedding
- Duplicating data
- Creating the Pinecone index
- Upserting
- Querying the Pinecone index
- Pipeline 3: RAG generative AI
- RAG with GPT-4o
- Querying the dataset
- Querying a target vector
- Extracting relevant texts
- Augmented prompt
- Augmented generation
- Summary
- Questions
- References
- Further reading
- Chapter 7: Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex
- The architecture of RAG for knowledge-graph-based semantic search
- Building graphs from trees
- Pipeline 1: Collecting and preparing the documents
- Retrieving Wikipedia data and metadata
- Preparing the data for upsertion
- Pipeline 2: Creating and populating the Deep Lake vector store
- Pipeline 3: Knowledge graph index-based RAG
- Generating the knowledge graph index
- Displaying the graph
- Interacting with the knowledge graph index
- Installing the similarity score packages and defining the functions
- Re-ranking
- Example metrics
- Metric calculation and display
- Summary
- Questions
- References
- Further reading
- Chapter 8: Dynamic RAG with Chroma and Hugging Face Llama
- The architecture of dynamic RAG
- Installing the environment
- Hugging Face
- Chroma
- Activating session time
- Downloading and preparing the dataset
- Embedding and upserting the data in a Chroma collection
- Selecting a model
- Embedding and storing the completions
- Displaying the embeddings
- Querying the collection
- Prompt and retrieval
- RAG with Llama
- Deleting the collection
- Total session time
- Summary
- Questions
- References
- Further reading
- Chapter 9: Empowering AI Models: Fine-Tuning RAG Data and Human Feedback
- The architecture of fine-tuning static RAG data
- The RAG ecosystem
- Installing the environment
- 1. Preparing the dataset for fine-tuning
- 1.1. Downloading and visualizing the dataset
- 1.2. Preparing the dataset for fine-tuning
- 2. Fine-tuning the model
- 2.1. Monitoring the fine-tunes
- 3. Using the fine-tuned OpenAI model
- Metrics
- Summary
- Questions
- References
- Further reading
- Chapter 10 : RAG for Video Stock Production with Pinecone and OpenAI
- The architecture of RAG for video production
- The environment of the video production ecosystem
- Importing modules and libraries
- GitHub
- OpenAI
- Pinecone
- Pipeline 1: Generator and Commentator
- The AI-generated video dataset
- How does a diffusion transformer work?
- Analyzing the diffusion transformer model video dataset
- The Generator and the Commentator
- Step 1. Displaying the video
- Step 2. Splitting video into frames
- Step 3. Commenting on the frames
- Pipeline 1 controller
- Pipeline 2: The Vector Store Administrator
- Querying the Pinecone index
- Pipeline 3: The Video Expert
- Summary
- Questions
- References
- Further reading
- Appendix
- Chapter 1, Why Retrieval Augmented Generation?
- Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI
- Chapter 3, Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI
- Chapter 4, Multimodal Modular RAG for Drone Technology
- Chapter 5, Boosting RAG Performance with Expert Human Feedback
- Chapter 6, Scaling RAG Bank Customer Data with Pinecone
- Chapter 7, Building Scalable Knowledge-Graph-based RAG with Wikipedia API and LlamaIndex
- Chapter 8, Dynamic RAG with Chroma and Hugging Face Llama
- Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback
- Chapter 10, RAG for Video Stock Production with Pinecone and OpenAI
- Packt Page
- Other Books You May Enjoy
- Index
Preface
Designing and managing controlled, reliable, multimodal generative AI pipelines is complex. RAG-Driven Generative AI provides a roadmap for building effective LLM, computer vision, and generative AI systems that will balance performance and costs.
From foundational concepts to complex implementations, this book offers a detailed exploration of how RAG can control and enhance AI systems by tracing each output to its source document. RAG's traceable process allows human feedback for continual improvements, minimizing inaccuracies, hallucinations, and bias. This AI book shows you how to build a RAG framework from scratch, providing practical knowledge on vector stores, chunking, indexing, and ranking. You'll discover techniques in optimizing performance and costs, improving model accuracy by integrating human feedback, balancing costs with when to fine-tune, and improving accuracy and retrieval speed by utilizing embedded-indexed knowledge graphs.
Experience a blend of theory and practice using frameworks like LlamaIndex, Pinecone, and Deep Lake and generative AI platforms such as OpenAI and Hugging Face.
By the end of this book, you will have acquired the skills to implement intelligent solutions, keeping you competitive in fields from production to customer service across any project.
Who this book is for
This book is ideal for data scientists, AI engineers, machine learning engineers, and MLOps engineers, as well as solution architects, software developers, and product and project managers working on LLM and computer vision projects who want to learn and apply RAG for real-world applications. Researchers and natural language processing practitioners working with large language models and text generation will also find the book useful.
What this book covers
Chapter 1, Why Retrieval Augmented Generation?, introduces RAG's foundational concepts, outlines its adaptability across different data types, and navigates the complexities of integrating the RAG framework into existing AI platforms. By the end of this chapter, you will have gained a solid understanding of RAG and practical experience in building diverse RAG configurations for naïve, advanced, and modular RAG using Python, preparing you for more advanced applications in subsequent chapters.
Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI, dives into the complexities of RAG-driven generative AI by focusing on embedding vectors and their storage solutions. We explore the transition from raw data to organized vector stores using Activeloop Deep Lake and OpenAI models, detailing the process of creating and managing embeddings that capture deep semantic meanings. You will learn to build a scalable, multi-team RAG pipeline from scratch in Python by dissecting the RAG ecosystem into independent components. By the end, you'll be equipped to handle large datasets with sophisticated retrieval capabilities, enhancing generative AI outputs with embedded document vectors.
Chapter 3, Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI, dives into index-based RAG, focusing on enhancing AI's precision, speed, and transparency through indexing. We'll see how LlamaIndex, Deep Lake, and OpenAI can be integrated to put together a traceable and efficient RAG pipeline. Through practical examples, including a domain-specific drone technology project, you will learn to manage and optimize index-based retrieval systems. By the end, you will be proficient in using various indexing types and understand how to enhance the data integrity and quality of your AI outputs.
Chapter 4, Multimodal Modular RAG for Drone Technology, raises the bar of all generative AI applications by introducing a multimodal modular RAG framework tailored for drone technology. We'll develop a generative AI system that not only processes textual information but also integrates advanced image recognition capabilities. You'll learn to build and optimize a Python-based multimodal modular RAG system, using tools like LlamaIndex, Deep Lake, and OpenAI, to produce rich, context-aware responses to queries.
Chapter 5, Boosting RAG Performance with Expert Human Feedback, introduces adaptive RAG, an innovative enhancement to standard RAG that incorporates human feedback into the generative AI process. By integrating expert feedback directly, we will create a hybrid adaptive RAG system using Python, exploring the integration of human feedback loops to refine data continuously and improve the relevance and accuracy of AI responses.
Chapter 6, Scaling RAG Bank Customer Data with Pinecone, guides you through building a recommendation system to minimize bank customer churn, starting with data acquisition and exploratory analysis using a Kaggle dataset. You'll move onto embedding and upserting large data volumes with Pinecone and OpenAI's technologies, culminating in developing AI-driven recommendations with GPT-4o. By the end, you'll know how to implement advanced vector storage techniques and AI-driven analytics to enhance customer retention strategies.
Chapter 7, Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex, details the development of three pipelines: data collection from Wikipedia, populating a Deep Lake vector store, and implementing a knowledge graph index-based RAG. You'll learn to automate data retrieval and preparation, create and query a knowledge graph to visualize complex data relationships, and enhance AI-generated responses with structured data insights. You'll be equipped by the end to build and manage a knowledge graph-based RAG system, providing precise, context-aware output.
Chapter 8, Dynamic RAG with Chroma and Hugging Face Llama, explores dynamic RAG using Chroma and Hugging Face's Llama technology. It introduces the concept of creating temporary data collections daily, optimized for specific meetings or tasks, which avoids long-term data storage issues. You will learn to build a Python program that manages and queries these transient datasets efficiently, ensuring that the most relevant and up-to-date information supports every meeting or decision point. By the end, you will be able to implement dynamic RAG systems that enhance responsiveness and precision in data-driven environments.
Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback, focuses on fine-tuning techniques to streamline RAG data, emphasizing how to transform extensive, non-parametric raw data into a more manageable, parametric format with trained weights suitable for continued AI interactions. You'll explore the process of preparing and fine-tuning a dataset, using OpenAI's tools to convert data into prompt and completion pairs for machine learning. Additionally, this chapter will guide you through using OpenAI's GPT-4o-mini model for fine-tuning, assessing its efficiency and cost-effectiveness.
Chapter 10, RAG for Video Stock Production with Pinecone and OpenAI, explores the integration of RAG in video stock production, combining human creativity with AI-driven automation. It details constructing an AI system that produces, comments on, and labels video content, using OpenAI's text-to-video and vision models alongside Pinecone's vector storage capabilities. Starting with video generation and technical commentary, the journey extends to managing embedded video data within a Pinecone vector store.
To get the most out of this book
You should have basic Natural Processing Language (NLP) knowledge and some experience with Python. Additionally, most of the programs in this book are provided as Jupyter notebooks. To run them, all you need is a free Google Gmail account, allowing you to execute the notebooks on Google Colaboratory's free virtual machine (VM). You will also need to generate API tokens for OpenAI, Activeloop, and Pinecone.
The following modules will need to be installed when running the notebooks:
Modules
Version
deeplake
3.9.18 (with Pillow)
openai
1.40.3 (requires regular upgrades)
transformers
4.41.2
numpy
>=1.24.1 (Upgraded to satisfy chex)
deepspeed
0.10.1
bitsandbytes
0.41.1
accelerate
0.31.0
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management
For more information, see our eBook Help page.