RAG-Driven Generative AI

Name: RAG-Driven Generative AI | Build custom retrieval augmented generation pipelines with LlamaIndex, Deep Lake, and Pinecone
Brand: Packt Publishing Limited
Availability: OnlineOnly

Build custom retrieval augmented generation pipelines with LlamaIndex, Deep Lake, and Pinecone

Denis Rothman Rothman, Denis(Author)

Packt Publishing Limited

1st Edition

Published on 18. March 2025

338 pages

E-Book

ePUB with Adobe-DRM

System requirements

E-Book

ePUB without DRM

System requirements

978-1-83620-090-1 (ISBN)

from €32.99

Available for download

Watchlist: see prices

Description

Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.

Alles über E-Books, Kopierschutz & Dateiformate finden Sie in unserem Info- & Hilfebereich.

Minimize AI hallucinations and build accurate, custom generative AI pipelines with RAG using embedded vector databases and integrated human feedback Get With Your Book: PDF Copy, AI Assistant, and Next-Gen Reader FreeKey Features - Implement RAG's traceable outputs, linking each response to its source document to build reliable multimodal conversational agents
- Deliver accurate generative AI models in pipelines integrating RAG, real-time human feedback improvements, and knowledge graphs
- Balance cost and performance between dynamic retrieval datasets and fine-tuning static data
Book DescriptionRAG-Driven Generative AI provides a roadmap for building effective LLM, computer vision, and generative AI systems that balance performance and costs. This book offers a detailed exploration of RAG and how to design, manage, and control multimodal AI pipelines. By connecting outputs to traceable source documents, RAG improves output accuracy and contextual relevance, offering a dynamic approach to managing large volumes of information. This AI book shows you how to build a RAG framework, providing practical knowledge on vector stores, chunking, indexing, and ranking. You'll discover techniques to optimize your project's performance and better understand your data, including using adaptive RAG and human feedback to refine retrieval accuracy, balancing RAG with fine-tuning, implementing dynamic RAG to enhance real-time decision-making, and visualizing complex data with knowledge graphs. You'll be exposed to a hands-on blend of frameworks like LlamaIndex and Deep Lake, vector databases such as Pinecone and Chroma, and models from Hugging Face and OpenAI. By the end of this book, you will have acquired the skills to implement intelligent solutions, keeping you competitive in fields from production to customer service across any project.What you will learn - Scale RAG pipelines to handle large datasets efficiently
- Employ techniques that minimize hallucinations and ensure accurate responses
- Implement indexing techniques to improve AI accuracy with traceable and transparent outputs
- Customize and scale RAG-driven generative AI systems across domains
- Find out how to use Deep Lake and Pinecone for efficient and fast data retrieval
- Control and build robust generative AI systems grounded in real-world data
- Combine text and image data for richer, more informative AI responses
Who this book is forThis book is ideal for data scientists, AI engineers, machine learning engineers, and MLOps engineers. If you are a solutions architect, software developer, product manager, or project manager looking to enhance the decision-making process of building RAG applications, then you'll find this book useful.

All prices

More details

Other editions

Person

Content

Cover
Title Page
Copyright Page
Contributors
Table of Contents
Preface
Making the Most Out of This Book - Get to Know Your Free Benefits
Chapter 1: Why Retrieval Augmented Generation?
What is RAG?
Naïve, advanced, and modular RAG configurations
RAG versus fine-tuning
The RAG ecosystem
The retriever (D)
Collect (D1)
Process (D2)
Storage (D3)
Retrieval query (D4)
The generator (G)
Input (G1)
Augmented input with HF (G2)
Prompt engineering (G3)
Generation and output (G4)
The evaluator (E)
Metrics (E1)
Human feedback (E2)
The trainer (T)
Naïve, advanced, and modular RAG in code
Part 1: Foundations and basic implementation
1. Environment
2. The generator
3. The Data
4.The query
Part 2: Advanced techniques and evaluation
1. Retrieval metrics
2. Naïve RAG
3. Advanced RAG
4. Modular RAG
Summary
Questions
References
Further reading
Chapter 2: RAG Embedding Vector Stores with Deep Lake and OpenAI
From raw data to embeddings in vector stores
Organizing RAG in a pipeline
A RAG-driven generative AI pipeline
Building a RAG pipeline
Setting up the environment
The installation packages and libraries
The components involved in the installation process
1. Data collection and preparation
Collecting the data
Preparing the data
2. Data embedding and storage
Retrieving a batch of prepared documents
Verifying if the vector store exists and creating it if not
The embedding function
Adding data to the vector store
Vector store information
3. Augmented input generation
Input and query retrieval
Augmented input
Evaluating the output with cosine similarity
Summary
Questions
References
Further reading
Chapter 3: Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI
Why use index-based RAG?
Architecture
Building a semantic search engine and generative agent for drone technology
Installing the environment
Pipeline 1: Collecting and preparing the documents
Pipeline 2: Creating and populating a Deep Lake vector store
Pipeline 3: Index-based RAG
User input and query parameters
Cosine similarity metric
Vector store index query engine
Query response and source
Optimized chunking
Performance metric
Tree index query engine
Performance metric
List index query engine
Performance metric
Keyword index query engine
Performance metric
Summary
Questions
References
Further reading
Chapter 4: Multimodal Modular RAG for Drone Technology
What is multimodal modular RAG?
Building a multimodal modular RAG program for drone technology
Loading the LLM dataset
Initializing the LLM query engine
Loading and visualizing the multimodal dataset
Navigating the multimodal dataset structure
Selecting and displaying an image
Adding bounding boxes and saving the image
Building a multimodal query engine
Creating a vector index and query engine
Running a query on the VisDrone multimodal dataset
Processing the response
Selecting and processing the image of the source node
Multimodal modular summary
Performance metric
LLM performance metric
Multimodal performance metric
Multimodal modular RAG performance metric
Summary
Questions
References
Further reading
Chapter 5: Boosting RAG Performance with Expert Human Feedback
Adaptive RAG
Building hybrid adaptive RAG in Python
1. Retriever
1.1. Installing the retriever's environment
1.2.1. Preparing the dataset
1.2.2. Processing the data
1.3. Retrieval process for user input
2. Generator
2.1. Integrating HF-RAG for augmented document inputs
2.2. Input
2.3. Mean ranking simulation scenario
2.4.-2.5. Installing the generative AI environment
2.6. Content generation
3. Evaluator
3.1. Response time
3.2. Cosine similarity score
3.3. Human user rating
3.4. Human-expert evaluation
Summary
Questions
References
Further reading
Chapter 6: Scaling RAG Bank Customer Data with Pinecone
Scaling with Pinecone
Architecture
Pipeline 1: Collecting and preparing the dataset
1. Collecting and processing the dataset
Installing the environment for Kaggle
Collecting the dataset
2. Exploratory data analysis
3. Training an ML model
Data preparation and clustering
Implementation and evaluation of clustering
Pipeline 2: Scaling a Pinecone index (vector store)
The challenges of vector store management
Installing the environment
Processing the dataset
Chunking and embedding the dataset
Chunking
Embedding
Duplicating data
Creating the Pinecone index
Upserting
Querying the Pinecone index
Pipeline 3: RAG generative AI
RAG with GPT-4o
Querying the dataset
Querying a target vector
Extracting relevant texts
Augmented prompt
Augmented generation
Summary
Questions
References
Further reading
Chapter 7: Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex
The architecture of RAG for knowledge-graph-based semantic search
Building graphs from trees
Pipeline 1: Collecting and preparing the documents
Retrieving Wikipedia data and metadata
Preparing the data for upsertion
Pipeline 2: Creating and populating the Deep Lake vector store
Pipeline 3: Knowledge graph index-based RAG
Generating the knowledge graph index
Displaying the graph
Interacting with the knowledge graph index
Installing the similarity score packages and defining the functions
Re-ranking
Example metrics
Metric calculation and display
Summary
Questions
References
Further reading
Chapter 8: Dynamic RAG with Chroma and Hugging Face Llama
The architecture of dynamic RAG
Installing the environment
Hugging Face
Chroma
Activating session time
Downloading and preparing the dataset
Embedding and upserting the data in a Chroma collection
Selecting a model
Embedding and storing the completions
Displaying the embeddings
Querying the collection
Prompt and retrieval
RAG with Llama
Deleting the collection
Total session time
Summary
Questions
References
Further reading
Chapter 9: Empowering AI Models: Fine-Tuning RAG Data and Human Feedback
The architecture of fine-tuning static RAG data
The RAG ecosystem
Installing the environment
1. Preparing the dataset for fine-tuning
1.1. Downloading and visualizing the dataset
1.2. Preparing the dataset for fine-tuning
2. Fine-tuning the model
2.1. Monitoring the fine-tunes
3. Using the fine-tuned OpenAI model
Metrics
Summary
Questions
References
Further reading
Chapter 10 : RAG for Video Stock Production with Pinecone and OpenAI
The architecture of RAG for video production
The environment of the video production ecosystem
Importing modules and libraries
GitHub
OpenAI
Pinecone
Pipeline 1: Generator and Commentator
The AI-generated video dataset
How does a diffusion transformer work?
Analyzing the diffusion transformer model video dataset
The Generator and the Commentator
Step 1. Displaying the video
Step 2. Splitting video into frames
Step 3. Commenting on the frames
Pipeline 1 controller
Pipeline 2: The Vector Store Administrator
Querying the Pinecone index
Pipeline 3: The Video Expert
Summary
Questions
References
Further reading
Appendix
Chapter 1, Why Retrieval Augmented Generation?
Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI
Chapter 3, Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI
Chapter 4, Multimodal Modular RAG for Drone Technology
Chapter 5, Boosting RAG Performance with Expert Human Feedback
Chapter 6, Scaling RAG Bank Customer Data with Pinecone
Chapter 7, Building Scalable Knowledge-Graph-based RAG with Wikipedia API and LlamaIndex
Chapter 8, Dynamic RAG with Chroma and Hugging Face Llama
Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback
Chapter 10, RAG for Video Stock Production with Pinecone and OpenAI
Packt Page
Other Books You May Enjoy
Index

Preface

Designing and managing controlled, reliable, multimodal generative AI pipelines is complex. RAG-Driven Generative AI provides a roadmap for building effective LLM, computer vision, and generative AI systems that will balance performance and costs.

From foundational concepts to complex implementations, this book offers a detailed exploration of how RAG can control and enhance AI systems by tracing each output to its source document. RAG's traceable process allows human feedback for continual improvements, minimizing inaccuracies, hallucinations, and bias. This AI book shows you how to build a RAG framework from scratch, providing practical knowledge on vector stores, chunking, indexing, and ranking. You'll discover techniques in optimizing performance and costs, improving model accuracy by integrating human feedback, balancing costs with when to fine-tune, and improving accuracy and retrieval speed by utilizing embedded-indexed knowledge graphs.

Experience a blend of theory and practice using frameworks like LlamaIndex, Pinecone, and Deep Lake and generative AI platforms such as OpenAI and Hugging Face.

By the end of this book, you will have acquired the skills to implement intelligent solutions, keeping you competitive in fields from production to customer service across any project.

Who this book is for

This book is ideal for data scientists, AI engineers, machine learning engineers, and MLOps engineers, as well as solution architects, software developers, and product and project managers working on LLM and computer vision projects who want to learn and apply RAG for real-world applications. Researchers and natural language processing practitioners working with large language models and text generation will also find the book useful.

What this book covers

Chapter 1, Why Retrieval Augmented Generation?, introduces RAG's foundational concepts, outlines its adaptability across different data types, and navigates the complexities of integrating the RAG framework into existing AI platforms. By the end of this chapter, you will have gained a solid understanding of RAG and practical experience in building diverse RAG configurations for naïve, advanced, and modular RAG using Python, preparing you for more advanced applications in subsequent chapters.

Chapter 2, RAG Embedding Vector Stores with Deep Lake and OpenAI, dives into the complexities of RAG-driven generative AI by focusing on embedding vectors and their storage solutions. We explore the transition from raw data to organized vector stores using Activeloop Deep Lake and OpenAI models, detailing the process of creating and managing embeddings that capture deep semantic meanings. You will learn to build a scalable, multi-team RAG pipeline from scratch in Python by dissecting the RAG ecosystem into independent components. By the end, you'll be equipped to handle large datasets with sophisticated retrieval capabilities, enhancing generative AI outputs with embedded document vectors.

Chapter 3, Building Index-Based RAG with LlamaIndex, Deep Lake, and OpenAI, dives into index-based RAG, focusing on enhancing AI's precision, speed, and transparency through indexing. We'll see how LlamaIndex, Deep Lake, and OpenAI can be integrated to put together a traceable and efficient RAG pipeline. Through practical examples, including a domain-specific drone technology project, you will learn to manage and optimize index-based retrieval systems. By the end, you will be proficient in using various indexing types and understand how to enhance the data integrity and quality of your AI outputs.

Chapter 4, Multimodal Modular RAG for Drone Technology, raises the bar of all generative AI applications by introducing a multimodal modular RAG framework tailored for drone technology. We'll develop a generative AI system that not only processes textual information but also integrates advanced image recognition capabilities. You'll learn to build and optimize a Python-based multimodal modular RAG system, using tools like LlamaIndex, Deep Lake, and OpenAI, to produce rich, context-aware responses to queries.

Chapter 5, Boosting RAG Performance with Expert Human Feedback, introduces adaptive RAG, an innovative enhancement to standard RAG that incorporates human feedback into the generative AI process. By integrating expert feedback directly, we will create a hybrid adaptive RAG system using Python, exploring the integration of human feedback loops to refine data continuously and improve the relevance and accuracy of AI responses.

Chapter 6, Scaling RAG Bank Customer Data with Pinecone, guides you through building a recommendation system to minimize bank customer churn, starting with data acquisition and exploratory analysis using a Kaggle dataset. You'll move onto embedding and upserting large data volumes with Pinecone and OpenAI's technologies, culminating in developing AI-driven recommendations with GPT-4o. By the end, you'll know how to implement advanced vector storage techniques and AI-driven analytics to enhance customer retention strategies.

Chapter 7, Building Scalable Knowledge-Graph-Based RAG with Wikipedia API and LlamaIndex, details the development of three pipelines: data collection from Wikipedia, populating a Deep Lake vector store, and implementing a knowledge graph index-based RAG. You'll learn to automate data retrieval and preparation, create and query a knowledge graph to visualize complex data relationships, and enhance AI-generated responses with structured data insights. You'll be equipped by the end to build and manage a knowledge graph-based RAG system, providing precise, context-aware output.

Chapter 8, Dynamic RAG with Chroma and Hugging Face Llama, explores dynamic RAG using Chroma and Hugging Face's Llama technology. It introduces the concept of creating temporary data collections daily, optimized for specific meetings or tasks, which avoids long-term data storage issues. You will learn to build a Python program that manages and queries these transient datasets efficiently, ensuring that the most relevant and up-to-date information supports every meeting or decision point. By the end, you will be able to implement dynamic RAG systems that enhance responsiveness and precision in data-driven environments.

Chapter 9, Empowering AI Models: Fine-Tuning RAG Data and Human Feedback, focuses on fine-tuning techniques to streamline RAG data, emphasizing how to transform extensive, non-parametric raw data into a more manageable, parametric format with trained weights suitable for continued AI interactions. You'll explore the process of preparing and fine-tuning a dataset, using OpenAI's tools to convert data into prompt and completion pairs for machine learning. Additionally, this chapter will guide you through using OpenAI's GPT-4o-mini model for fine-tuning, assessing its efficiency and cost-effectiveness.

Chapter 10, RAG for Video Stock Production with Pinecone and OpenAI, explores the integration of RAG in video stock production, combining human creativity with AI-driven automation. It details constructing an AI system that produces, comments on, and labels video content, using OpenAI's text-to-video and vision models alongside Pinecone's vector storage capabilities. Starting with video generation and technical commentary, the journey extends to managing embedded video data within a Pinecone vector store.

To get the most out of this book

You should have basic Natural Processing Language (NLP) knowledge and some experience with Python. Additionally, most of the programs in this book are provided as Jupyter notebooks. To run them, all you need is a free Google Gmail account, allowing you to execute the notebooks on Google Colaboratory's free virtual machine (VM). You will also need to generate API tokens for OpenAI, Activeloop, and Pinecone.

The following modules will need to be installed when running the notebooks:

Modules

Version

deeplake

3.9.18 (with Pillow)

openai

1.40.3 (requires regular upgrades)

transformers

4.41.2

numpy

>=1.24.1 (Upgraded to satisfy chex)

deepspeed

0.10.1

bitsandbytes

0.41.1

accelerate

0.31.0

...

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

RAG-Driven Generative AI

Description

All prices

More details

Other editions

Additional editions

Person

Content

Preface

Who this book is for

What this book covers

To get the most out of this book

System requirements