
The Definitive Guide to OpenSearch
Description
- Address real-world scenarios with detailed case studies, applying knowledge to practical projects and challenges
- Learn best practices, avoid pitfalls, and optimize OpenSearch setups with professional insights
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
From seasoned data professionals managing billions of records to aspiring analysts exploring diverse datasets, this guide is for users at all levels who want to make the most of OpenSearch's capabilities and functionalities. Written by distinguished AWS Solutions Architects Jon Handler, Ph.D., a former search engine developer, Prashant Agrawal, a search specialist, and Soujanya Konka, an expert in large-scale data migrations, this guide brings together deep technical expertise with practical, hands-on knowledge of implementing OpenSearch in real-world scenarios. Starting with an introduction to OpenSearch, you'll get to grips with the key features before delving into essential topics such as installing OpenSearch, ingesting data, crafting queries, visualizing results, ensuring security, and optimizing performance. Each concept is accompanied by practical examples and tutorials, allowing you to grasp the material through hands-on experience. Keeping up with OpenSearch's new releases and updates, this book equips you to fully leverage its potential through real-world scenarios and examples that demonstrate how OpenSearch works. Whether enhancing your search experience or extracting insightful analytics from data, The Definitive Guide to OpenSearch provides developers, engineers, data scientists, and system administrators with the tools needed to thrive.
What you will learn
- Understand OpenSearch fundamentals, architecture, and components
- Benefit from hands-on demos of data indexing, query crafting, and advanced features
- Master OpenSearch Dashboards to build monitoring solutions
- Discover techniques for scaling OpenSearch to handle large datasets and high traffic
- Explore performance optimization strategies
- Study example cases of successful OpenSearch applications
- Uncover OpenSearch integrations across industries through real-world cases
Who this book is for
This book is ideal for data professionals, developers, engineers, data scientists, and system administrators seeking to harness the power of OpenSearch for search and analytics use cases. Whether you're a beginner or an experienced user, this guide offers valuable insights and practical knowledge to help you navigate the complexities of deploying and managing OpenSearch clusters effectively. For anyone looking to leverage OpenSearch for building robust search experiences and gaining actionable insights from data, this book is a must-have resource.
Content
- Cover
- Copyright
- Foreword
- Contributors
- Table of Contents
- Preface
- Your Book Comes with Exclusive Perks - Here's How to Unlock Them
- Part 1: Getting Started with OpenSearch: Fundamentals and Deployment
- Chapter 1: Overview of OpenSearch
- Introducing OpenSearch and its evolution journey
- Evolution of OpenSearch
- Understanding the core capabilities of OpenSearch
- Distributed database
- Lexical search
- Semantic search with vector embeddings
- Log analytics
- Real-world examples and use cases
- Revolutionizing e-commerce search with OpenSearch
- Transformative search: a fashionable journey with Iva
- Maximizing operational efficiency with OpenSearch log analytics and observability
- Hello OpenSearch
- Summary
- Join our community on Discord
- Chapter 2: Installing and Configuring OpenSearch
- Understanding key terminology
- Nodes basics
- Cluster basics: the backbone of OpenSearch
- Index insights: organizing data in OpenSearch
- How shards work
- How segments work
- System requirements and compatibility
- Operating system compatibility matrix
- Java compatibility matrix
- Network configuration
- Recommended filesystem setup for better performance
- Installation guide for OpenSearch
- Using a tarball
- Using Docker
- Setting up OpenSearch Dashboards
- Using a tarball (locally)
- Using Docker
- Setting the foundation for advanced cluster configuration
- OpenSearch cluster settings: static and dynamic
- OpenSearch Dashboards settings
- Security considerations and setup
- Introducing authentication and authorization
- Initial exploration of OpenSearch functionalities
- Indexing Iva's fashionable finds
- Searching for fashion inspiration
- Summary
- Chapter 3: Deployment Options: Amazon OpenSearch Service and Amazon OpenSearch Serverless
- Introduction to Amazon OpenSearch Service
- Architecture and components
- Key features
- Infrastructure of Amazon OpenSearch Service Domains
- Managing Amazon OpenSearch Service Domains
- Rightsizing
- Scaling Amazon OpenSearch Service Domains
- Snapshots in Amazon OpenSearch Service Managed Clusters
- Storage management
- Amazon OpenSearch Serverless
- Creating and managing Amazon OpenSearch Serverless collections
- Ingesting data into Amazon OpenSearch Serverless collections
- Security in Amazon OpenSearch Serverless
- Supported operations and plugins in Amazon OpenSearch Serverless
- Monitoring Amazon OpenSearch Serverless
- Choosing between OpenSearch Service-managed clusters and OpenSearch Serverless
- OpenSearch hosting partners
- Summary
- Join our community on Discord
- Part 2: Data Management and Discovery: Indexing, Querying, and Visualization
- Chapter 4: Indexing Data
- Technical requirements
- Overview of indexing
- Hands-on: Connecting to OpenSearch Dashboards
- Creating an index
- The _bulk API
- Mapping your data
- Creating your index via an API
- Understanding index settings
- Diving into mappings
- Mapping types
- String mapping types
- Advanced mapping types
- Summary
- Chapter 5: Searching: Core APIs
- Technical requirements
- Query processing
- Matching
- Merging
- Scoring and sorting
- Fetching
- Hands-on: loading data
- OpenSearch's query API and supported languages
- Format of a Query DSL query
- match_all: the most basic query
- Pagination
- Leaf queries
- Text queries
- Term queries
- Highlighting in OpenSearch queries
- Completions and suggestions
- Search templates
- Summary
- Join our community on Discord
- Chapter 6: Advanced Querying
- Technical requirements
- Compound queries and filters
- bool queries
- Geospatial queries and aggregations
- Faceted search
- Query percolation
- How to run the profile API
- Summary
- Chapter 7: Analyze and Visualize OpenSearch Data
- Technical requirements
- Introduction to Dashboards
- Management
- OpenSearch Plugins
- Types of aggregation
- Metric aggregations
- Bucket aggregations
- Nested aggregations
- Pipeline aggregations
- Visualizations
- Total bytes over time
- Traffic
- Traffic by country
- Error codes by request
- Traffic flows
- Logging and Observability
- Step 1: The Management section
- Step 2: The Observability section
- Step 3: The Observability Plugins section
- Specialized query languages
- Low-cost logging and observability
- Key features of Flint indexing
- OpenSearch Assistant for Dashboards
- Best practices for log workloads with OpenSearch
- Additional resources and references
- Configuring Amazon S3 as a data source with OpenSearch
- Summary
- Join our community on Discord
- Part 3: Extending OpenSearch: Plugins, AI Integration, and Application Development
- Chapter 8: Introduction to OpenSearch Plugins
- Built-in plugins and custom plugins
- Key OpenSearch plugins and their functions
- OpenSearch SQL plugin
- OpenSearch Job Scheduler plugin
- OpenSearch Alerting plugin
- OpenSearch Index State Management (ISM) plugin
- Security plugin
- Security Analytics plugin
- KNN plugin
- Neural Search plugin
- Learning to Rank (LTR) plugin
- Installing and managing plugins
- Installing plugins
- Managing plugins
- Building your own plugins
- Advanced plugin architecture
- Plugin lifecycle management
- Dependency injection
- Custom plugin development
- Do you want to develop a plugin?
- Plugin best practices
- The future of OpenSearch plugins
- Summary
- References
- Chapter 9: OpenSearch in Action: Making Apps Awesome
- Meet Iva - a developer on a mission to build a smarter movie search app
- API-driven development - making your app talk to OpenSearch
- Understanding OpenSearch APIs
- Setting up a Python virtual environment
- Connecting to OpenSearch using Python
- Testing API queries before UI integration
- Autocomplete and fuzzy search - making search more user-friendly
- Implementing autocomplete
- Implementing fuzzy search to handle typos
- Combining autocomplete and fuzzy search
- Filtering and faceted search - giving users more control
- Filtering by genre
- Filtering by release year
- Applying multiple filters together
- Implementing faceted search for dynamic filtering
- Bringing it all together in a UI
- Setting up Streamlit
- Connecting Streamlit to OpenSearch
- Implementing the search bar with autocomplete
- Performing the search and displaying results
- Adding filters for genre and release year
- Running the app
- Summary
- Join our community on Discord
- Chapter 10: OpenSearch Vectors and Generative AI
- Technical requirements
- Vectorization of data
- Dense vectors
- Sparse vectors
- Semantic search
- ML Commons and ML models
- Exact K-Nearest-Neighbor
- Approximate nearest neighbor
- Sparse vectors and hybrid search
- Generative AI (gen AI) architectures and components
- Summary
- Chapter 11: Migrate to OpenSearch
- Why OpenSearch?
- Open source, community-driven, and vendor-neutral
- Familiar APIs and an easy transition path
- Expanding ecosystem and tooling
- From Apache Solr (enterprise search)
- From Algolia (Search as a Service)
- From Splunk (logs and SIEM)
- From Elasticsearch (all use cases)
- From Amazon CloudSearch (Search as a Service)
- Stages of migration
- Planning
- Proof of concept (POC)
- Set up a test cluster
- Compatibility testing
- Performance and scalability testing
- Deploy
- Deploy in phases
- Migrate data and indexes
- Cutover to OpenSearch
- Continuous monitoring and optimization
- Patterns for minimal or no-downtime migration
- Dual-write pattern: migrating e-commerce search without losing a beat
- Shadow read pattern: matching relevance in travel search
- Blue-green deployment: taking control of logging from Splunk
- Canary deployment: search reinvented for a national news site
- Cold data replay: compliance-first migration for a fintech company
- OpenSearch Migration Assistant
- Pre-migration checks and metadata review
- Traffic replay for near zero-downtime testing
- Historical data migration
- Migration management console
- Deployment options
- How teams are moving to OpenSearch - without missing a beat
- AnyMovie's search migration and modernization
- Final outcome
- AnyLog's live logs migration - a cleaner path to observability
- Outcome
- Summary
- Join our community on Discord
- Part 4: Securing and Optimizing OpenSearch: Administration Best Practices
- Chapter 12: Security in OpenSearch
- OpenSearch's security framework and components
- The core components of security
- Authentication and authorization mechanisms
- Multi-tenant security architecture
- Auditing and compliance
- Summary
- Chapter 13: Monitoring, Backup, and Recovery
- Monitoring an OpenSearch domain
- Monitoring tools
- Key metrics and dashboards for monitoring
- Dashboards for monitoring and alarms
- Admission control and backpressure mechanisms
- Admission control
- Backpressure mechanisms
- Troubleshooting and scaling
- Performance tuning
- Backup strategies for data resilience
- Disaster recovery architecture in OpenSearch
- Post-recovery validation
- Summary
- Join our community on Discord
- Chapter 14: Scaling and Performance Optimization
- Understanding OpenSearch as a distributed system
- OpenSearch distributed architecture
- Amazon OpenSearch Service
- Amazon OpenSearch Serverless
- Data lifecycle in OpenSearch
- OpenSearch request processing
- Threads and queues
- Strategies for sizing your cluster
- Storage
- RAM
- CPU
- Shards and networking
- Completing the examples
- Search
- Logs
- Vectors
- Optimizing OpenSearch clusters for high performance
- Running a POC
- Tenancy
- Shard skew
- Scaling per node resources
- Summary
- PacktPage
- Index
Preface
OpenSearch is a "Swiss Army knife" that touches diverse use cases spanning application features, operations, and generative AI. If there's one unifying theme of the software, it is that it enables storing and retrieving data to support intelligent decision-making. It's a database, but it's a funny kind of database that emphasizes speed and volume processing over consistency. It's a logs store, but a funny kind of logs store that emphasizes aggregations and log-line search. It's a data source for generative AI, but it's a funny kind of data source that brings rich search to the retrieval of information for prompts. In all these cases, OpenSearch provides high-volume request processing and intelligent retrieval of data.
In this book, you'll learn in depth the capabilities of OpenSearch, how and when to apply them, and where you can get the most benefits. You'll also learn about Amazon OpenSearch Service, its managed clusters and serverless deployment options, and how to get the most out of your OpenSearch Service domain or OpenSearch Serverless collection.
We'll begin with introductory chapters that give you a history and overview of OpenSearch and show you how to deploy OpenSearch and how to use OpenSearch Service. We'll then dive deep into OpenSearch's core capabilities: indexing and querying data and building aggregations and visualizations. We'll cover OpenSearch's large collection of plugins that deliver additional features, such as Structured Query Language (SQL), alerting, and k-nearest neighbor search. We'll dive deep into application-building and delivering AI-powered applications with generative AI. We will then move on to operational topics, including migrations, security, monitoring, backups, and recovery. We will round out the book with a deep dive on scaling and performance optimization.
In writing this book, we wanted to distill our years of experience and thousands of hours of customer interaction for you. We wish you every success, and happy OpenSearching!
Who this book is for
This book is for developers, operators, and DevOps engineers who want to add or modernize search for their applications, and who want to monitor those applications for uptime and diagnose and remediate errors. Experience with Amazon Web Services, the Python programming language, Docker, and Kubernetes will be helpful but is not necessary.
What this book covers
Chapter 1, Overview of OpenSearch, covers OpenSearch's history, its core capabilities, and the main use cases for OpenSearch, with real-world examples. It also introduces the topic of operational efficiency.
Chapter 2, Installing and Configuring OpenSearch, gives an overview of OpenSearch distributed system basics. It guides you through deploying OpenSearch via tarball and Docker, and covers OpenSearch Dashboards and the basics of securing your cluster.
Chapter 3, Deployment Options: Amazon OpenSearch Service and Amazon OpenSearch Serverless, guides you through deploying and running OpenSearch in the Amazon Web Services cloud, using Amazon OpenSearch Service, and operational basics such as scaling, storage management, and security.
Chapter 4, Indexing Data, details how to create and maintain OpenSearch indexes, including creating indexes, index settings, setting a mapping, different mapping types, and mapping templates.
Chapter 5, Searching: Core APIs, explains query processing in OpenSearch, leaf queries, hit highlighting, search suggestions, and search templates.
Chapter 6, Advanced Querying, covers OpenSearch's query APIs in depth, as well as compound queries, geospatial queries, faceted search, query percolation, and query performance and profiling.
Chapter 7, Analyze and Visualize OpenSearch Data, dives into aggregations, OpenSearch Dashboards, dashboards and visualizations, working with time-series data such as logs, and the Observability plugin.
Chapter 8, Introduction to OpenSearch Plugins, covers the key OpenSearch plugins, including SQL, Alerting, Security Analytics, k-nearest neighbor (KNN), and Neural Search. It then details how to install, manage, and build your own plugins for OpenSearch.
Chapter 9, OpenSearch in Action: Making Apps Awesome, moves from the theoretical to the practical, integrating the topics covered to help you bring the power of OpenSearch to your application with faceted search, autocomplete, and connections to OpenSearch's APIs from your application. It brings everything together in a Streamlit application.
Chapter 10, OpenSearch Vectors and Generative AI, provides a theoretical foundation on dense vectors, sparse vectors, and the large language models that produce them. It goes into depth on exact and approximate k-nearest neighbor search, with the algorithms and engines OpenSearch provides, closing with a generative AI example.
Chapter 11, Migrate to OpenSearch, guides you through why, whether, and how to migrate from other search solutions, including planning for your migration, executing a proof of concept, deploying your target, and moving data and traffic with and without OpenSearch Migration Assistant. It closes with two examples of migrations.
Chapter 12, Security in OpenSearch, explains OpenSearch's security features and guides you in using them to best effect to secure your data and cluster.
Chapter 13, Monitoring, Backup, and Recovery, enters the world of operations to help you use Amazon OpenSearch Service managed clusters efficiently. It covers the metrics that the service generates, how to monitor them, and how best to respond to issues with troubleshooting and backups.
Chapter 14, Scaling and Performance Optimization, explains OpenSearch as a distributed system and walks through the core resources your cluster provides and how OpenSearch maps your workload onto those resources. It finishes with best practices to optimize your cluster infrastructure for maximum efficiency.
To get the most out of this book
Some of the code examples provided are in Python. A working knowledge of the language, and a working Python installation for your system, will allow you to run those examples.
Some knowledge of distributed systems and other database systems will help you follow the discussion.
Knowledge of Amazon Web Services, Amazon Elastic Compute Cloud, and Docker will enable you to more easily deploy OpenSearch for the examples.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter/X handles. For example: "The _bulk API reduces overhead."
A block of code is set as follows:
POST _bulk
{ "create": { "_index": "first_index", "_id": "2" } }
{ "an_integer_field": 23456, "a_string_field": "the quick brown fox"}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
PUT index_with_mapping
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "an_integer_field": { "type": "integer" },
      "a_string_field": { "type": "text" }
    }
  }
}
Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: "Select Dev Tools from the left navigation panel."
Tips or important notes
Appear like this.
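The _bulk request shown in the conventions above is newline-delimited JSON: each operation is an action line followed by a source line, and the body ends with a newline. As a minimal sketch, not taken from the book, the following Python builds such a body using only the standard library. The helper name build_bulk_body is hypothetical, and the index name and field values are simply reused from the example above.

```python
import json


def build_bulk_body(index, docs):
    """Return a newline-delimited _bulk body that creates each document.

    `docs` is an iterable of (doc_id, source_dict) pairs.
    build_bulk_body is a hypothetical helper for illustration only.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: the operation, the target index, and the document ID.
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "first_index",
    [("2", {"an_integer_field": 23456, "a_string_field": "the quick brown fox"})],
)
print(body, end="")
```

The resulting string could then be sent as the body of a POST _bulk request, for example from Dev Tools or from an HTTP client of your choice. Building the body separately from sending it keeps the formatting easy to inspect before any data reaches a cluster.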
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book or have any general feedback, please email us at customercare@packt.com and mention the book's title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packt.com/submit-errata, and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a 'hard' copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management.
For more information, see our eBook Help page.