
Apache Iceberg: The Definitive Guide

Content
- Intro
- Copyright
- Table of Contents
- Foreword by Gerrit Kazmaier
- Foreword by Raghu Ramakrishnan
- Foreword by Rick Sears
- Preface
- About This Book
- Why We Wrote This Book
- What You Will Find Inside
- How to Use This Book
- Feedback and Questions
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Part I. Fundamentals of Apache Iceberg
- Chapter 1. Introduction to Apache Iceberg
- How Did We Get Here? A Brief History
- Foundational Components of a System Designed for OLAP Workloads
- Bringing It All Together
- The Data Warehouse
- A Brief History
- Pros and Cons of a Data Warehouse
- The Data Lake
- A Brief History
- Pros and Cons of a Data Lake
- Should I Run Analytics on a Data Lake or a Data Warehouse?
- The Data Lakehouse
- What Is a Table Format?
- Hive: The Original Table Format
- Modern Data Lake Table Formats
- What Is Apache Iceberg?
- How Apache Iceberg Came to Be
- The Apache Iceberg Architecture
- Key Features of Apache Iceberg
- Conclusion
- Chapter 2. The Architecture of Apache Iceberg
- The Data Layer
- Datafiles
- Delete Files
- The Metadata Layer
- Manifest Files
- Manifest Lists
- Metadata Files
- Puffin Files
- The Catalog
- Conclusion
- Chapter 3. Lifecycle of Write and Read Queries
- Writing Queries in Apache Iceberg
- Create the Table
- Insert the Query
- Merge Query
- Reading Queries in Apache Iceberg
- The SELECT Query
- The Time-Travel Query
- Conclusion
- Chapter 4. Optimizing the Performance of Iceberg Tables
- Compaction
- Hands-on with Compaction
- Compaction Strategies
- Automating Compaction
- Sorting
- Z-order
- Partitioning
- Hidden Partitioning
- Partition Evolution
- Other Partitioning Considerations
- Copy-on-Write Versus Merge-on-Read
- Copy-on-Write
- Merge-on-Read
- Configuring COW and MOR
- Other Considerations
- Metrics Collection
- Rewriting Manifests
- Optimizing Storage
- Write Distribution Mode
- Object Storage Considerations
- Datafile Bloom Filters
- Conclusion
- Chapter 5. Iceberg Catalogs
- Requirements of an Iceberg Catalog
- Catalog Comparison
- The Hadoop Catalog
- The Hive Catalog
- The AWS Glue Catalog
- The Nessie Catalog
- The REST Catalog
- The JDBC Catalog
- Other Catalogs
- Catalog Migration
- Using the Apache Iceberg Catalog Migration CLI
- Using an Engine
- Conclusion
- Part II. Hands-on with Apache Iceberg
- Chapter 6. Apache Spark
- Configuration
- Configuring Apache Iceberg and Spark
- Configuring the Catalogs
- Starting Spark with All the Configurations (AWS Glue Example)
- Data Definition Language Operations
- CREATE TABLE
- ALTER TABLE
- Alter a Table with Iceberg's Spark SQL Extensions
- DROP TABLE
- Reading Data
- The Select All Query
- The Filter Rows Query
- Aggregation Queries
- Using Window Functions
- Writing Data
- INSERT INTO
- MERGE INTO
- INSERT OVERWRITE
- DELETE FROM
- UPDATE
- Iceberg Table Maintenance Procedures
- Expire Snapshots
- Rewrite Datafiles
- Rewrite Manifests
- Remove Orphan Files
- Conclusion
- Chapter 7. Dremio's SQL Query Engine
- Configuration
- Data Definition Language Operations
- CREATE TABLE
- ALTER TABLE
- DROP TABLE
- Reading Data
- Using the SELECT Query
- Filtering Rows
- Using Aggregated Queries
- Using Window Functions
- Writing Data
- INSERT INTO
- COPY INTO
- MERGE INTO
- DELETE
- UPDATE
- Iceberg Table Maintenance
- Expire Snapshots
- Rewrite Datafiles
- Rewrite Manifests
- Conclusion
- Chapter 8. AWS Glue
- Configuration
- Creating a Glue Database
- Configuring the Glue ETL Job
- Create a Table Using the Glue Data Catalog
- Read the Table
- Insert the Data
- Conclusion
- Chapter 9. Apache Flink
- Configuration
- Prerequisites
- Start the Flink Cluster and Flink SQL Client
- Data Definition Language Operations
- CREATE CATALOG
- CREATE DATABASE
- CREATE TABLE
- ALTER TABLE
- DROP TABLE
- Reading Data
- Flink SQL Batch Read
- Flink SQL Streaming Read
- Metadata Table
- Writing Data
- INSERT INTO
- INSERT OVERWRITE
- UPSERT
- Flink DataFrame and Table API with Apache Iceberg Tables
- Prerequisites
- Configuring the Flink Job
- Starting the Cluster and Building the Package
- Running the Job
- Conclusion
- Part III. Apache Iceberg in Practice
- Chapter 10. Apache Iceberg in Production
- Apache Iceberg Metadata Tables
- The history Metadata Table
- The metadata_log_entries Metadata Table
- The snapshots Metadata Table
- The files Metadata Table
- The manifests Metadata Table
- The partitions Metadata Table
- The all_data_files Metadata Table
- The all_manifests Metadata Table
- The refs Metadata Table
- The entries Metadata Table
- Using the Metadata Tables in Conjunction
- Isolation of Changes with Branches
- Table Branching and Tagging
- Catalog Branching and Tagging
- Multitable Transactions
- Rolling Back Changes
- Rolling Back at the Table Level
- Rolling Back at the Catalog Level
- Conclusion
- Chapter 11. Streaming with Apache Iceberg
- Streaming with Spark
- Streaming into Iceberg with Spark
- Streaming from Iceberg with Spark
- Streaming with Flink
- Streaming into Iceberg with Flink
- Example of Streaming into Iceberg with Flink
- Streaming with Kafka Connect
- The Iceberg Kafka Sink
- Streaming with AWS
- Conclusion
- Chapter 12. Governance and Security
- Securing Datafiles
- Securing Files: Best Practices
- Hadoop Distributed File System
- Amazon Simple Storage Service
- Azure Data Lake Storage
- Google Cloud Storage
- Securing and Governing at the Semantic Layer
- Semantic Layer Best Practices
- Dremio
- Trino
- Securing and Governing at the Catalog Level
- Nessie
- Tabular
- AWS Glue and Lake Formation
- Additional Security and Governance Considerations
- Conclusion
- Chapter 13. Migrating to Apache Iceberg
- Migration Considerations
- Three-Step In-Place Migration Plan
- Four-Phase Shadow Migration Plan
- Migrating Hive Tables to Apache Iceberg
- The Snapshot Procedure
- The Migrate Procedure
- Migrating Delta Lake to Apache Iceberg
- Migrating Apache Hudi to Apache Iceberg
- Migrating Individual Files to Apache Iceberg
- Using the add_files Procedure
- Migrating from Delta Lake or Apache Hudi Without Preserving History
- Migrating from Anywhere by Rewriting Data
- Migrating Data to a New Iceberg Table
- Migrating Data into an Existing Iceberg Table
- Conclusion
- Chapter 14. Real-World Use Cases of Apache Iceberg
- Ensuring High-Quality Data with Write-Audit-Publish in Apache Iceberg
- WAP Using Iceberg's Branching Feature
- Running BI Workloads on the Data Lake
- Land the Raw Data into the Data Lake
- Curate Virtual Data Marts/Data Products
- Create a Reflection to Accelerate Our Dashboard
- Connect Our View to Our BI Tool
- Benefits of Running BI Workloads on the Data Lake
- Implementing Change Data Capture with Apache Iceberg
- Create Apache Iceberg Tables
- Apply Updates from Operational Systems
- Create the Change Log View to Capture Changes
- Merge Changed Data in the Aggregated Table
- Conclusion
- Index
- About the Authors
- Colophon
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.