Presto: The Definitive Guide

SQL at Any Scale, on Any Storage, in Any Environment
 
 
O'Reilly (publisher)
  • published April 3, 2020
  • 310 pages
 
E-book | ePUB with Adobe DRM | System requirements
978-1-4920-4422-2 (ISBN)
 
Perform fast interactive analytics against different data sources using the Presto high-performance, distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Presto.

Initially developed by Facebook, open source Presto is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso from Starburst show you how a single Presto query can combine data from multiple sources to allow for analytics across your entire organization.

  • Get started: Explore Presto's use cases and learn about tools that will help you connect to Presto and query data
  • Go deeper: Learn Presto's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
  • Put Presto in production: Use this query engine for security and monitoring and with other applications; learn how other organizations apply Presto
  • English
  • Sebastopol
  • USA
  • 4.72 MB
978-1-4920-4422-2 (9781492044222)
  • Intro
  • Foreword
  • Preface
  • About the Book
  • Conventions Used in This Book
  • Code Examples, Permissions, and Attribution
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • I. Getting Started with Presto
  • 1. Introducing Presto
  • The Problems with Big Data
  • Presto to the Rescue
  • Designed for Performance and Scale
  • SQL-on-Anything
  • Separation of Data Storage and Query Compute Resources
  • Presto Use Cases
  • One SQL Analytics Access Point
  • Access Point to Data Warehouse and Source Systems
  • Provide SQL-Based Access to Anything
  • Federated Queries
  • Semantic Layer for a Virtual Data Warehouse
  • Data Lake Query Engine
  • SQL Conversions and ETL
  • Better Insights Due to Faster Response Times
  • Big Data, Machine Learning, and Artificial Intelligence
  • Other Use Cases
  • Presto Resources
  • Website
  • Documentation
  • Community Chat
  • Source Code, License, and Version
  • Contributing
  • Book Repository
  • Iris Data Set
  • Flight Data Set
  • A Brief History of Presto
  • Conclusion
  • 2. Installing and Configuring Presto
  • Trying Presto with the Docker Container
  • Installing from Archive File
  • Java Virtual Machine
  • Python
  • Installation
  • Configuration
  • Adding a Data Source
  • Running Presto
  • Conclusion
  • 3. Using Presto
  • Presto Command-Line Interface
  • Getting Started
  • Pagination
  • History
  • Additional Diagnostics
  • Executing Queries
  • Output Formats
  • Ignoring Errors
  • Presto JDBC Driver
  • Downloading and Registering the Driver
  • Establishing a Connection to Presto
  • Presto and ODBC
  • Client Libraries
  • Presto Web UI
  • SQL with Presto
  • Concepts
  • First Examples
  • Conclusion
  • II. Diving Deeper into Presto
  • 4. Presto Architecture
  • Coordinator and Workers in a Cluster
  • Coordinator
  • Discovery Service
  • Workers
  • Connector-Based Architecture
  • Catalogs, Schemas, and Tables
  • Query Execution Model
  • Query Planning
  • Parsing and Analysis
  • Initial Query Planning
  • Optimization Rules
  • Predicate Pushdown
  • Cross Join Elimination
  • TopN
  • Partial Aggregations
  • Implementation Rules
  • Lateral Join Decorrelation
  • Semi-Join (IN) Decorrelation
  • Cost-Based Optimizer
  • The Cost Concept
  • Cost of the Join
  • Table Statistics
  • Filter Statistics
  • Table Statistics for Partitioned Tables
  • Join Enumeration
  • Broadcast Versus Distributed Joins
  • Broadcast join strategy
  • Distributed join strategy
  • Working with Table Statistics
  • Presto ANALYZE
  • Gathering Statistics When Writing to Disk
  • Hive ANALYZE
  • Displaying Table Statistics
  • Conclusion
  • 5. Production-Ready Deployment
  • Configuration Details
  • Server Configuration
  • Logging
  • Node Configuration
  • JVM Configuration
  • Launcher
  • Cluster Installation
  • RPM Installation
  • Installation Directory Structure
  • Configuration
  • Uninstall Presto
  • Installation in the Cloud
  • Cluster Sizing Considerations
  • Conclusion
  • 6. Connectors
  • Configuration
  • RDBMS Connector Example: PostgreSQL
  • Query Pushdown
  • Parallelism and Concurrency
  • Other RDBMS Connectors
  • Security
  • Presto TPC-H and TPC-DS Connectors
  • Hive Connector for Distributed Storage Data Sources
  • Apache Hadoop and Hive
  • Hive Connector
  • Hive-Style Table Format
  • Managed and External Tables
  • Partitioned Data
  • Loading Data
  • File Formats and Compression
  • MinIO Example
  • Non-Relational Data Sources
  • Presto JMX Connector
  • Black Hole Connector
  • Memory Connector
  • Other Connectors
  • Conclusion
  • 7. Advanced Connector Examples
  • Connecting to HBase with Phoenix
  • Key-Value Store Connector Example: Accumulo
  • Using the Presto Accumulo Connector
  • Predicate Pushdown in Accumulo
  • Apache Cassandra Connector
  • Streaming System Connector Example: Kafka
  • Document Store Connector Example: Elasticsearch
  • Overview
  • Configuration and Usage
  • Query Processing
  • Full-Text Search
  • Summary
  • Query Federation in Presto
  • Extract, Transform, Load and Federated Queries
  • Conclusion
  • 8. Using SQL in Presto
  • Presto Statements
  • Presto System Tables
  • Catalogs
  • Schemas
  • Information Schema
  • Tables
  • Table and Column Properties
  • Copying an Existing Table
  • Creating a New Table from Query Results
  • Modifying a Table
  • Deleting a Table
  • Table Limitations from Connectors
  • Views
  • Session Information and Configuration
  • Data Types
  • Collection Data Types
  • Temporal Data Types
  • Time Zones
  • Intervals
  • Type Casting
  • SELECT Statement Basics
  • WHERE Clause
  • GROUP BY and HAVING Clauses
  • ORDER BY and LIMIT Clauses
  • JOIN Statements
  • UNION, INTERSECT, and EXCEPT Clauses
  • Grouping Operations
  • WITH Clause
  • Subqueries
  • Scalar Subquery
  • EXISTS Subquery
  • Quantified Subquery
  • Deleting Data from a Table
  • Conclusion
  • 9. Advanced SQL
  • Functions and Operators Introduction
  • Scalar Functions and Operators
  • Boolean Operators
  • Logical Operators
  • Range Selection with the BETWEEN Statement
  • Value Detection with IS (NOT) NULL
  • Mathematical Functions and Operators
  • Trigonometric Functions
  • Constant and Random Functions
  • String Functions and Operators
  • Strings and Maps
  • Unicode
  • Regular Expressions
  • Unnesting Complex Data Types
  • JSON Functions
  • Date and Time Functions and Operators
  • Histograms
  • Aggregate Functions
  • Map Aggregate Functions
  • Approximate Aggregate Functions
  • Window Functions
  • Lambda Expressions
  • Geospatial Functions
  • Prepared Statements
  • Conclusion
  • III. Presto in Real-World Uses
  • 10. Security
  • Authentication
  • Password and LDAP Authentication
  • Authorization
  • System Access Control
  • Connector Access Control
  • Encryption
  • Encrypting Presto Client-to-Coordinator Communication
  • Creating Java Keystores and Java Truststores
  • Encrypting Communication Within the Presto Cluster
  • Certificate Authority Versus Self-Signed Certificates
  • Certificate Authentication
  • Kerberos
  • Prerequisites
  • Kerberos Client Authentication
  • Cluster Internal Kerberos
  • Data Source Access and Configuration for Security
  • Kerberos Authentication with the Hive Connector
  • Hive Metastore Thrift Service Authentication
  • HDFS Authentication
  • Cluster Separation
  • Conclusion
  • 11. Integrating Presto with Other Tools
  • Queries, Visualizations, and More with Apache Superset
  • Performance Improvements with RubiX
  • Workflows with Apache Airflow
  • Embedded Presto Example: Amazon Athena
  • Starburst Enterprise Presto
  • Other Integration Examples
  • Custom Integrations
  • Conclusion
  • 12. Presto in Production
  • Monitoring with the Presto Web UI
  • Cluster-Level Details
  • Query List
  • Query Details View
  • Overview
  • Live Plan
  • Stage Performance
  • Splits
  • JSON
  • Tuning Presto SQL Queries
  • Memory Management
  • Task Concurrency
  • Worker Scheduling
  • Scheduling Splits per Task and per Node
  • Local Scheduling
  • Network Data Exchange
  • Concurrency
  • Buffer Sizes
  • Tuning Java Virtual Machine
  • Resource Groups
  • Resource Group Definition
  • Scheduling Policy
  • Selector Rules Definition
  • Conclusion
  • 13. Real-World Examples
  • Deployment and Runtime Platforms
  • Cluster Sizing
  • Hadoop/Hive Migration Use Case
  • Other Data Sources
  • Users and Traffic
  • Conclusion
  • 14. Conclusion
  • Index

File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; Mac OS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-Book Help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-Book Help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The ePUB file format is very well suited to novels and nonfiction, that is, to "flowing" text without complex layout. On e-readers and smartphones, line and page breaks adjust automatically to the small displays. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately be unable to open the e-book. You therefore need to prepare your reading hardware before downloading.

When using the Adobe Digital Editions reading software, please note: we strongly recommend authorizing it with your personal Adobe ID after installation.

Further information is available in our E-Book Help.


Download (available immediately)

58.49 €
incl. 5% VAT
Download / single license
ePUB with Adobe DRM
see System requirements