Getting Started with Impala

Interactive SQL for Apache Hadoop

John Russell (Author)

O'Reilly (Publisher)

1st Edition

Published on 25 September 2014

110 pages

E-Book

PDF with Adobe-DRM

System requirements

978-1-4919-0574-6 (ISBN)

€26.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Copyright
Table of Contents
Introduction
Who Is This Book For?
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Content Updates
March 30, 2016
Acknowledgments
Chapter 1. Why Impala?
Impala's Place in the Big Data Ecosystem
Flexibility for Your Big Data Workflow
High-Performance Analytics
Exploratory Business Intelligence
Chapter 2. Getting Up and Running with Impala
Installation
Connecting to Impala
Your First Impala Queries
Chapter 3. Impala for the Database Developer
The SQL Language
Standard SQL for Queries
Limited DML
No Transactions
Numbers
Recent Additions
Big Data Considerations
Billions and Billions of Rows
HDFS Block Size
Parquet Files: The Biggest Blocks of All
How Impala Is Like a Data Warehouse
Physical and Logical Data Layouts
The HDFS Storage Model
Distributed Queries
Normalized and Denormalized Data
File Formats
Text File Format
Parquet File Format
Getting File Format Information
Switching File Formats
Aggregation
Chapter 4. Common Developer Tasks for Impala
Getting Data into an Impala Table
INSERT Statement
LOAD DATA Statement
External Tables
Figuring Out Where Impala Data Resides
Manually Loading Data Files into HDFS
Hive
Sqoop
Kite
Porting SQL Code to Impala
Using Impala from a JDBC or ODBC Application
JDBC
ODBC
Using Impala with a Scripting Language
Running Impala SQL Statements from Scripts
Variable Substitution
Saving Query Results
The impyla Package for Python Scripting
Optimizing Impala Performance
Optimizing Query Performance
Optimizing Memory Usage
Working with Partitioned Tables
Finding the Ideal Granularity
Inserting into Partitioned Tables
Adding and Loading New Partitions
Keeping Statistics Up to Date for Partitioned Tables
Writing User-Defined Functions
Collaborating with Your Administrators
Designing for Security
Anticipate Memory Usage
Understanding Resource Management
Helping to Plan for Performance (Stats, HDFS Caching)
Understanding Cluster Topology
Always Close Your Queries
Chapter 5. Tutorials and Deep Dives
Tutorial: From Unix Data File to Impala Table
Tutorial: Queries Without a Table
Tutorial: The Journey of a Billion Rows
Generating a Billion Rows of CSV Data
Normalizing the Original Data
Converting to Parquet Format
Making a Partitioned Table
Next Steps
Deep Dive: Joins and the Role of Statistics
Creating a Million-Row Table to Join With
Loading Data and Computing Stats
Reviewing the EXPLAIN Plan
Trying a Real Query
The Story So Far
Final Join Query with 1B x 1M Rows
Anti-Pattern: A Million Little Pieces
Tutorial: Across the Fourth Dimension
TIMESTAMP Data Type
Format Strings for Dates and Times
Working with Individual Date and Time Fields
Date and Time Arithmetic
Let's Solve the Y2K Problem
More Fun with Dates
Tutorial: Verbose and Quiet impala-shell Output
Tutorial: When Schemas Evolve
Numbers Versus Strings
Dealing with Out-of-Range Integers
Tutorial: Levels of Abstraction
String Formatting
Temperature Conversion
Tutorial: Subqueries
Subqueries in the FROM Clause
Subqueries in the FROM Clause for Join Queries
Subqueries in the WHERE Clause
Uncorrelated and Correlated Subqueries
Common Table Expressions in the WITH Clause
Tutorial: Analytic Functions
Analyzing the Numbers 1 Through 10
Running Totals and Moving Averages
Breaking Ties
Tutorial: Complex Types
ARRAY: A List of Items with Identical Types
MAP: A Hash Table or Dictionary with Key-Value Pairs
STRUCT: A Row-Like Object for Flexible Typing and Naming
Nesting Complex Types to Represent Arbitrary Data Structures
Querying Tables with Nested Complex Types
Constructing Data for Complex Types
About the Author
Colophon
