
Querying Databricks with Spark SQL
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
More details
Other editions
Additional editions

Content
- Intro
- Cover
- Title Page
- Copyright Page
- About the Author
- About the Reviewer
- Acknowledgement
- Preface
- Table of Contents
- 1. Writing Basic SQL Queries
- Introduction
- Structure
- Objectives
- Databricks
- The Databricks Web interface
- Getting started with Databricks notebooks
- Cells
- Spark SQL
- Databricks databases (schemas) and tables
- Activating a database
- Displaying the data in a table
- How does it work
- Help writing SQL
- Limiting the number of records displayed
- How does it work
- Displaying data from a specific field
- How does it work
- Finding the columns in a table
- SQL writing style
- Displaying data from a specific set of fields
- How does it work
- Columns or fields
- Modifying the field name in the output using aliases
- How does it work
- Removing duplicates from query output
- How does it work
- Undo and redo
- Sorting data
- How does it work
- Sorting data in reverse order
- How does it work
- Applying multiple sort criteria
- How does it work
- Running SQL queries
- Displaying the available tables
- Finding all the views in a database
- Resizing book cells
- Overriding the 1,000-row output limit
- Conclusion
- 2. Filtering Data
- Introduction
- Structure
- Objectives
- Filtering text
- How does it work
- Applying multiple text filters
- How does it work
- SQL writing style
- Excluding an element
- How does it work
- Using multiple exclusion filters
- How does it work
- Filtering numbers over a defined threshold
- How does it work
- Filtering numbers under a defined threshold
- How does it work
- Filtering on values up to and including a specific number
- How does it work
- Filtering on a range of values
- How does it work
- Using Boolean filters (true or false)
- How does it work
- Conclusion
- 3. Applying Complex Filters to Queries
- Introduction
- Structure
- Objectives
- Using either/or filters
- How does it work
- Using multiple separate criteria concurrently
- How does it work
- Using multiple filters and an exclusion
- How does it work
- Filtering on both text and numbers simultaneously
- How does it work
- Applying complex alternative filters at the same time
- How does it work
- Removing case-sensitivity in filters
- How does it work
- Using wildcard searches
- How does it work
- Forcing case-insensitivity in wildcard filters
- How does it work
- Using wildcards to exclude data
- How does it work
- Applying alternative wildcard patterns
- How does it work
- Using a specific part of the text to filter data
- How does it work
- Filtering NULLs or nonexistent data
- How does it work
- Searching using regular expressions
- How does it work
- Conclusion
- 4. Simple Calculations
- Introduction
- Structure
- Objectives
- Doing simple math
- How does it work
- Examining data types in SQL tables and views
- Isolating sections of formulas when applying math
- How does it work
- Calculating ratios
- How does it work
- Increasing values by a defined percentage
- How does it work
- Ordering the output of calculations
- How does it work
- NULLs
- Handling missing data
- How does it work
- Filtering on a calculation
- How does it work
- Using complex calculated filters
- How does it work
- Operator precedence
- Conclusion
- 5. Aggregating Output
- Introduction
- Structure
- Objectives
- Calculating table totals
- How does it work
- Using calculated aggregations
- How does it work
- Using grouped aggregations
- How does it work
- Using multiple levels of grouping
- How does it work
- Calculating averages
- How does it work
- Counting grouped elements
- How does it work
- Counting unique elements
- How does it work
- Displaying upper and lower numeric thresholds
- How does it work
- Aggregating across multiple columns
- How does it work
- Aggregating across columns and rows
- How does it work
- Filtering groups
- How does it work
- Filtering on aggregated results
- How does it work
- Selecting data based on aggregated results as well as specific filter criteria
- How does it work
- Query analysis
- Sorting by aggregated results
- How does it work
- Finding elements where every Boolean value is true
- How does it work
- Conclusion
- 6. Working with Dates in Databricks
- Introduction
- Structure
- Objectives
- Dates in Databricks
- Filtering records by date
- How does it work
- Date datatypes in tables
- Using a range of dates to filter data
- How does it work
- Finding the number of days between two dates
- How does it work
- Aggregating data over a date range
- How does it work
- Filtering by year
- How does it work
- Filtering records over a series of years
- How does it work
- Find sales for a specific day of the month
- How does it work
- Isolating data for a specific year and month
- How does it work
- Finding data for a given quarter
- How does it work
- Filtering data by weekday
- How does it work
- Finding records for a specific week of the year
- How does it work
- Aggregating data by the day of the week in a given year
- How does it work
- Analyzing data by day of year
- How does it work
- Grouping data by the full weekday
- How does it work
- Displaying cumulative data over a period of months to a specific date
- How does it work
- Displaying cumulative data over 90 days up to a specific date
- How does it works
- Displaying the data for the previous three months
- How does it work
- Finding the current system date
- Conclusion
- 7. Formatting Text in Query Output
- Introduction
- Structure
- Objectives
- Adding text to the output
- How does it work
- Adding text to numbers
- How does it work
- Amalgamating columns
- How does it work
- Avoiding NULLs in text-based data
- How does it work
- Concatenating and grouping
- How does it work
- Adding multiple pieces of text to numbers
- How does it work
- Converting text to uppercase
- How does it work
- Converting text to lowercase
- How does it work
- Converting text to initial capitals
- Extracting the first few characters from a field
- How does it work
- Displaying the three characters at the right of the text
- How does it work
- Displaying a given number of characters at a specific place in the text
- How does it work
- Replacing NULLs with the contents of another field
- How does it work
- Filtering records based on the part of a field
- How does it work
- Filtering data using specific characters at a given position inside a field
- How does it work
- Conclusion
- 8. Formatting Numbers and Dates
- Introduction
- Structure
- Objectives
- Removing the decimals from the output
- How does it work
- Rounding a field up to the nearest whole number
- How does it work
- Rounding a value to the nearest whole number
- How does it work
- Rounding to a specific number of decimals
- Rounding a value up or down to the nearest thousand
- How does it work
- Banker's rounding
- Displaying a value in a specific numeric format
- How does it work
- Displaying a value in a specific currency
- How does it work
- Outputting a date in a specific date format
- How does it work
- Presenting the time in a specific format
- Conclusion
- 9. Using Basic Logic to Enhance Analysis
- Introduction
- Structure
- Objectives
- Generating an alert when a value is too high
- How does it work
- Shortening text and adding ellipses to indicate truncation
- How does it work
- Designing complex calculated alerts
- How does it work
- Creating key performance indicators
- How does it work
- Classifying a series of elements without the necessary categories present in your data
- How does it work
- Creating ad hoc category groupings
- How does it work
- Applying multiple ad hoc categories
- How does it work
- Categorizing data using nested classifications
- How does it work
- Choosing elements from a list
- How does it work
- Placing nulls at the start or end of a list
- How does it work
- Conclusion
- 10. Using Multiple Tables When Querying Data
- Introduction
- Structure
- Objectives
- Joining tables
- How does it work
- Other join syntax
- Joining multiple tables
- How does it work
- Join fields
- Join subtleties
- How does it work
- Using table aliases
- How does it work
- Joining many tables
- How does it work
- Visualizing databases
- Querying across databases
- Conclusion
- 11. Using Advanced Table Joins
- Introduction
- Structure
- Objectives
- Filtering data using inner joins
- Filtering data using multiple tables join
- How does it work
- Semi joins
- How does it work
- Filtering data output using intermediate tables
- How does it work
- Using left joins to return all the data in one table but not from the other table
- How does it work
- Right joins to return all the data in one table but not from the other
- How does it work
- Full joins to return all the data from both tables in a join
- How does it work
- Intermediate table joins
- How does it work
- Using multiple fields in joins
- How does it work
- Joining a table to itself
- How does it work
- Joining tables on ranges of values
- How does it work
- Cross joins
- How does it work
- Join concepts
- Conclusion
- 12. Subqueries
- Introduction
- Structure
- Objectives
- Adding aggregated fields to detailed datasets
- How does it work
- Displaying a value as the percentage of a total
- How does it work
- Using a subquery to filter data
- How does it work
- Using a subquery as part of a calculation to filter data
- How does it work
- Filtering on an aggregated range of data using multiple subqueries
- How does it work
- Filtering on aggregated output using a second aggregation
- How does it work
- Using multiple results from a subquery to filter data
- How does it work
- Complex aggregated subqueries
- How does it work
- Nested subqueries
- How does it work
- Using subqueries to exclude data
- How does it work
- Filtering across queries and subqueries
- How does it work
- Applying separate filters to the Subquery and the main query
- How does it work
- Conclusion
- 13. Derived Tables
- Introduction
- Structure
- Objectives
- Using a derived table to create intermediate calculations
- How does it work
- Grouping and ordering data using a custom classification
- How does it work
- Joining derived tables with other tables
- How does it work
- Joining multiple derived tables
- How does it work
- Using multiple derived tables for complex aggregations
- How does it work
- Data visibility
- Using derived tables to join unconnected tables
- How does it work
- Compare year-on-year data using a derived table
- How does it work
- Synchronizing filters between a derived table and the main query
- How does it work
- Conclusion
- 14. Common Table Expressions
- Introduction
- Structure
- Objectives
- Simplifying complex queries with CTEs
- A basic common table expression
- How does it work
- Calculating averages across multiple values using a CTE
- How does it work
- CTE or derived table?
- Reusing CTEs in a query
- How does it work
- Using a CTE in a derived table to deliver two different levels of aggregation
- How does it work
- Using a CTE to isolate data from a separate dataset at a different level of detail
- How does it work
- Multiple common table expressions in a single query
- How does it work
- Nested CTEs
- How does it work
- Using multiple CTEs to compare disparate datasets
- How does it work
- Conclusion
- 15. Correlated Subqueries
- Introduction
- Structure
- Objectives
- Simple correlated subqueries
- How does it work
- Correlated subqueries to display percentages of a specific total
- How does it work
- Comparing datasets using a correlated subquery
- How does it work
- Avoid correlated subqueries in certain cases
- Duplicating the output of a correlated subquery in the query results
- How does it work
- Using correlated subqueries to filter data on an aggregate value
- How does it work
- What makes a query correlated
- Using correlated subqueries to detect if records exist
- How does it work
- Using a correlated subquery to exclude data
- How does it work
- Conclusion
- 16. Dataset Manipulation
- Introduction
- Structure
- Objectives
- Read data from multiple identical tables using the UNION operator
- How does it work
- Isolate identical data in multiple tables using the INTERSECT operator
- How does it work
- Isolating nonidentical records using the EXCEPT operator
- How does it work
- Joining multiple identical tables in a subquery
- How does it work
- Conclusion
- 17. Using SQL for More Advanced Calculations
- Introduction
- Structure
- Objectives
- Calculating the percentage represented by each record in a dataset
- How does it work
- Replacing multiple subqueries
- How does it work
- Remove decimals in calculations
- How does it work
- Numeric datatypes
- Converting between numeric datatypes
- How does it work
- Avoiding divide-by-zero errors
- How does it work
- Finding the remainder in a division using the modulo function
- How does it work
- Creating financial calculations
- How does it work
- Using a tally table to produce a sequential list of numbers
- How does it work
- Generating completely random sample output from a dataset
- How does it work
- Handling source data where figures are stored as text
- How does it work
- Conclusion
- 18. Segmenting and Classifying Data
- Introduction
- Structure
- Objectives
- Organizing data by rank
- How does it work
- Creating multiple groups of rankings
- How does it work
- Creating multiple ranked groups and subgroups
- How does it work
- Filtering data by ranked items
- How does it work
- Classifying data by strict order of rank
- How does it work
- Segment data into deciles
- How does it work
- Plot values for a percentile
- How does it work
- Extracting data from a specific quintile
- How does it work
- Display median values
- How does it work
- Conclusion
- 19. Rolling Analysis
- Introduction
- Structure
- Objectives
- Adding a running total
- How does it work
- Using windowing functions in an aggregated query
- How does it work
- Grouping running totals
- How does it work
- Applying windowing functions to a subquery
- How does it work
- Adding unique ids on the fly using ROW_NUMBER()
- How does it work
- Displaying records for missing data
- How does it work
- Displaying a complete range of dates and relevant data
- How does it work
- Comparing data with the data from a previous record
- How does it work
- Comparing data over time using the FIRST_VALUE() and LAST_VALUE() functions
- How does it work
- Displaying rolling averages over a specified number of records
- How does it work
- Show the first sale and last four sales per client
- How does it work
- Calculating cumulative distribution
- How does it work
- Classifying data using the PERCENT_RANK() function
- How does it work
- Using the LAG() function with alphabetical data
- How does it work
- Conclusion
- 20. Analyzing Data Over Time
- Introduction
- Structure
- Objectives
- Aggregating values for the year to date
- How does it work
- Isolating data for the previous month
- How does it work
- Using a derived table to compare data with values from a previous year
- How does it work
- Finding the total amount for sales each weekday over a year
- How does it work
- Count the number of weekend days between two dates
- How does it work
- Aggregate data for the last day of the month
- How does it work
- Aggregate data for the last Friday of the month
- How does it work
- Analyzing timespans as years, months, and days
- How does it work
- Isolate time periods from date and time data
- How does it work
- Listing data by time of day
- How does it work
- Aggregating data by hourly bandings
- How does it work
- Aggregate data by a quarter of an hour
- How does it work
- Reading dates and times stored as strings
- How does it work
- Conclusion
- 21. Complex Data Output
- Introduction
- Structure
- Objectives
- Creating a pivot table
- How does it work
- Creating a pivot table displaying multiple row groupings
- How does it work
- Adding totals to aggregate queries
- How does it work
- Creating subtotals and totals in aggregated queries
- How does it work
- Creating clear tables that include totals and subtotals
- How does it work
- Replace acronyms with full text in the final output
- How does it work
- Conclusion
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.