
SQL for Data Analytics
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
- Work with time-series, geospatial, and text data using PostgreSQL
- Build job-ready data analysis skills with hands-on SQL projects
- Purchase of the print or Kindle book includes a free PDF eBook
Book DescriptionSQL remains one of the most essential tools for modern data analysis and mastering it can set you apart in a competitive data landscape. This book helps you go beyond basic query writing to develop a deep, practical understanding of how SQL powers real-world decision-making. SQL for Data Analytics, Fourth Edition, is for anyone who wants to go beyond basic SQL syntax and confidently analyze real-world data. Whether you're trying to make sense of production data for the first time or upgrading your analytics toolkit, this book gives you the skills to turn data into actionable outcomes. You'll start by creating and managing structured databases before advancing to data retrieval, transformation, and summarization. From there, you'll take on more complex tasks such as window functions, statistical operations, and analyzing geospatial, time-series, and text data. With hands-on exercises, case studies, and detailed guidance throughout, this book prepares you to apply SQL in everyday business contexts, whether you're cleaning data, building dashboards, or presenting findings to stakeholders. By the end, you'll have a powerful SQL toolkit that translates directly to the work analysts do every day. *Email sign-up and proof of purchase requiredWhat you will learn - Write SQL Queries to explore and analyze structured data.
- Use JOINs, subqueries, views, and CTEs to build analytics-ready datasets
- Apply window functions to identify trends, patterns, and cohort behavior
- Perform statistical analysis and hypothesis testing directly in SQL
- Analyze JSON, arrays, text, geospatial, and time-series data
- Improve SQL performance with indexing strategies and query plan optimization
- Load data with Python and automate analytics workflows
- Complete a full case study simulating a real-world data analysis project
Who this book is forThis book is for aspiring and early-career data analysts, data engineers, backend developers, business analysts, and students who want to apply SQL to real-world data analytics. You should have basic SQL familiarity and college-level math knowledge, along with the desire to advance toward analytics-grade SQL, data transformation, pattern discovery, and business insight generation.
All prices
More details
Other editions
New editions

Content
- Cover
- FM
- Contributors
- Preface
- Free Benefits with Your Book
- Part 1: Data Management Systems
- Chapter 1: Introduction to Data Management Systems
- Technical requirements
- Describing the world with data
- Data modeling
- Understanding relational databases and SQL
- Primary keys and foreign keys
- Normalization
- Advantages and disadvantages of SQL databases
- Setting up a PostgreSQL relational database
- Exercise 1.1: Installing PostgreSQL on your local machine
- Exercise 1.2: Accessing and understanding PostgreSQL
- Exercise 1.3: Utilizing PostgreSQL query tools
- Exercise 1.4: Import a sample sqlda database
- Introducing the sqlda database
- Activity 1
- Summary
- Chapter 2: Creating Tables with Solid Structures
- Technical requirements
- Running CRUD with SQL
- CREATE
- READ
- UPDATE
- DELETE
- Creating a table from an existing dataset
- Describing columns
- Learning about basic data types of SQL
- Numeric and monetary
- Character
- Boolean
- Datetime
- More data types
- Creating a table with an explicit definition
- Column constraints and table constraints
- Exercise 2.1: Creating and populating tables
- Inserting data into a table
- Exercise 2.2: Populating a table
- Deleting/dropping tables
- Exercise 2.3: Deleting an unnecessary table
- Activity 2
- Summary
- Chapter 3: Exchanging Data Using COPY
- Technical requirements
- Exporting data from a PostgreSQL database
- \COPY in psql
- Configuring COPY and \COPY
- Importing data into a PostgreSQL database
- Exercise 3.1: Exporting data to a file for further processing in Excel
- Activity 3
- Summary
- Get This Book's PDF Version and Exclusive Extras
- Chapter 4: Manipulating Data with Python
- Technical requirements
- Getting started with Python
- Exercise 4.1: Setting up Python on your machine
- Managing data with Python
- What is SQLAlchemy?
- Using Python with SQLAlchemy and pandas
- Reading and writing to a database with pandas
- Writing data to the database from Python
- Exercise 4.2: Reading, visualizing, and saving data in Python
- Activity 4
- Summary
- Part 2: Data Presentation and Manipulation
- Chapter 5: Presenting Data with SELECT
- Technical requirements
- Using SELECT expressions
- Expression alias
- The LIMIT clause
- The ORDER BY clause
- The DISTINCT and DISTINCT ON functions
- Filtering query results
- The AND/OR and NOT clauses
- The IN/NOT IN clause
- The IS NULL/IS NOT NULL clauses
- Exercise 5.1: Reading data from the database
- Activity 5
- Summary
- Chapter 6: Transforming and Updating Data
- Technical requirements
- Updating table data
- Cleaning data
- Exercise 6.1: Updating and deleting data
- Running data transformation functions
- The CASE WHEN function
- Functions for different data types
- The datetime function
- The string function
- The casting function
- The NULL handling functions
- The COALESCE function
- The NULLIF function
- Exercise 6.2: Data manipulation using functions
- Creating user-defined functions
- The \df and \sf commands
- Exercise 6.3: Creating functions with arguments
- Triggers
- Changing the table definition
- Activity 6
- Summary
- Chapter 7: Defining Datasets from Existing Datasets
- Technical requirements
- Creating derived datasets
- Common table expressions
- Views
- Exercise 7.1: Utilizing subqueries
- Joining tables
- Inner joins
- Outer joins
- Left outer joins
- Right outer joins
- Full outer joins
- Cross joins
- Exercise 7.2: Using joins to analyze a sales dealership
- Running set operations
- Exercise 7.3: Generating an elite customer party guest list using UNION
- Activity 7
- Summary
- Get This Book's PDF Version and Exclusive Extras
- Chapter 8: Aggregating Data with GROUP BY
- Technical requirements
- Aggregating data
- Exercise 8.1: Using aggregate functions to analyze data
- Aggregating with GROUP BY clause
- The GROUP BY clause
- Exercise 8.2: Calculating the cost by product type using GROUP BY
- Grouping sets
- Ordered set aggregates
- Applying the HAVING clause
- Exercise 8.3: Calculating and displaying data using the HAVING clause
- Activity 8
- Summary
- Chapter 9: Inter-Row Operation with Window Functions
- Technical requirements
- Defining window functions
- The basics of window functions
- Exercise 9.1: Analyzing Customer Data Fill Rates over Time
- Using advanced window definitions
- Common window functions
- The WINDOW keyword
- Window frame
- Exercise 9.2: Team Lunch Motivation
- Activity 9
- Summary
- Get This Book's PDF Version and Exclusive Extras
- Part 3: Advanced Topics on Analytics
- Chapter 10: Performant SQL
- Technical requirements
- Scanning the database
- Query planning
- Exercise 10.1: Interpreting the query planner
- Scanning the index
- The B-tree index
- Exercise 10.2: Creating an index scan
- The hash index
- Exercise 10.3: Generating hash indexes to investigate performance
- Effective index use
- Activity 10
- Summary
- Chapter 11: Processing JSON and Arrays
- Technical requirements
- Understanding types of data
- Using JSON
- JSONB: Pre-parsed JSON
- Accessing data from a JSON or JSONB field
- Leveraging JSON path
- Creating and modifying data in a JSONB field
- Exercise 11.1: Searching through JSONB
- Using arrays to process element collections
- Exercise 11.2: Analyzing sequences using arrays
- Activity 11
- Summary
- Get This Book's PDF Version and Exclusive Extras
- Chapter 12: Advanced Data Types: Date, Text, and Geospatial
- Technical requirements
- Using date and time in data analytics
- The DATE type
- Transforming DATE data types
- Intervals
- Exercise 12.1: Analytics with time-series data
- Understanding text processing
- String characteristics and manipulation
- Identifying string patterns
- Exercise 12.2: Text processing
- Applying geospatial data
- Latitude and longitude
- Exercise 12.3: Geospatial analysis
- Activity 12
- Summary
- Chapter 13: Inferential Statistics Using SQL
- Technical requirements
- Moving from analytics to statistics
- Understanding fundamental concepts: population versus samples, parameters versus statistics
- Population versus samples
- Parameters versus statistics
- Estimating: point estimates and confidence intervals
- Testing hypotheses
- Analyzing correlation and performing regression
- Interpreting regression results
- Understanding a simple linear regression example
- Activity 13
- Summary
- Get This Book's PDF Version and Exclusive Extras
- Chapter 14: A Case Study for Analytics Using SQL
- Technical requirements
- Understanding the data analytics system
- Understanding dimensional models
- Understanding data warehouse architecture
- Applying data analysis using SQL
- Exercise 14.1: Copying from a file into the staging table
- Exercise 14.2: Quality check for raw data
- Exercise 14.3: Loading data into the star schema
- Exercise 14.4: Delivery data for analysis
- Summary
- Chapter 15: Unlock Your Exclusive Benefits
- Unlock this Book's Free Benefits in 3 Easy Steps
- Need help?
- Other Books You May Enjoy
- Index
Preface
The Structured Query Language, better known as SQL, has been the main workhorse for data professionals over the past five decades. It is one of the most sought-after skills in the job market and is frequently tested in various job interviews. It is widely taught in colleges around the world, from computer science majors to business/statistics schools, and from undergraduate studies to doctoral programs. Yet, as popular as it is, there is a common complaint that the teaching and studying of SQL are isolated from the actual usage patterns in the real world. The authors have personally heard many new grads complaining that although they learned SQL in school, they still don't know how to start writing SQL statements when they are assigned to their first projects at work. The traditional way of teaching is very well structured and detail-oriented but focuses on functionalities instead of the purpose and usage scenarios, creating a disconnect for beginners.
This disconnect is exactly why this book is here. From the very first edition, this book has focused on one goal: to help beginners understand not only what is in SQL, but also when and how to use it. With this in mind, the contents of this book have been arranged based on the natural flow of everyday data usage, instead of following the semantic definitions of relational databases. The book also utilizes a combination of discussions, hands-on exercises, and exploratory activities, with GitHub-based sample code, to get you more involved in the learning process. Based on the overwhelmingly (thankfully!) positive feedback we have received, this goal has been well achieved.
As positive as the response has been, ever since the third edition of this book, we have also noticed opportunities to improve. During the 2020 pandemic, the authors were engaged in several semesters of remote teaching of SQL, and had the opportunity to test different arrangements of SQL content and learning activities. It was observed that while students benefit from a mixture of lectures and hands-on exercises, the most welcomed approach is to have hands-on exercises immediately following lectures, and then have a related activity shortly after the exercises. As such, in this fourth edition, we have adjusted the topics to follow a learning path that is more natural for beginners, which was derived from our experience in remote teaching. We have added hands-on exercises right after each topic. Finally, as the chapters wrap up, we offer activities without step-by-step instructions for you to practice the skills you just obtained. We hope this new format makes the learning process easier.
SQL is already 55 years old, even older than the authors! Yet it is still evolving, and still dominating the data management practices. The authors, having been in the data management field for so long, are also actively learning and adapting. This book is also a part of this learning/adapting process. We would like to invite you to join us in this journey, and welcome your feedback so that this book can evolve too. Bon voyage!
Who this book is for
This book is primarily for beginners who would like to teach themselves how to use SQL for data analytics, but could also be useful for analytics professionals who would like to learn specific features and functionalities of SQL.
What this book covers
Chapter 1, Introduction to Data Management Systems, introduces how to represent real-world objects using data and provides a foundational understanding of relational databases and SQL. It also guides you through setting up a PostgreSQL database on your machine to store, organize, and analyze data effectively.
Chapter 2, Creating Tables with Solid Structures, introduces the fundamentals of working with relational data using SQL, including creating tables, inserting data, and performing CRUD operations. It also covers defining data types, creating tables from existing datasets, and deleting tables, giving you a solid foundation for managing the full life cycle of relational data.
Chapter 3, Exchanging Data Using COPY, introduces how to export data from and import data into a PostgreSQL database. These skills enable you to transfer data between PostgreSQL and external filesystems for tasks such as backup, migration, and integration.
Chapter 4, Manipulating Data with Python, introduces the basics of getting started with Python and using it to manage data. You will learn how to set up a Python environment, connect to a relational database, and perform data manipulation tasks within the database.
Chapter 5, Presenting Data with SELECT, introduces how to use SQL SELECT expressions to retrieve data from a database. It also covers filtering query results, enabling you to extract specific subsets of data based on defined conditions.
Chapter 6, Transforming and Updating Data, introduces how to modify existing data and table structures in a database to reflect changes in real-world scenarios. It also covers applying data transformation functions and creating user-defined functions for more advanced and customized data processing.
Chapter 7, Defining Datasets from Existing Datasets, introduces techniques for creating derived datasets, joining multiple tables, and performing set operations in SQL. These skills enable you to analyze complex relationships and combine data from various sources to support advanced data exploration and research.
Chapter 8, Aggregating Data with GROUP BY, introduces how to aggregate data in SQL to summarize and extract key insights from raw datasets. It covers the use of GROUP BY and HAVING clauses to organize and filter aggregated results, helping you understand overall patterns and trends in your data.
Chapter 9, Inter-Row Operation with Window Functions, introduces window functions in SQL, which allow you to perform calculations across rows related to the current row without collapsing the result set. It also covers advanced window definitions, enabling you to analyze row positions and relationships within partitions of your dataset.
Chapter 10, Performant SQL, introduces how PostgreSQL scans databases and indexes to retrieve data efficiently. Understanding these mechanisms will help you write optimized queries that improve database performance.
Chapter 11, Processing JSON and Arrays, introduces JSON and array data types in PostgreSQL. You will learn how to parse and work with these complex types, expanding your ability to manage data beyond traditional relational formats.
Chapter 12, Advanced Data Types: Date, Text, and Geospatial, introduces how to work with specialized data types such as date and time, text, and geospatial data in PostgreSQL. By mastering their unique processing rules, you'll enhance your ability to analyze complex real-world scenarios within a relational database.
Chapter 13, Inferential Statistics Using SQL, introduces key inferential statistical concepts, including hypothesis testing, confidence intervals, and regression analysis. It covers how to use SQL for data processing alongside statistical knowledge to gain insights and support data-informed decisions.
Chapter 14, A Case Study for Analytics Using SQL, introduces the fundamentals of data analytical systems and how to apply SQL for data analysis. You will learn how to connect your SQL skills with a real-world case study and gain an overview of the modern data management workflow.
To get the most out of this book
To get the most out of this book, do the following:
- Combine the concepts you learn in this book with real-world examples that you run into at school or work
- Access the SQL code in the GitHub repository of this book
Download the example code files
The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/SQL-for-Data-Analytics-Fourth-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781836646259.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "Here is an example of using the \COPY command to copy rows into a table from a file on the local machine."
A block of code is set as follows:
SELECT product_id, model, base_msrp FROM products; Any command-line input or output is written as follows:
product_name | discount | discounted_price -----------------------+----------+------------------ Lemon | 0.9 | 359.991 Lemon Limited Edition | 0.9 | 719.991 Lemon | 0.9 | 449.991 ...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management
For more information, see our eBook Help page.