Learning Apache Apex

Name: Learning Apache Apex | Real-time streaming applications with Apex
Brand: Packt Publishing
Price: 44.49 EUR
Availability: OnlineOnly

Real-time streaming applications with Apex

Thomas Weise (Author)

Packt Publishing

Published on 8 July 2025

290 pages

E-Book

ePUB with Adobe-DRM

978-1-78829-411-9 (ISBN)

€44.49 incl. 7% VAT

System requirements

for ePUB with Adobe-DRM

E-Book Single Licence

Available for download

Person

Gundabattula Ananth:

Ananth is a senior application architect in the Decisioning and Advanced Analytics architecture team at Commonwealth Bank of Australia (CBA). He holds a Ph.D. in the domain of computer science security and is interested in all things data, including low-latency distributed processing systems, machine learning, and data engineering. He holds three patents granted by the USPTO and has one application pending. Prior to joining CBA, he was an architect at Threatmetrix and a member of the core team that scaled the Threatmetrix architecture to 100 million transactions per day at very low latencies using Cassandra, Zookeeper, and Kafka. He also migrated the Threatmetrix data warehouse to a next-generation architecture based on Hadoop and Impala. Prior to Threatmetrix, he worked for the IBM software labs and IBM CIO labs, enabling some of the first IBM CIO projects to onboard the HBase, Hadoop, and Mahout stack. Ananth is a committer for Apache Apex and is currently working on next-generation architectures for the CBA fraud platform and the Advanced Analytics Omnia platform at CBA.

Weise Thomas:

Thomas Weise is the Apache Apex PMC Chair and cofounder at Atrato. Earlier, he worked at a number of other technology companies in the San Francisco Bay Area, including DataTorrent, where he was a cofounder of the Apex project. Thomas is also a committer to Apache Beam and has contributed to several other ecosystem projects. He has been working on distributed systems for 20 years and has been a speaker at international big data conferences. Thomas received the degree of Diplom-Informatiker (MSc in computer science) from TU Dresden, Germany. He can be reached on Twitter at @thweise.

V. Ramanath Munagala:

Dr. Munagala V. Ramanath received his PhD in Computer Science from the University of Wisconsin, USA, and an MSc in Mathematics from Carleton University, Ottawa, Canada. After that, he taught Computer Science courses as Assistant/Associate Professor at the University of Western Ontario in Canada for a few years before transitioning to the corporate sphere. Since then, he has worked as a senior software engineer at a number of technology companies in California, including SeeBeyond, EMC, Sun Microsystems, DataTorrent, and Cloudera. He has published papers in peer-reviewed journals in several areas, including code optimization, graph theory, and image processing.

Yan David:

David Yan is based in Silicon Valley, California. He is a senior software engineer at Google. Prior to Google, he worked at DataTorrent, Yahoo!, and the Jet Propulsion Laboratory. David holds a master of science in Computer Science from Stanford University and a bachelor of science in Electrical Engineering and Computer Science from the University of California at Berkeley.

Knowles Kenneth:

Kenneth Knowles is a founding PMC member of Apache Beam. Kenn has been working on Google Cloud Dataflow (Google's Beam backend) since 2014. Prior to that, he built backends for startups such as Cityspan, Inkling, and Dimagi. Kenn holds a PhD in Programming Language Theory from the University of California, Santa Cruz.

Content

Cover
Title Page
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Introduction to Apex
Unbounded data and continuous processing
Stream processing
Stream processing systems
What is Apex and why is it important?
Use cases and case studies
Real-time insights for Advertising Tech (PubMatic)
Industrial IoT applications (GE)
Real-time threat detection (Capital One)
Silver Spring Networks (SSN)
Application Model and API
Directed Acyclic Graph (DAG)
Apex DAG Java API
High-level Stream Java API
SQL
JSON
Windowing and time
Value proposition of Apex
Low latency and stateful processing
Native streaming versus micro-batch
Performance
Where Apex excels
Where Apex is not suitable
Summary
Chapter 2: Getting Started with Application Development
Development process and methodology
Setting up the development environment
Creating a new Maven project
Application specifications
Custom operator development
The Apex operator model
CheckpointListener/CheckpointNotificationListener
ActivationListener
IdleTimeHandler
Application configuration
Testing in the IDE
Writing the integration test
Running the application on YARN
Execution layer components
Installing Apex Docker sandbox
Running the application
Working on the cluster
YARN web UI
Apex CLI
Logging
Dynamically adjusting logging levels
Summary
Chapter 3: The Apex Library
An overview of the library
Integrations
Apache Kafka
Kafka input
Kafka output
Other streaming integrations
JMS (ActiveMQ, SQS, and so on)
Kinesis streams
Files
File input
File splitter and block reader
File writer
Databases
JDBC input
JDBC output
Other databases
Transformations
Parser
Filter
Enrichment
Map transform
Custom functions
Windowed transformations
Windowing
Global Window
Time Windows
Sliding Time Windows
Session Windows
Window propagation
State
Accumulation
Accumulation Mode
State storage
Watermarks
Allowed lateness
Triggering
Merging of streams
The windowing example
Dedup
Join
State Management
Summary
Chapter 4: Scalability, Low Latency, and Performance
Partitioning and how it works
Elasticity
Partitioning toolkit
Configuring and triggering partitioning
StreamCodec
Unifier
Custom dynamic partitioning
Performance optimizations
Affinity and anti-affinity
Low-latency versus throughput
Sample application for dynamic partitioning
Performance - other aspects for custom operators
Summary
Chapter 5: Fault Tolerance and Reliability
Distributed systems need to be resilient
Fault-tolerance components and mechanism in Apex
Checkpointing
When to checkpoint
How to checkpoint
What to checkpoint
Incremental state saving
Incremental recovery
Processing guarantees
Example - exactly-once counting
The exactly-once output to JDBC
Summary
Chapter 6: Example Project - Real-Time Aggregation and Visualization
Streaming ETL and beyond
The application pattern in a real-world use case
Analyzing Twitter feed
Top Hashtags
TweetStats
Running the application
Configuring Twitter API access
Enabling WebSocket output
The Pub/Sub server
Grafana visualization
Installing Grafana
Installing Grafana Simple JSON Datasource
The Grafana Pub/Sub adapter server
Setting up the dashboard
Summary
Chapter 7: Example Project - Real-Time Ride Service Data Processing
The goal
Datasource
The pipeline
Simulation of a real-time feed using historical data
Parsing the data
Looking up of the zip code and preparing for the windowing operation
Windowed operator configuration
Serving the data with WebSocket
Running the application
Running the application on GCP Dataproc
Summary
Chapter 8: Example Project - ETL Using SQL
The application pipeline
Building and running the application
Application configuration
The application code
Partitioning
Application testing
Understanding application logs
Calcite integration
Summary
Chapter 9: Introduction to Apache Beam
Introduction to Apache Beam
Beam concepts
Pipelines, PTransforms, and PCollections
ParDo - elementwise computation
GroupByKey/CombinePerKey - aggregation across elements
Windowing, watermarks, and triggering in Beam
Windowing in Beam
Watermarks in Beam
Triggering in Beam
Advanced topic - stateful ParDo
WordCount in Apache Beam
Setting up your pipeline
Reading the works of Shakespeare in parallel
Splitting each line on spaces
Eliminating empty strings
Counting the occurrences of each word
Format your results
Writing to a sharded text file in parallel
Testing the pipeline at small scale with DirectRunner
Running Apache Beam WordCount on Apache Apex
Summary
Chapter 10: The Future of Stream Processing
Lower barrier for building streaming pipelines
Visual development tools
Streaming SQL
Better programming API
Bridging the gap between data science and engineering
Machine learning integration
State management
State query and data consistency
Containerized infrastructure
Management tools
Summary
Index

Schweitzer Fachinformationen
