Practical Real-time Data Processing and Analytics

Name: Practical Real-time Data Processing and Analytics | Distributed Computing and Event Processing using Apache Spark, Flink, Storm, and Kafka
Brand: Packt Publishing
Price: 48.49 EUR
Availability: Online only

Distributed Computing and Event Processing using Apache Spark, Flink, Storm, and Kafka

Shilpi Saxena (Author)

Packt Publishing

Published on 13 January 2025

360 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-78728-986-4 (ISBN)

€48.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Practical Real-Time Data Processing and Analytics
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Why subscribe?
Customer Feedback
Table of Contents
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
1 Introducing Real-Time Analytics
What is big data?
Big data infrastructure
Real-time analytics - the myth and the reality
Near real-time solution - an architecture that works
Lambda architecture - analytics possibilities
IOT - thoughts and possibilities
Cloud - considerations for NRT and IOT
Summary
2 Real Time Applications - The Basic Ingredients
The NRT system and its building blocks
NRT - high-level system view
NRT - technology view
Summary
3 Understanding and Tailing Data Streams
Understanding data streams
Setting up infrastructure for data ingestion
Tapping data from source to the processor - expectations and caveats
Comparing and choosing what works best for your use case
Do it yourself
Summary
4 Setting up the Infrastructure for Storm
Overview of Storm
Storm architecture and its components
Setting up and configuring Storm
Real-time processing job on Storm
Summary
5 Configuring Apache Spark and Flink
Setting up and a quick execution of Spark
Setting up and a quick execution of Flink
Setting up and a quick execution of Apache Beam
Balancing in Apache Beam
Summary
6 Integrating Storm with a Data Source
RabbitMQ - messaging that works
RabbitMQ exchanges
RabbitMQ - integration with Storm
PubNub data stream publisher
String together Storm-RMQ-PubNub sensor data topology
Summary
7 From Storm to Sink
Setting up and configuring Cassandra
Storm and Cassandra topology
Storm and IMDB integration for dimensional data
Integrating the presentation layer with Storm
Do It Yourself
Summary
8 Storm Trident
State retention and the need for Trident
Basic Storm Trident topology
Trident internals
Trident operations
DRPC
Do It Yourself
Summary
9 Working with Spark
Spark overview
Distinct advantages of Spark
Spark - use cases
Spark architecture - working inside the engine
Spark pragmatic concepts
Spark 2.x - advent of data frames and datasets
Summary
10 Working with Spark Operations
Spark - packaging and API
RDD pragmatic exploration
Shared variables - broadcast variables and accumulators
Summary
11 Spark Streaming
Spark Streaming concepts
Spark Streaming - introduction and architecture
Packaging structure of Spark Streaming
Connecting Kafka to Spark Streaming
Summary
12 Working with Apache Flink
Flink architecture and execution engine
Flink basic components and processes
Integration of source stream to Flink
Flink processing and computation
Flink persistence
FlinkCEP
Pattern API
Gelly
DIY
Summary
13 Case Study
Introduction
Data modeling
Tools and frameworks
Setting up the infrastructure
Implementing the case study
Running the case study
Summary
Index

Schweitzer Fachinformationen
