Real-Time Big Data Analytics

Name: Real-Time Big Data Analytics | Design, process, and analyze large sets of complex data in real time
Brand: Packt Publishing Limited
Price: 33.99 EUR
Availability: Online only

Shilpi Saxena (Author)

1st Edition

Published on 8 July 2025

326 pages

E-Book

PDF with Adobe-DRM

978-1-78439-740-1 (ISBN)

€33.99 incl. 7% VAT

System requirements

for PDF with Adobe-DRM

E-Book Single Licence

Available for download

Description

Design, process, and analyze large sets of complex data in real time

Key Features
- Get acquainted with transformations and database-level interactions, and ensure the reliability of messages processed using Storm
- Implement strategies to solve the challenges of real-time data processing
- Load datasets, build queries, and make recommendations using Spark SQL

Book Description
Enterprises have been striving to deal with the challenges of data arriving in real time or near real time. Although technologies such as Storm and Spark (and many more) address the challenges of real-time data, choosing the appropriate technology or framework for each business use case is the key to success. This book provides you with the skills required to quickly design, implement, and deploy your real-time analytics using real-world examples of big data use cases.

The book begins with the basics of various real-time data processing frameworks and technologies. We discuss the differences between batch and real-time processing in detail, and explore the techniques and programming concepts of Apache Storm. Moving on, we familiarize you with Amazon Kinesis for real-time data processing in the cloud. We then develop your understanding of real-time analytics through a comprehensive review of Apache Spark, along with its high-level architecture and the building blocks of a Spark program. You will learn how to transform your data, get output from transformations, and persist your results using Spark RDDs, and how to work with Spark through an interface called Spark SQL. At the end of the book, we introduce Spark Streaming, the streaming library of Spark, and walk you through the emerging Lambda Architecture (LA), which provides a hybrid platform for big data processing by combining real-time and precomputed batch data to provide a near real-time view of incoming data.

What you will learn
- Explore big data technologies and frameworks
- Work through practical challenges and use cases of real-time analytics versus batch analytics
- Develop real-world use cases for processing and analyzing data in real time using the programming paradigm of Apache Storm
- Handle and process real-time transactional data
- Optimize and tune Apache Storm for varied workloads and production deployments
- Process and stream data with Amazon Kinesis and Elastic MapReduce
- Perform interactive and exploratory data analytics using Spark SQL
- Develop common enterprise architectures and applications for real-time and batch analytics

Who this book is for
If you are a Big Data architect, developer, or programmer who wants to develop applications or frameworks to implement real-time analytics using open source technologies, then this book is for you.

Content

Cover
Copyright
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Introducing the Big Data Technology Landscape and Analytics Platform
Big Data - a phenomenon
The Big Data dimensional paradigm
The Big Data ecosystem
The Big Data infrastructure
Components of the Big Data ecosystem
The Big Data analytics architecture
Building business solutions
Dataset processing
Solution implementation
Presentation
Distributed batch processing
Batch processing in distributed mode
Push code to data
Distributed databases (NoSQL)
Advantages of NoSQL databases
Choosing a NoSQL database
Real-time processing
The telecoms or cellular arena
Transportation and logistics
The connected vehicle
The financial sector
Summary
Chapter 2: Getting Acquainted with Storm
An overview of Storm
The journey of Storm
Storm abstractions
Streams
Topology
Spouts
Bolts
Storm architecture and its components
A Zookeeper cluster
A Storm cluster
How and when to use Storm
Storm internals
Storm parallelism
Storm internal message processing
Summary
Chapter 3: Processing Data with Storm
Storm input sources
Meet Kafka
Getting to know more about Kafka
Other sources for input to Storm
A file as an input source
A socket as an input source
Kafka as an input source
Reliability of data processing
The concept of anchoring and reliability
The Storm acking framework
Storm simple patterns
Joins
Batching
Storm persistence
Storm's JDBC persistence framework
Summary
Chapter 4: Introduction to Trident and Optimizing Storm Performance
Working with Trident
Transactions
Trident topology
Trident tuples
Trident spout
Trident operations
Merging and joining
Filter
Function
Aggregation
Grouping
State maintenance
Understanding LMAX
Memory and cache
Ring buffer - the heart of the disruptor
Producers
Consumers
Storm internode communication
ZeroMQ
Storm ZeroMQ configurations
Netty
Understanding the Storm UI
Storm UI landing page
Topology home page
Optimizing Storm performance
Summary
Chapter 5: Getting Acquainted with Kinesis
Architectural overview of Kinesis
Benefits and use cases of Amazon Kinesis
High-level architecture
Components of Kinesis
Creating a Kinesis streaming service
Access to AWS Kinesis
Configuring the development environment
Creating Kinesis streams
Creating Kinesis stream producers
Creating Kinesis stream consumers
Generating and consuming crime alerts
Summary
Chapter 6: Getting Acquainted with Spark
An overview of Spark
Batch data processing
Real-time data processing
Apache Spark - a one-stop solution
When to use Spark - practical use cases
The architecture of Spark
High-level architecture
Spark extensions/libraries
Spark packaging structure and core APIs
The Spark execution model - master-worker view
Resilient distributed datasets (RDD)
RDD - by definition
Fault tolerance
Storage
Persistence
Shuffling
Writing and executing our first Spark program
Hardware requirements
Installation of the basic software
Spark
Java
Scala
Eclipse
Configuring the Spark cluster
Coding a Spark job in Scala
Coding a Spark job in Java
Troubleshooting - tips and tricks
Port numbers used by Spark
Classpath issues - class not found exception
Other common exceptions
Summary
Chapter 7: Programming with RDDs
Understanding Spark transformations and actions
RDD APIs
RDD transformation operations
RDD action operations
Programming Spark transformations and actions
Handling persistence in Spark
Summary
Chapter 8: SQL Query Engine for Spark - Spark SQL
The architecture of Spark SQL
The emergence of Spark SQL
The components of Spark SQL
The DataFrame API
The Catalyst optimizer
SQL and Hive contexts
Coding our first Spark SQL job
Coding a Spark SQL job in Scala
Coding a Spark SQL job in Java
Converting RDDs to DataFrames
Automated process
The manual process
Working with Parquet
Persisting Parquet data in HDFS
Partitioning and schema evolution or merging
Partitioning
Schema evolution/merging
Working with Hive tables
Performance tuning and best practices
Partitioning and parallelism
Serialization
Caching
Memory tuning
Summary
Chapter 9: Analysis of Streaming Data Using Spark Streaming
High level architecture
The components of Spark Streaming
The packaging structure of Spark Streaming
Spark Streaming APIs
Spark Streaming operations
Coding our first Spark Streaming job
Creating a stream producer
Writing our Spark Streaming job in Scala
Writing our Spark Streaming job in Java
Executing our Spark Streaming job
Querying streaming data in real time
The high level architecture of our job
Coding the crime producer
Coding the stream consumer and transformer
Executing the SQL Streaming Crime Analyzer
Deployment and monitoring
Cluster managers for Spark Streaming
Executing Spark Streaming applications on Yarn
Executing Spark Streaming applications on Apache Mesos
Monitoring Spark Streaming applications
Summary
Chapter 10: Introducing Lambda Architecture
What is Lambda Architecture
The need for Lambda Architecture
Layers/components of Lambda Architecture
The technology matrix for Lambda Architecture
Realization of Lambda Architecture
High level architecture
Configuring Apache Cassandra and Spark
Coding the custom producer
Coding the real-time layers
Coding the batch layers
Coding the serving layers
Executing all the layers
Summary
Index

Schweitzer Fachinformationen