
Real-Time Big Data Analytics
Description
- Implement strategies to solve the challenges of real-time data processing
- Load datasets, build queries, and make recommendations using Spark SQL
Book Description
Enterprises have long struggled with the challenges of data arriving in real time or near real time. Although technologies such as Storm and Spark (and many more) address these challenges, choosing the appropriate technology or framework for each business use case is the key to success. This book provides you with the skills required to quickly design, implement, and deploy your real-time analytics using real-world examples of big data use cases.
The book begins with the basics of various real-time data processing frameworks and technologies. We discuss the differences between batch and real-time processing in detail, and explore the techniques and programming concepts of Apache Storm. Moving on, we familiarize you with Amazon Kinesis for real-time data processing in the cloud. We then develop your understanding of real-time analytics through a comprehensive review of Apache Spark, along with its high-level architecture and the building blocks of a Spark program. You will learn how to transform your data, get output from transformations, and persist your results using Spark RDDs, and how to work with Spark through an interface called Spark SQL. At the end of the book, we introduce Spark Streaming, the streaming library of Spark, and walk you through the emerging Lambda Architecture (LA), which provides a hybrid platform for big data processing by combining real-time and precomputed batch data to provide a near real-time view of incoming data.
What you will learn
- Explore big data technologies and frameworks
- Work through practical challenges and use cases of real-time analytics versus batch analytics
- Develop real-world use cases for processing and analyzing data in real time using the programming paradigm of Apache Storm
- Handle and process real-time transactional data
- Optimize and tune Apache Storm for varied workloads and production deployments
- Process and stream data with Amazon Kinesis and Elastic MapReduce
- Perform interactive and exploratory data analytics using Spark SQL
- Develop common enterprise architectures/applications for real-time and batch analytics
Who this book is for
If you are a Big Data architect, developer, or a programmer who wants to develop applications/frameworks to implement real-time analytics using open source technologies, then this book is for you.

Person
Shilpi Saxena is an IT professional and technology evangelist. She is an engineer with exposure to various domains (machine-to-machine, healthcare, telecom, hiring, and manufacturing) and has experience in all aspects of the conception and execution of enterprise solutions. She has more than 12 years of experience in the development and execution of various facets of enterprise solutions, in both the products and services dimensions of the software industry. For the last 3 of those years she has been architecting, managing, and delivering solutions in the Big Data space; she also leads a high-performance, geographically distributed team of elite engineers. An engineer by degree and profession, she has worn varied hats, such as developer, technical leader, product owner, and tech manager, and she has seen all the flavors that the industry has to offer. She has architected and worked through some of the pioneering production implementations in Big Data on Storm and Impala with autoscaling in AWS. Shilpi has also authored Real-time Analytics with Storm and Cassandra (https://www.packtpub.com/big-data-and-business-intelligence/learning-real-time-analytics-storm-and-cassandra) with Packt Publishing.
Content
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Introducing the Big Data Technology Landscape and Analytics Platform
- Big Data - a phenomenon
- The Big Data dimensional paradigm
- The Big Data ecosystem
- The Big Data infrastructure
- Components of the Big Data ecosystem
- The Big Data analytics architecture
- Building business solutions
- Dataset processing
- Solution implementation
- Presentation
- Distributed batch processing
- Batch processing in distributed mode
- Push code to data
- Distributed databases (NoSQL)
- Advantages of NoSQL databases
- Choosing a NoSQL database
- Real-time processing
- The telecoms or cellular arena
- Transportation and logistics
- The connected vehicle
- The financial sector
- Summary
- Chapter 2: Getting Acquainted with Storm
- An overview of Storm
- The journey of Storm
- Storm abstractions
- Streams
- Topology
- Spouts
- Bolts
- Storm architecture and its components
- A Zookeeper cluster
- A Storm cluster
- How and when to use Storm
- Storm internals
- Storm parallelism
- Storm internal message processing
- Summary
- Chapter 3: Processing Data with Storm
- Storm input sources
- Meet Kafka
- Getting to know more about Kafka
- Other sources for input to Storm
- A file as an input source
- A socket as an input source
- Kafka as an input source
- Reliability of data processing
- The concept of anchoring and reliability
- The Storm acking framework
- Storm simple patterns
- Joins
- Batching
- Storm persistence
- Storm's JDBC persistence framework
- Summary
- Chapter 4: Introduction to Trident and Optimizing Storm Performance
- Working with Trident
- Transactions
- Trident topology
- Trident tuples
- Trident spout
- Trident operations
- Merging and joining
- Filter
- Function
- Aggregation
- Grouping
- State maintenance
- Understanding LMAX
- Memory and cache
- Ring buffer - the heart of the disruptor
- Producers
- Consumers
- Storm internode communication
- ZeroMQ
- Storm ZeroMQ configurations
- Netty
- Understanding the Storm UI
- Storm UI landing page
- Topology home page
- Optimizing Storm performance
- Summary
- Chapter 5: Getting Acquainted with Kinesis
- Architectural overview of Kinesis
- Benefits and use cases of Amazon Kinesis
- High-level architecture
- Components of Kinesis
- Creating a Kinesis streaming service
- Access to AWS Kinesis
- Configuring the development environment
- Creating Kinesis streams
- Creating Kinesis stream producers
- Creating Kinesis stream consumers
- Generating and consuming crime alerts
- Summary
- Chapter 6: Getting Acquainted with Spark
- An overview of Spark
- Batch data processing
- Real-time data processing
- Apache Spark - a one-stop solution
- When to use Spark - practical use cases
- The architecture of Spark
- High-level architecture
- Spark extensions/libraries
- Spark packaging structure and core APIs
- The Spark execution model - master-worker view
- Resilient distributed datasets (RDD)
- RDD - by definition
- Fault tolerance
- Storage
- Persistence
- Shuffling
- Writing and executing our first Spark program
- Hardware requirements
- Installation of the basic software
- Spark
- Java
- Scala
- Eclipse
- Configuring the Spark cluster
- Coding a Spark job in Scala
- Coding a Spark job in Java
- Troubleshooting - tips and tricks
- Port numbers used by Spark
- Classpath issues - class not found exception
- Other common exceptions
- Summary
- Chapter 7: Programming with RDDs
- Understanding Spark transformations and actions
- RDD APIs
- RDD transformation operations
- RDD action operations
- Programming Spark transformations and actions
- Handling persistence in Spark
- Summary
- Chapter 8: SQL Query Engine for Spark - Spark SQL
- The architecture of Spark SQL
- The emergence of Spark SQL
- The components of Spark SQL
- The DataFrame API
- The Catalyst optimizer
- SQL and Hive contexts
- Coding our first Spark SQL job
- Coding a Spark SQL job in Scala
- Coding a Spark SQL job in Java
- Converting RDDs to DataFrames
- Automated process
- The manual process
- Working with Parquet
- Persisting Parquet data in HDFS
- Partitioning and schema evolution or merging
- Partitioning
- Schema evolution/merging
- Working with Hive tables
- Performance tuning and best practices
- Partitioning and parallelism
- Serialization
- Caching
- Memory tuning
- Summary
- Chapter 9: Analysis of Streaming Data Using Spark Streaming
- High level architecture
- The components of Spark Streaming
- The packaging structure of Spark Streaming
- Spark Streaming APIs
- Spark Streaming operations
- Coding our first Spark Streaming job
- Creating a stream producer
- Writing our Spark Streaming job in Scala
- Writing our Spark Streaming job in Java
- Executing our Spark Streaming job
- Querying streaming data in real time
- The high level architecture of our job
- Coding the crime producer
- Coding the stream consumer and transformer
- Executing the SQL Streaming Crime Analyzer
- Deployment and monitoring
- Cluster managers for Spark Streaming
- Executing Spark Streaming applications on Yarn
- Executing Spark Streaming applications on Apache Mesos
- Monitoring Spark Streaming applications
- Summary
- Chapter 10: Introducing Lambda Architecture
- What is Lambda Architecture
- The need for Lambda Architecture
- Layers/components of Lambda Architecture
- The technology matrix for Lambda Architecture
- Realization of Lambda Architecture
- High level architecture
- Configuring Apache Cassandra and Spark
- Coding the custom producer
- Coding the real-time layers
- Coding the batch layers
- Coding the serving layers
- Executing all the layers
- Summary
- Index
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.