The Definitive Guide to Apache Flink 2016
Next Generation Data Processing
Stefan Papp (Author)
Apress
Published on 8 June 2016
Book
Paperback/Softback
400 pages
978-1-4842-1408-4 (ISBN)
Description
Data processing is one of the core functions of distributed and cloud computing. There is high demand for low-latency, high-performance computing, and for data processing engines that support abstract processing methods such as SQL querying, analytic frameworks, and graph processing. The Definitive Guide to Apache Flink by Papp starts with the history of Big Data processing with Hadoop and explains the shortcomings of MapReduce. It shows how YARN and Hadoop 2.x changed the game and how new technologies started to compete to become the successor of MapReduce. After detailed information on Tez and Spark and how they try to address the shortcomings of MapReduce, the book covers architectural patterns for building a solid data processing engine, such as advanced pipelining methods and in-memory caching, and shows how Flink uses these concepts. Flink programming is introduced with a hands-on approach: it starts with how to create a ten-minute build and how to run the first "Word Count" with Flink, then continues with more advanced topics such as writing more complex programs. All samples are written in Java or Scala.
It shows that Apache Flink has the potential to become one of the key technologies for distributed computing. It aims to replace many small technologies with a more powerful one that covers many aspects of Hadoop programming.
More details
Edition
1st ed. 2016
Language
English
Place of publication
Berlin
Germany
Publishing group
Springer-Verlag Berlin and Heidelberg GmbH & Co. KG
Target group
Professional and scholarly
Popular/general
Illustrations
biography
Dimensions
Height: 254 mm
Width: 178 mm
ISBN-13
978-1-4842-1408-4 (9781484214084)
Copyright in bibliographic data is held by Nielsen Book Services Limited or its licensors: all rights reserved.
Schweitzer Classification
Person
Stefan Papp is an IT professional with 20 years of experience who has dedicated his professional career to Big Data and Data Science. He focuses on Hadoop technologies and consults for major companies.
Content
Table of Contents
Chapter 1: Data Processing
Chapter Goal: The reader gets an overview of data processing in distributed environments.
Sub-Topics:
* History of Data Processing
* Shortcomings of MapReduce
* IO-Problems
* Why YARN changed the game
Chapter 2: Next Generation Data Processing Platform
Chapter Goal: Introduce the data processing platforms.
Sub-Topics:
* Tez
* Spark
* In-Memory processing
* Pipelines
Chapter 3: Ten Minutes Build
Chapter Goal: The reader can install Flink, create a simple build, and in general is able to set up a Flink project.
Sub-Topics:
* Basic Setup
* How to get started in a local environment
* How to get started in a Hadoop environment
* Word Count
Chapter 4: Programming Essentials
Chapter Goal: The reader can write basic Flink applications, understands how to set them up, and has a good understanding of the data types.
Sub-Topics:
* Your first Flink Application with Java
* Your first Flink Application with Scala
* Six steps to create a Flink program
* Understanding ExecutionEnvironment
* Understanding DataSets and Tuples
Chapter 5: Transformation
Chapter Goal: List all types of transformations. The reader gets a comprehensive overview of how to transform data with Flink.
Sub-Topics:
* Filtering, Joining
* Aggregation
Chapter 6: Data Preparation with Flink
Chapter Goal: The reader learns how to prepare data for later analysis.
Sub-Topics:
* ETL with Flink - Overview
* How to access HCatalog
* How to ingest several data types with Flink into Hadoop (JSON, CSV, XML)
Chapter 7: Data Analytics Basis
Chapter Goal: How to analyze data with Flink.
Sub-Topics:
* K-Means and other statistical methods
* Graph Analytics
* Aggregation and statistics on weather data
* Text Analytics: Sentiment Analysis with Flink
Chapter 8: Visualization
Chapter Goal: The reader learns how to display analysis results.
Sub-Topics:
* Different types of charts
* How to visualize results
Chapter 9: Streaming
Chapter Goal: The reader learns how to stream data.
Sub-Topics:
* Streaming: how it works
* Differences to Storm
* Performance
Chapter 10: Outlook
Chapter Goal: The future of Flink.
Sub-Topics:
* Overview of the future