Table of Contents

Chapter 1: Introduction to next-generation distributed systems
Chapter Goal: Discusses the different kinds of distributed systems and how they have evolved over the years, from Teradata to Hadoop, Spark, and others. Also explains how Apache Spark differs from Hadoop.

Chapter 2: Introduction to Apache Spark
Covers the architecture of Spark and its RDD abstraction, and how Spark distributes jobs on cluster managers such as Mesos and YARN.

Chapter 3: Getting started with the RDD API
This chapter shows how to get started with the RDD Scala API. It opens with a practical example, a retail analytics project. Comes with runnable code.

Chapter 4: Map/Reduce RDD API
This chapter discusses the Map/Reduce API of Spark, covering shuffling, folding, join, and group operations. Comes with runnable code.

Chapter 5: Advanced RDD API
This chapter covers advanced APIs such as aggregate and mapPartitions, which give finer control over how Spark processes data. Comes with runnable code.

Chapter 6: Spark caching
In-memory processing is one of the most important parts of Apache Spark. This chapter discusses how Spark implements caching and how to use caching to speed up your Spark programs. Comes with runnable code.

Chapter 7: Integrating with Hadoop
Spark integrates beautifully with Hadoop. This chapter discusses how Spark integrates with HDFS and YARN. Comes with runnable code.

Chapter 8: Introduction to Spark Streaming
Spark Streaming is a real-time processing system built on top of Spark. It allows developers to apply the same Spark API to real-time systems.

Chapter 9: Anatomy of RDD
This chapter takes a deeper dive into how the different RDDs are built. A deeper understanding of RDDs is necessary to exploit the Spark abstraction to the fullest.

Chapter 10: Spark SQL, SQL on Spark
This chapter covers using the SQL query language in Spark to process structured data. Comes with examples.

Chapter 11: GraphX, Graph processing in Spark
Graph processing is an important part of any distributed system. This chapter covers how graph processing is achieved using GraphX, the graph processing library on Apache Spark.

Chapter 12: MLlib, Machine learning in Spark
With the advancement of AI, machine learning is becoming more and more important. This chapter discusses how to use MLlib, Spark's machine learning library, for recommendation and prediction.

Chapter 13: How it all comes together
One of Spark's strengths is how the different parts of its ecosystem come together to solve problems. This chapter shows how you can mix Scala, SQL, and machine learning in one program to solve a complex problem.