Chapter 1: Introduction to Hortonworks Data Platform (HDP)
Chapter Goal: This chapter will set the stage for the rest of the book. It will discuss Hadoop and Big Data at a high level for those not familiar with these concepts. It will be the only general-knowledge chapter in the book. Its secondary purpose is to give the big picture of all the parts of the Hortonworks HDP ecosystem and to put those parts in context.
- A brief history of Hadoop
- A brief overview of the big data landscape and where Hadoop fits in
- A top-level overview of the Hortonworks Data Platform and Enterprise Hadoop

Chapter 2: Understanding HDFS
Chapter Goal: HDFS is the distributed storage system that forms the foundation of Hadoop. This chapter will define the base principles of HDFS in HDP and how it supports MapReduce.
- Understanding HDFS architecture
- Understanding how data is stored in HDFS
- Understanding the relationship between NameNodes and DataNodes
- Working with WebHDFS and Hadoop fs commands (a minimal code sketch follows this list)
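To give a concrete feel for the kind of HDFS access Chapter 2 covers, here is a minimal sketch using the Hadoop FileSystem Java client API. The file paths (/tmp/sample.txt as the local source, /user/hdp as the HDFS target directory) and the reliance on fs.defaultFS from the cluster configuration are illustrative assumptions, not details taken from the outline.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsQuickLook {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS (the NameNode address) from the
            // core-site.xml on the classpath; the cluster address itself
            // is an assumption of this sketch.
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(conf)) {
                // Copy a local file into HDFS (hypothetical paths).
                fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                                     new Path("/user/hdp/sample.txt"));
                // List the target directory to confirm the copy.
                for (FileStatus status : fs.listStatus(new Path("/user/hdp"))) {
                    System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
                }
            }
        }
    }

The same operations are also exposed over HTTP through WebHDFS and at the command line through the hadoop fs / hdfs dfs tools, which is how the chapter frames them.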
Chapter 3: Understanding YARN
Chapter Goal: YARN is the "operating system" of HDP. YARN allows both batch and real-time access to data. This chapter will provide a deep understanding of YARN and how it is employed in HDP.
- Description of the architecture of YARN and its relationship to HDFS
- Understanding the components of YARN (ResourceManager, NodeManager, ApplicationMasters, and Containers) as configured in HDP
- Understanding MapReduce and how MapReduce jobs are executed under YARN

Chapter 4: Getting at Your Data
Chapter Goal: HDP has a number of tools to query and explore your data without needing to write complex MapReduce jobs. This chapter will look at the key tools for accessing data in HDP.
- Scripting data access with Pig
- Querying data with Hive and HCatalog
- Creating Hadoop data applications with Tez

Chapter 5: Bringing NoSQL to Hadoop in HDP
Chapter Goal: This chapter builds on Chapter 4 and discusses how some NoSQL tools, built on top of YARN in HDP, can provide greater access to data.
- Understanding and working with HBase
- Understanding and working with Accumulo

Chapter 6: Working with HDP in Real Time
Chapter Goal: Traditional Hadoop was a batch-based process. YARN introduced the ability to add real-time or near-real-time access to your data. This chapter will look at how developers can use Storm in HDP to process streaming data into their data applications.
- Working with Storm
- Understanding the Trident API
- Combining Storm with HDFS for data persistence
- Use cases for streaming data

Chapter 7: Installing and Configuring HDP
Chapter Goal: The next three chapters will pivot from the developer side of Hadoop to the administration of Hadoop within HDP. This chapter will walk through the process of installing and configuring Hadoop.
- Installing Hortonworks HDP
- Configuring HDP
- HDP deployments on Windows, Linux, and private clouds

Chapter 8: Securing HDP
Chapter Goal: Security and governance are among the biggest concerns of all administrators. HDP provides particular security assurances that will help admins sleep better at night. This chapter will show how to secure Hadoop within HDP and how to integrate Hadoop into common directory services.
- Understanding Hadoop security concepts
- Setting up authentication and authorization in HDP
- Auditing security access
- Linking to other directory services
- Securing a cluster with Knox

Chapter 9: Monitoring and Managing Data in HDP
Chapter Goal: This chapter will explain how to monitor and manage a Hadoop cluster once it has been created in HDP.
- Monitoring and management approaches
- Scheduling jobs with Oozie
- Deploying and managing Hadoop with Ambari
- Working with ZooKeeper

Chapter 10: Getting Your Data into HDP
Chapter Goal: Once you have configured your Hadoop instance, the next step is to get data into the cluster. This chapter will look at a number of tools that provide ETL (Extract, Transform, Load) processes to load data into HDP for Hadoop processing.
- Executing bulk transfers of data into and out of Hadoop using Sqoop
- Managing data processing and governance with Falcon
- Loading high-volume streaming data into HDFS using Flume

Chapter 11: Understanding HDP Architectural Patterns
Chapter Goal: This chapter will look at some common architectural patterns for working effectively with HDP.
- Working with the Lambda architecture
- Thinking in terms of data lakes

Chapter 12: Incorporating HDP into Your Larger Data Infrastructure
Chapter Goal: This chapter will look at how HDP can be incorporated into a larger data platform. It will place Hadoop within the context of BI solutions, data warehouses, and other MPP appliances (such as Teradata and Netezza).
- Integrating HDP with enterprise data warehouses, RDBMSs, and MPP systems
- Connecting BI tools to Hadoop
- Integrating HDP with its ecosystem of analytics partners

Chapter 13: Adding Advanced Search in HDP with Solr
Chapter Goal: This chapter will examine some advanced data access features in HDP, primarily Solr.
- Leveraging Apache Solr in HDP
- Full-text indexing with Solr
- Searching Hadoop data with Apache Solr

Chapter 14: Bringing HDP into the Cloud
Chapter Goal: This final chapter will look forward to helping build Hadoop solutions in the cloud. This chapter will look both at HDInsight on Microsoft Azure and Hadoop on Amazon's AWS platform.
- Hadoop on Azure and HDInsight
- Limitations of Hadoop with HDInsight
- Running HDP on AWS

Appendix: HDP Add-Ons
Covers Spark, Advanced Security, the ODBC Driver, the Teradata Connector, SCOM Management, and the Oracle Quest Data Connector.