
Pro Microsoft HDInsight
Description
Pro Microsoft HDInsight is a complete guide to deploying and using Apache Hadoop on the Microsoft Windows Azure platform. The information in this book enables you to process enormous volumes of structured as well as unstructured data easily using HDInsight, which is Microsoft's own distribution of Apache Hadoop. Furthermore, the blend of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings available through Windows Azure lets you take advantage of Hadoop's processing power without the worry of creating, configuring, maintaining, or managing your own cluster.
With a data explosion on the horizon, the open source Apache Hadoop framework is gaining traction, and it benefits from a huge ecosystem that has grown up around the core functionality of the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. Pro Microsoft HDInsight equips you with the knowledge, confidence, and technique to configure and manage this ecosystem on Windows Azure. The book is an excellent choice for anyone aspiring to be a data scientist or data engineer, putting you a step ahead in the data mining field.
- Guides you through installation and configuration of an HDInsight cluster on Windows Azure
- Provides clear examples of configuring and executing MapReduce jobs
- Helps you consume data and diagnose errors from the Windows Azure HDInsight Service

Content
- Intro
- Contents at a Glance
- Contents
- About the Author
- About the Technical Reviewers
- Acknowledgments
- Introduction
- Chapter 1: Introducing HDInsight
- What Is Big Data, and Why Now?
- How Is Big Data Different?
- Is Big Data the Right Solution for You?
- The Apache Hadoop Ecosystem
- Microsoft HDInsight: Hadoop on Windows
- Combining HDInsight with Your Business Processes
- Summary
- Chapter 2: Understanding Windows Azure HDInsight Service
- Microsoft's Cloud-Computing Platform
- Windows Azure HDInsight Service
- HDInsight Versions
- Cluster Version 2.1
- Cluster Version 1.6
- Storage Location Options
- Azure storage accounts
- Accessing containers
- Understanding the Windows Azure Storage Blob
- Uploading Data to Windows Azure Storage Blob
- Windows Azure Flat Network Storage
- Summary
- Chapter 3: Provisioning Your HDInsight Service Cluster
- Creating the Storage Account
- Creating a SQL Azure Database
- Deploying Your HDInsight Cluster
- Customizing Your Cluster Creation
- Configuring the Cluster User and Hive/Oozie Storage
- Choosing Your Storage Account
- Finishing the Cluster Creation
- Monitoring the Cluster
- Configuring the Cluster
- Summary
- Chapter 4: Automating HDInsight Cluster Provisioning
- Using the Hadoop .NET SDK
- Adding the NuGet Packages
- Connecting to Your Subscription
- Coding the Application
- Using the PowerShell cmdlets for HDInsight
- Command-Line Interface (CLI)
- Summary
- Chapter 5: Submitting Jobs to Your HDInsight Cluster
- Using the Hadoop .NET SDK
- Adding the References
- Submitting a Custom MapReduce Job
- Adding the MapReduce Classes
- Running the MapReduce Job
- Submitting the wordcount MapReduce Job
- Submitting a Hive Job
- Adding the References
- Creating the Hive Queries
- Running the Hive Job
- Monitoring Job Status
- Using PowerShell
- Writing Script
- Executing the Job
- Using MRRunner
- Summary
- Chapter 6: Exploring the HDInsight Name Node
- Accessing the HDInsight Name Node
- Hadoop Command Line
- The Hive Console
- The Sqoop Console
- The Pig Console
- Hadoop Web Interfaces
- Hadoop MapReduce Status
- The Name Node Status Portal
- The TaskTracker Portal
- HDInsight Windows Services
- Installation Directory
- Summary
- Chapter 7: Using Windows Azure HDInsight Emulator
- Installing the Emulator
- Verifying the Installation
- Using the Emulator
- Future Directions
- Summary
- Chapter 8: Accessing HDInsight over Hive and ODBC
- Hive: The Hadoop Data Warehouse
- Working with Hive
- Creating Hive Tables
- Loading Data
- Querying Tables with HiveQL
- Hive Storage
- The Hive ODBC Driver
- Installing the Driver
- Testing the Driver
- Connecting to the HDInsight Emulator
- Configuring a DSN-less Connection
- Summary
- Chapter 9: Consuming HDInsight from Self-Service BI Tools
- PowerPivot Enhancements
- Creating a Stock Report
- Power View for Excel
- Power BI: The Future
- Summary
- Chapter 10: Integrating HDInsight with SQL Server Integration Services
- SSIS as an ETL Tool
- Creating the Project
- Creating the Data Flow
- Creating the Source Hive Connection
- Creating the Destination SQL Connection
- Creating the Hive Source Component
- Creating the SQL Destination Component
- Mapping the Columns
- Running the Package
- Summary
- Chapter 11: Logging in HDInsight
- Service Logs
- Service Trace Logs
- Service Wrapper Files
- Service Error Files
- Hadoop log4j Log Files
- Log4j Framework
- Windows ODBC Tracing
- Logging Windows Azure Storage Blob Operations
- Logging in Windows Azure HDInsight Emulator
- Summary
- Chapter 12: Troubleshooting Cluster Deployments
- Cluster Creation
- Installer Logs
- Troubleshooting Visual Studio Deployments
- Using Breakpoints
- Using IntelliTrace
- Troubleshooting PowerShell Deployments
- Using the Write-* cmdlets
- Using the -debug Switch
- Summary
- Chapter 13: Troubleshooting Job Failures
- MapReduce Jobs
- Configuration Files
- core-site.xml
- mapred-site.xml
- Log Files
- Compress Job Output
- Concatenate Input Files
- Avoid Spilling
- Hive Jobs
- Log Files
- Compress Intermediate Files
- Configure the Reducer Task Size
- Implement Map Joins
- Pig Jobs
- Configuration File
- Log Files
- Explain Command
- Illustrate Command
- Sqoop Jobs
- Windows Azure Storage Blob
- WASB Authentication
- Azure Throttling
- Connectivity Failures
- Summary
- Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle: limited support only).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.