Hadoop Backup and Recovery Solutions

Packt Publishing Limited
  • 1st edition
  • Published on 28 July 2015
  • 206 pages
E-book | ePUB with Adobe DRM | System requirements
978-1-78328-905-9 (ISBN)
Hadoop offers distributed processing of large datasets across clusters and is designed to scale from a single server to thousands of machines with a very high degree of fault tolerance. It enables computing solutions that are scalable, cost-effective, flexible, and fault tolerant, protecting very large datasets against hardware failures. Starting with the basics of Hadoop administration, this book moves on to the best strategies for backing up distributed storage databases. You will gradually learn backup and recovery principles, discover the common failure points in Hadoop, and learn how to back up Hive metadata. A deep dive into Apache HBase shows you different ways of backing up data and compares them. Going forward, you'll learn how to define recovery strategies for various causes of failure, including failover recoveries, corruption, working drives, and metadata. The book also covers Hadoop metrics and MapReduce. Finally, you'll explore troubleshooting strategies and techniques to resolve failures.
  • English
  • Birmingham
  • United Kingdom
1783289058 (ISBN-10)
Gaurav Barot is an experienced software architect and PMP-certified project manager with more than 12 years of experience. He has a unique combination of experience in enterprise resource planning, sales, education, and technology. He has served as an enterprise architect and project leader in projects in various domains, including healthcare, risk, insurance, media, and so on for customers in the UK, USA, Singapore, and India.
Gaurav holds a bachelor's degree in IT engineering from Sardar Patel University and completed his postgraduate studies in IT at Deakin University, Melbourne.

Chintan Mehta is a cofounder of KNOWARTH Technologies (www.knowarth.com) and heads the cloud/RIMS department. He has rich, progressive experience in the AWS cloud, DevOps, RIMS, and server administration on open source technologies.
Chintan's vital roles during his career in infrastructure and operations have included requirement analysis, architecture design, security design, high-availability and disaster recovery planning, automated monitoring, automated deployment, build processes to help customers, performance tuning, infrastructure setup and deployment, and application setup and deployment. He has accomplished all of this while setting up offices in various locations, taking sole ownership of achieving operational readiness for the organizations he has been associated with.
He headed and managed cloud service practices with his previous employer, and received multiple awards in recognition of his valuable contributions to the business. He was involved in creating solutions and consulting for building SaaS, IaaS, and PaaS services on the cloud. Chintan also led the ISO 27001:2005 implementation team as a joint management representative, and reviewed Liferay Portal Performance Best Practices (Packt Publishing). He completed his diploma in computer hardware and network certification from a reputed institute in India.

Amij Patel is a cofounder of KNOWARTH Technologies (www.knowarth.com) and leads the mobile, UI/UX, and e-commerce verticals. He is an out-of-the-box thinker with a proven track record of designing and delivering the best design solutions for enterprise applications and products.
He has extensive experience in the Web, portals, e-commerce, rich Internet applications, user interfaces, big data, and open source technologies. His passion is making applications and products interactive and user friendly using the latest technologies. Amij has a unique ability: he can deliver or execute on any layer and technology in the stack.
Throughout his career, he has been honored with awards for making valuable contributions to businesses and delivering excellence in different roles, such as practice leader, architect, and team leader. He is a cofounder of community groups such as Ahmedabad JS and the Liferay UI developers' group, which focus on sharing knowledge of UI technologies and upcoming trends with the broader community. Amij is respected as a motivational leader who leads by example, a change agent, and a proponent of empowerment and accountability.
  • Cover
  • Copyright
  • Credits
  • About the Authors
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Knowing Hadoop and Clustering Basics
  • Understanding the need for Hadoop
  • Apache Hive
  • Apache Pig
  • Apache HBase
  • Apache HCatalog
  • Understanding HDFS design
  • Getting familiar with HDFS daemons
  • Scenario 1 - writing data to the HDFS cluster
  • Scenario 2 - reading data from the HDFS cluster
  • Understanding the basics of Hadoop cluster
  • Summary
  • Chapter 2: Understanding Hadoop Backup and Recovery Needs
  • Understanding the backup and recovery philosophies
  • Replication of data using DistCp
  • Updating and overwriting using DistCp
  • The backup philosophy
  • Changes since the last backup
  • The rate of new data arrival
  • The size of the cluster
  • Priority of the datasets
  • Selecting the datasets or parts of datasets
  • The timelines of data backups
  • Reducing the window of possible data loss
  • Backup consistency
  • Avoiding invalid backups
  • The recovery philosophy
  • Knowing the necessity of backing up Hadoop
  • Determining backup areas - what should I back up?
  • Datasets
  • Block size - a large file divided into blocks
  • Replication factor
  • A list of all the blocks of a file
  • A list of DataNodes for each block - sorted by distance
  • The ACK package
  • The checksums
  • The number of under-replicated blocks
  • The secondary NameNode
  • Active and passive nodes in second generation Hadoop
  • Hardware failure
  • Software failure
  • Applications
  • Configurations
  • Is taking backup enough?
  • Understanding the disaster recovery principle
  • Knowing a disaster
  • The need for recovery
  • Understanding recovery areas
  • Summary
  • Chapter 3: Determining Backup Strategies
  • Knowing the areas to be protected
  • Understanding the common failure types
  • Hardware failure
  • Host failure
  • Using commodity hardware
  • Hardware failures may lead to loss of data
  • User application failure
  • Software causing task failure
  • Failure of slow-running tasks
  • Hadoop's handling of failing tasks
  • Task failure due to data
  • Bad data handling - through code
  • Hadoop's skip mode
  • Learning a way to define the backup strategy
  • Why do I need a strategy?
  • What should be considered in a strategy?
  • Filesystem check (fsck)
  • Filesystem balancer
  • Upgrading your Hadoop cluster
  • Designing network layout and rack awareness
  • Most important areas to consider while defining a backup strategy
  • Understanding the need for backing up Hive metadata
  • What is Hive?
  • Hive replication
  • Summary
  • Chapter 4: Backing Up Hadoop
  • Data backup in Hadoop
  • Distributed copy
  • Architectural approach to backup
  • HBase
  • HBase history
  • HBase introduction
  • Understanding the HBase data model
  • Accessing HBase data
  • Approaches to backing up HBase
  • Snapshots
  • Operations involved in snapshots
  • HBase replication
  • Modes of replication
  • Export
  • The copy table
  • HTable API
  • Offline backup
  • Comparing backup options
  • Summary
  • Chapter 5: Determining Recovery Strategy
  • Knowing the key considerations of recovery strategy
  • Disaster failure at data centers
  • How HDFS handles failures at data centers
  • Automatic failover configuration
  • How automatic failover configuration works
  • How to configure automatic failover
  • How HBase handles failures at data centers
  • Restoring a point-in-time copy for auditing
  • Restoring a data copy due to user error or accidental deletion
  • Defining recovery strategy
  • Centralized configuration
  • Monitoring
  • Alerting
  • Teeing versus copying
  • Summary
  • Chapter 6: Recovering Hadoop Data
  • Failover to backup cluster
  • Installation and configuration
  • The user and group settings
  • Java installation
  • Password-less SSH configuration
  • ZooKeeper installation
  • Hadoop installation
  • The test installation of Hadoop
  • Hadoop configuration for an automatic failover
  • Preparing for the HA state in ZooKeeper
  • Formatting and starting NameNodes
  • Starting the ZKFC services
  • Starting DataNodes
  • Verifying an automatic failover
  • Importing a table or restoring a snapshot
  • Pointing the HBase root folder to the backup location
  • Locating and repairing corruptions
  • Recovering a drive from the working state
  • Lost files
  • The recovery of NameNode
  • What did we do just now?
  • Summary
  • Chapter 7: Monitoring
  • Monitoring overview
  • Metrics of Hadoop
  • FileContext
  • GangliaContext
  • NullContextWithUpdateThread
  • CompositeContext
  • Java Management Extension
  • Monitoring node health
  • Hadoop host monitoring
  • Hadoop process monitoring
  • The HDFS checks
  • The MapReduce checks
  • Cluster monitoring
  • Managing the HDFS cluster
  • Logging
  • Log output written via log4j
  • Setting the log levels
  • Getting stack traces
  • Summary
  • Chapter 8: Troubleshooting
  • Understanding troubleshooting approaches
  • Understanding common failure points
  • Human errors
  • Configuration issues
  • Hardware failures
  • Resource allocation issues
  • Identifying the root cause
  • Knowing issue resolution techniques
  • Summary
  • Index

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)


Computer (Windows; Mac OS X; Linux): Install the free Adobe Digital Editions software before downloading (see the e-book help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see the e-book help).

E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (not Kindle).

The EPUB file format is well suited to novels and non-fiction, that is, "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM applies "hard" copy protection here. If the necessary requirements are not met, you will unfortunately not be able to open the e-book, so you must prepare your reading hardware before downloading.

You can find further information in our e-book help.

Download (available immediately)

€24.93
incl. 19% VAT
Download / single license
ePUB with Adobe DRM
see system requirements
