Elasticsearch for Hadoop

 
 
Packt Publishing Limited
  • 1st edition
  • published on 27 October 2015
  • 222 pages
 
E-book | ePUB with Adobe DRM | system requirements
E-book | PDF with Adobe DRM | system requirements
978-1-78528-224-9 (ISBN)
 
Integrate Elasticsearch into Hadoop to effectively visualize and analyze your data.

About This Book
  • Build production-ready analytics applications by integrating the Hadoop ecosystem with Elasticsearch
  • Learn complex Elasticsearch queries and develop real-time monitoring Kibana dashboards to visualize your data
  • Use Elasticsearch and Kibana to search data in Hadoop easily with this comprehensive, step-by-step guide

Who This Book Is For
This book is targeted at Java developers with basic knowledge of Hadoop. No prior Elasticsearch experience is expected.

What You Will Learn
  • Set up the Elasticsearch-Hadoop environment
  • Import HDFS data into Elasticsearch with MapReduce jobs
  • Perform full-text search and aggregations efficiently using Elasticsearch
  • Visualize data and create interactive dashboards using Kibana
  • Check and detect anomalies in streaming data using Storm and Elasticsearch
  • Inject and classify real-time streaming data into Elasticsearch
  • Get production-ready for Elasticsearch-Hadoop-based projects
  • Integrate with Hadoop ecosystem tools such as Pig, Storm, Hive, and Spark

In Detail
The Hadoop ecosystem is a de facto standard for processing terabytes and petabytes of data. Built on Lucene, Elasticsearch is becoming an industry standard for its full-text search and aggregation capabilities. Elasticsearch-Hadoop bridges the worlds of Elasticsearch and the Hadoop ecosystem so you can get the best out of both. Paired with Kibana, this stack makes it easy to extract surprising insights from the massive amounts of data in your Hadoop ecosystem in a flash.

In this book, you'll learn to use Elasticsearch, Kibana, and Elasticsearch-Hadoop effectively to analyze and understand your HDFS and streaming data.

You begin with an in-depth look at setting up Hadoop, Elasticsearch, Marvel, and Kibana. Right after this, you will learn to import Hadoop data into Elasticsearch by writing a MapReduce job for a real-world example. This is followed by a comprehensive look at Elasticsearch essentials, such as full-text search analysis, queries, filters, and aggregations, after which you learn to create various visualizations and interactive dashboards using Kibana. Classifying your real-world streaming data and identifying trends in it using Storm and Elasticsearch are some of the other topics covered. You will also gain insight into the key concepts of Elasticsearch and Elasticsearch-Hadoop in distributed mode, advanced configurations, and some common configuration presets you may need for your production deployments. You will get a "go production" checklist and a high-level view of cluster administration for post-production. Towards the end, you will learn to integrate Elasticsearch with other Hadoop ecosystem tools, such as Pig, Hive, and Spark.

Style and approach
A concise yet comprehensive approach has been adopted, with real-world examples to help you grasp the concepts easily.
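The central workflow the book builds up to in Chapter 2 - a MapReduce job that reads files from HDFS and indexes them into Elasticsearch through the ES-Hadoop connector's EsOutputFormat - looks roughly like the sketch below. This is a minimal, hedged illustration rather than the book's own example: the network-logs/logs index, the localhost:9200 node, and the two-field CSV input are assumptions made purely for demonstration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.elasticsearch.hadoop.mr.EsOutputFormat;

    // Minimal sketch of an HDFS-to-Elasticsearch indexing job using ES-Hadoop.
    // Index name, node address, and input layout are illustrative assumptions.
    public class HdfsToEsSketch {

        // Turns each "host,message" CSV line into a document for Elasticsearch.
        public static class LogMapper
                extends Mapper<LongWritable, Text, NullWritable, MapWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                String[] parts = value.toString().split(",", 2);
                if (parts.length < 2) {
                    return; // skip malformed lines
                }
                MapWritable doc = new MapWritable();
                doc.put(new Text("host"), new Text(parts[0].trim()));
                doc.put(new Text("message"), new Text(parts[1].trim()));
                context.write(NullWritable.get(), doc); // key is ignored by EsOutputFormat
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("es.nodes", "localhost:9200");       // assumed single-node cluster
            conf.set("es.resource", "network-logs/logs"); // assumed target index/type

            Job job = Job.getInstance(conf, "hdfs-to-es-sketch");
            job.setJarByClass(HdfsToEsSketch.class);
            job.setSpeculativeExecution(false);           // recommended when writing to Elasticsearch
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(EsOutputFormat.class);
            job.setMapperClass(LogMapper.class);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(MapWritable.class);
            job.setNumReduceTasks(0);                     // map-only indexing job

            FileInputFormat.addInputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

A job like this would be packaged into a JAR and run with the elasticsearch-hadoop library on the job classpath, for example via hadoop jar with the -libjars option.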
  • English
  • Birmingham
  • United Kingdom
1785282247 (ISBN-10)
Vishal Shukla is the CEO of Brevitaz Systems (http://brevitaz.com) and a technology evangelist at heart. He is a passionate software scientist and a big data expert. Vishal has extensive experience in designing modular enterprise systems. Since his college days, more than 11 years ago, Vishal has enjoyed coding in JVM-based languages. He also embraces design thinking and sustainable software development. He has vast experience in architecting enterprise systems in various domains. Vishal is deeply interested in technologies related to big data engineering, analytics, and machine learning.
He set up Brevitaz Systems, a company that delivers massively scalable and sustainable big data and analytics-based enterprise applications to its global clientele. With varied expertise in big data technologies and architectural acumen, the Brevitaz team has successfully developed and re-engineered a number of legacy systems into state-of-the-art scalable systems. Brevitaz has embedded agile practices, such as Scrum, test-driven development, continuous integration, and continuous delivery, into its culture to deliver high-quality products to its clients.
Vishal is a music and art lover. He loves to sing, play musical instruments, draw portraits, and play sports, such as cricket, table tennis, and pool, in his free time.
You can contact Vishal at vishal.shukla@brevitaz.com and on LinkedIn at https://in.linkedin.com/in/vishalshu. You can also follow Vishal on Twitter at @vishal1shukla2.
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Setting Up Environment
  • Setting up Hadoop for Elasticsearch
  • Setting up Java
  • Setting up a dedicated user
  • Installing SSH and setting up the certificate
  • Downloading Hadoop
  • Setting up environment variables
  • Configuring Hadoop
  • Configuring core-site.xml
  • Configuring hdfs-site.xml
  • Configuring yarn-site.xml
  • Configuring mapred-site.xml
  • Formatting the distributed filesystem
  • Starting Hadoop daemons
  • Setting up Elasticsearch
  • Downloading Elasticsearch
  • Configuring Elasticsearch
  • Installing Elasticsearch's Head plugin
  • Installing the Marvel plugin
  • Running and testing
  • Running the WordCount example
  • Getting the examples and building the job JAR file
  • Importing the test file to HDFS
  • Running our first job
  • Exploring data in Head and Marvel
  • Viewing data in Head
  • Using the Marvel dashboard
  • Exploring the data in Sense
  • Summary
  • Chapter 2: Getting Started with ES-Hadoop
  • Understanding the WordCount program
  • Understanding Mapper
  • Understanding the reducer
  • Understanding the driver
  • Using the old API - org.apache.hadoop.mapred
  • Going real - network monitoring data
  • Getting and understanding the data
  • Knowing the problems
  • Solution approaches
  • Approach 1 - Preaggregate the results
  • Approach 2 - Aggregate the results at query-time
  • Writing the NetworkLogsMapper job
  • Writing the mapper class
  • Writing the driver
  • Building the job
  • Getting the data into HDFS
  • Running the job
  • Viewing the Top N results
  • Getting data from Elasticsearch to HDFS
  • Understanding the Twitter dataset
  • Trying it yourself
  • Creating the MapReduce job to import data from Elasticsearch to HDFS
  • Writing the Tweets2Hdfs mapper
  • Running the example
  • Testing the job execution output
  • Summary
  • Chapter 3: Understanding Elasticsearch
  • Knowing Search and Elasticsearch
  • The paradigm mismatch
  • Index
  • Type
  • Document
  • Field
  • Talking to Elasticsearch
  • CRUD with Elasticsearch
  • Creating the document request
  • Mappings
  • Data types
  • Create mapping API
  • Index templates
  • Controlling the indexing process
  • What is an inverted index?
  • The input data analysis
  • Removing stop words
  • Case insensitive
  • Stemming
  • Synonyms
  • Analyzers
  • Elastic searching
  • Writing search queries
  • The URI search
  • Matching all queries
  • The term query
  • The boolean query
  • The match query
  • The range query
  • The wildcard query
  • Filters
  • Aggregations
  • Executing the aggregation queries
  • The terms aggregation
  • Histograms
  • The range aggregation
  • The geo distance
  • Sub-aggregations
  • Try it yourself
  • Summary
  • Chapter 4: Visualizing Big Data Using Kibana
  • Setting up and getting started
  • Setting up Kibana
  • Setting up datasets
  • Try it out
  • Getting started with Kibana
  • Discovering data
  • Visualizing the data
  • The pie chart
  • The stacked bar chart
  • The date histogram with the stacked bar chart
  • The area chart
  • The split pie chart
  • The sunburst chart
  • The geographical chart
  • Trying it out
  • Creating dynamic dashboards
  • Summary
  • Chapter 5: Real-Time Analytics
  • Getting started with the Twitter Trend Analyser
  • What are we trying to do?
  • Setting up Apache Storm
  • Injecting streaming data into Storm
  • Writing a Storm spout
  • Writing Storm bolts
  • Creating a Storm topology
  • Building and running a Storm job
  • Analyzing trends
  • Significant terms aggregation
  • Viewing trends in Kibana
  • Classifying tweets using percolators
  • Percolator
  • Building a percolator query effectively
  • Classifying tweets
  • Summary
  • Chapter 6: ES-Hadoop in Production
  • Elasticsearch in a distributed environment
  • Elasticsearch clusters and nodes
  • Node types
  • Node discovery
  • Data inside clusters
  • Shards
  • Replicas
  • Shard allocation
  • The ES-Hadoop architecture
  • Dynamic parallelism
  • Writing to Elasticsearch
  • Reads from Elasticsearch
  • Failure handling
  • Data colocation
  • Configuring the environment for production
  • Hardware
  • Memory
  • CPU
  • Disks
  • Network
  • Setting up the cluster
  • The recommended cluster topology
  • Set names
  • Paths
  • Memory configurations
  • The split-brain problem
  • Recovery configurations
  • Configuration presets
  • Rapid indexing
  • Lightening a full text search
  • Faster aggregations
  • Bonus - the production deployment checklist
  • Administration of clusters
  • Monitoring the cluster health
  • Snapshot and restore
  • Backing up your data
  • Restoring your data
  • Summary
  • Chapter 7: Integrating with the Hadoop Ecosystem
  • Pigging out Elasticsearch
  • Setting up Apache Pig for Elasticsearch
  • Importing data to Elasticsearch
  • Writing from the JSON source
  • Type conversions
  • Reading data from Elasticsearch
  • SQLizing Elasticsearch with Hive
  • Setting up Apache Hive
  • Importing data to Elasticsearch
  • Writing from the JSON source
  • Type conversions
  • Reading data from Elasticsearch
  • Cascading with Elasticsearch
  • Importing data to Elasticsearch
  • Writing a cascading job
  • Running the job
  • Reading data from Elasticsearch
  • Writing a reader job
  • Using Lingual with Elasticsearch
  • Giving Spark to Elasticsearch
  • Setting up Spark
  • Importing data to Elasticsearch
  • Using SparkSQL
  • Reading data from Elasticsearch
  • Using SparkSQL
  • ES-Hadoop on YARN
  • Summary
  • Appendix: Configurations
  • Basic configurations
  • es.resource
  • es.resource.read
  • es.resource.write
  • es.nodes
  • es.port
  • Write and query configurations
  • es.query
  • es.input.json
  • es.write.operation
  • es.update.script
  • es.update.script.lang
  • es.update.script.params
  • es.update.script.params.json
  • es.batch.size.bytes
  • es.batch.size.entries
  • es.batch.write.refresh
  • es.batch.write.retry.count
  • es.batch.write.retry.wait
  • es.ser.reader.value.class
  • es.ser.writer.value.class
  • es.update.retry.on.conflict
  • Mapping configurations
  • es.mapping.id
  • es.mapping.parent
  • es.mapping.version
  • es.mapping.version.type
  • es.mapping.routing
  • es.mapping.ttl
  • es.mapping.timestamp
  • es.mapping.date.rich
  • es.mapping.include
  • es.mapping.exclude
  • Index configurations
  • es.index.auto.create
  • es.index.read.missing.as.empty
  • es.field.read.empty.as.null
  • es.field.read.validate.presence
  • Network configurations
  • es.nodes.discovery
  • es.nodes.client.only
  • es.http.timeout
  • es.http.retries
  • es.scroll.keepalive
  • es.scroll.size
  • es.action.heart.beat.lead
  • Authentication configurations
  • es.net.http.auth.user
  • es.net.http.auth.pass
  • SSL configurations
  • es.net.ssl
  • es.net.ssl.keystore.location
  • es.net.ssl.keystore.pass
  • es.net.ssl.keystore.type
  • es.net.ssl.truststore.location
  • es.net.ssl.truststore.pass
  • es.net.ssl.cert.allow.self.signed
  • es.net.ssl.protocol
  • Proxy configurations
  • es.net.proxy.http.host
  • es.net.proxy.http.port
  • es.net.proxy.http.user
  • es.net.proxy.http.pass
  • es.net.proxy.http.use.system.props
  • es.net.proxy.socks.host
  • es.net.proxy.socks.port
  • es.net.proxy.socks.user
  • es.net.proxy.socks.pass
  • es.net.proxy.socks.use.system.props
  • Index
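The appendix entries listed above are ES-Hadoop connector settings; they are supplied to jobs as ordinary Hadoop configuration properties. As a minimal sketch (the node addresses, index/type names, and query string below are assumed values for illustration, not taken from the book), a handful of them might be set like this:

    import org.apache.hadoop.conf.Configuration;

    // Illustrative ES-Hadoop configuration; every key appears in the appendix above,
    // while the concrete values are assumptions made for the sake of the example.
    public class EsHadoopConfigSketch {
        public static Configuration buildConf() {
            Configuration conf = new Configuration();
            conf.set("es.nodes", "es-node-1:9200,es-node-2:9200"); // assumed cluster entry points
            conf.set("es.port", "9200");                           // default HTTP port
            conf.set("es.resource.write", "tweets/status");        // assumed target index/type for writes
            conf.set("es.resource.read", "tweets/status");         // assumed source index/type for reads
            conf.set("es.query", "?q=user:hadoop");                // URI query to filter documents on read
            conf.set("es.input.json", "true");                     // treat output values as raw JSON documents
            conf.set("es.batch.size.entries", "1000");             // flush bulk requests every 1000 documents
            conf.set("es.index.auto.create", "true");              // create the target index if it is missing
            return conf;
        }
    }

In the MapReduce sketch shown earlier, the same properties would simply be set on the job's Configuration before the job is created.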

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; Mac OS X; Linux): Install the free Adobe Digital Editions software before downloading (see the e-book help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see the e-book help).

E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many others (not Kindle).

The EPUB format is well suited to novels and non-fiction, that is, to "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to small displays. Adobe DRM applies "hard" copy protection: if the necessary requirements are not met, you will not be able to open the e-book, so prepare your reading hardware before downloading.

Further information can be found in our e-book help.


File format: PDF
Copy protection: Adobe DRM (Digital Rights Management)

System requirements: the same as for the EPUB edition above (Adobe Digital Editions on a computer, tablet, or smartphone; most e-book readers except Kindle).

The PDF format displays each book page identically on any hardware, which makes it suitable for complex layouts such as those of textbooks and technical books (images, tables, columns, footnotes). On the small displays of e-readers and smartphones, PDFs are rather awkward because they require a lot of scrolling. Adobe DRM applies "hard" copy protection: if the necessary requirements are not met, you will not be able to open the e-book, so prepare your reading hardware before downloading.

Further information can be found in our e-book help.


Download (available immediately)

€32.73
incl. 19% VAT
Download / single licence
ePUB with Adobe DRM (see system requirements)
PDF with Adobe DRM (see system requirements)
Note: the desired file format and copy protection are selected in the e-book provider's system.
