Apache Spark Graph Processing

 
 
Packt Publishing Limited
  • 1st edition
  • published on 10 September 2015
  • 148 pages
 
E-book | ePUB with Adobe DRM | System requirements
978-1-78439-895-8 (ISBN)
 
Build, process, and analyze large-scale graph data effectively with Spark

About This Book
  • Find solutions for every stage of data processing, from loading and transforming graph data to
  • Improve the scalability of your graphs with a variety of real-world applications with complete Scala code.
  • A concise guide to processing large-scale networks with Apache Spark.

Who This Book Is For
This book is for data scientists and big data developers who want to learn how to process and analyze graph datasets at scale. Basic programming experience with Scala and basic knowledge of Spark are assumed.

What You Will Learn
  • Write, build, and deploy Spark applications with the Scala Build Tool.
  • Build and analyze large-scale network datasets.
  • Analyze and transform graphs using RDD and graph-specific operations.
  • Implement new custom graph operations tailored to specific needs.
  • Develop iterative and efficient graph algorithms using message aggregation and the Pregel abstraction.
  • Extract subgraphs and use them to discover common clusters.
  • Analyze graph data and solve various data science problems using real-world datasets.

In Detail
Apache Spark is emerging as the standard open-source cluster-computing engine for processing big data. Many practical computing problems concern large graphs, like the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. The Apache Spark GraphX API combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework.

This book teaches graph programming in Apache Spark and explains the entire process of graph data analysis. You will journey through the creation of graphs, their uses, and their exploration and analysis, and finally cover the conversion of graph elements into graph structures.

The book begins with an introduction to the Spark system, its libraries, and the Scala Build Tool. Using a hands-on approach, it quickly teaches you how to install Spark and use it interactively on the command line and in a standalone Scala program. It then presents the methods for building Spark graphs using illustrative network datasets. Next, it walks you through the process of exploring, visualizing, and analyzing different network characteristics. The book also teaches you how to transform raw datasets into a usable form. In addition, you will learn powerful operations for transforming graph elements and graph structures, and how to create efficient custom graph operations tailored to specific needs. The later chapters cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms, and learning methods from graph data.

Style and approach
A step-by-step guide that walks you through the key ideas and techniques for processing big graph data at scale, with practical examples that build an overall understanding of the concepts of Spark.
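The description mentions message aggregation as a building block for graph algorithms. As a rough illustration of that idea only, here is a minimal sketch in plain Python with no Spark involved. It is not taken from the book and is not the GraphX API: the helper name `aggregate_messages`, its parameters, and the tiny edge list are all assumptions made up for this example.

```python
def aggregate_messages(edges, send, merge):
    """Send messages along every edge and merge those that reach the same vertex.

    edges: iterable of (source, destination) pairs
    send:  function (source, destination) -> list of (target_vertex, message)
    merge: function (message, message) -> message, applied per target vertex
    """
    inbox = {}
    for src, dst in edges:
        for target, msg in send(src, dst):
            # Combine with any message already waiting at this vertex.
            inbox[target] = merge(inbox[target], msg) if target in inbox else msg
    return inbox


# Hypothetical directed graph given as (source, destination) pairs.
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]

# In-degree: every edge sends a 1 to its destination; messages are summed.
in_degrees = aggregate_messages(edges, lambda s, d: [(d, 1)], lambda a, b: a + b)

# Out-degree: every edge sends a 1 to its source instead.
out_degrees = aggregate_messages(edges, lambda s, d: [(s, 1)], lambda a, b: a + b)

print(in_degrees)
print(out_degrees)
```

Separating the "send" step from the "merge" step is what lets a system of this kind distribute the work: messages can be produced per edge in parallel and combined per vertex afterwards. A real Spark program would express the same pattern with the library's own operators rather than a Python loop.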
  • English
  • Birmingham
  • United Kingdom
978-1-78439-895-8 (9781784398958)
1784398950 (1784398950)
Rindra Ramamonjison is a fourth-year PhD student in electrical engineering at the University of British Columbia, Vancouver. He received his master's degree from Tokyo Institute of Technology. He has held various roles at engineering companies in the telecom and finance industries. His primary research interests are machine learning, optimization, graph processing, and statistical signal processing. Rindra is also the co-organizer of the Vancouver Spark Meetup.
  • Cover
  • Copyright
  • Credits
  • Foreword
  • About the Author
  • About the Reviewer
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Getting Started with Spark and GraphX
  • Downloading and installing Spark 1.4.1
  • Experimenting with the Spark shell
  • Getting started with GraphX
  • Building a tiny social network
  • Loading the data
  • The property graph
  • Transforming RDDs to VertexRDD and EdgeRDD
  • Introducing graph operations
  • Building and submitting a standalone application
  • Writing and configuring a Spark program
  • Building the program with the Scala Build Tool
  • Deploying and running with spark-submit
  • Summary
  • Chapter 2: Building and Exploring Graphs
  • Network datasets
  • The communication network
  • Flavor networks
  • Social ego networks
  • Graph builders
  • The Graph factory method
  • edgeListFile
  • fromEdges
  • fromEdgeTuples
  • Building graphs
  • Building directed graphs
  • Building a bipartite graph
  • Building a weighted social ego network
  • Computing the degrees of the network nodes
  • In-degree and out-degree of the Enron email network
  • Degrees in the bipartite food network
  • Degree histogram of the social ego networks
  • Summary
  • Chapter 3: Graph Analysis and Visualization
  • Network datasets
  • The graph visualization
  • Installing the GraphStream and BreezeViz libraries
  • Visualizing the graph data
  • Plotting the degree distribution
  • The analysis of network connectedness
  • Finding the connected components
  • Counting triangles and computing clustering coefficients
  • The network centrality and PageRank
  • How PageRank works
  • Ranking web pages
  • Scala Build Tool revisited
  • Organizing build definitions
  • Managing library dependencies
  • A preview of the steps
  • Running tasks with SBT commands
  • Summary
  • Chapter 4: Transforming and Shaping Up Graphs to Your Needs
  • Transforming the vertex and edge attributes
  • mapVertices
  • mapEdges
  • mapTriplets
  • Modifying graph structures
  • The reverse operator
  • The subgraph operator
  • The mask operator
  • The groupEdges operator
  • Joining graph datasets
  • joinVertices
  • outerJoinVertices
  • Example - Hollywood movie graph
  • Data operations on VertexRDD and EdgeRDD
  • Mapping VertexRDD and EdgeRDD
  • Filtering VertexRDDs
  • Joining VertexRDDs
  • Joining EdgeRDDs
  • Reversing edge directions
  • Collecting neighboring information
  • Example - from food network to flavor pairing
  • Summary
  • Chapter 5: Creating Custom Graph Aggregation Operators
  • NCAA College Basketball datasets
  • The aggregateMessages operator
  • EdgeContext
  • Abstracting out the aggregation
  • Keeping things DRY
  • Coach wants more numbers
  • Calculating average points per game
  • Defense stats - D matters as in direction
  • Joining average stats into a graph
  • Performance optimization
  • The MapReduceTriplets operator
  • Summary
  • Chapter 6: Iterative Graph-Parallel Processing with Pregel
  • The Pregel computational model
  • Example - iterating towards the social equality
  • The Pregel API in GraphX
  • Community detection through label propagation
  • The Pregel implementation of PageRank
  • Summary
  • Chapter 7: Learning Graph Structures
  • Community clustering in graphs
  • Spectral clustering
  • Power iteration clustering
  • Applications - music fan community detection
  • Step 1 - load the data into a Spark graph property
  • Step 2 - extract the features of nodes
  • Step 3 - define a similarity measure between two nodes
  • Step 4 - create an affinity matrix
  • Step 5 - run k-means clustering on the affinity matrix
  • Exercise - collaborative clustering through playlists
  • Summary
  • References
  • Chapter 2, Building and Exploring Graphs
  • Chapter 3, Graph Analysis and Visualization
  • Chapter 7, Learning Graph Structures
  • Index

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-Book Help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-Book Help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many more (not Kindle)

The EPUB file format is well suited to novels and non-fiction, that is, to "flowing" text without complex layout. On e-readers or smartphones, line and page breaks adapt automatically to the small displays. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the e-book. You therefore need to prepare your reading hardware before downloading.

Further information is available in our E-Book Help.


Download (available immediately)

€28.05
incl. 19% VAT
Download / single license
ePUB with Adobe DRM
see system requirements
