Apache Mahout Clustering Designs

Packt Publishing Limited
  • 1st edition
  • published 8 October 2015
  • 130 pages
E-book | EPUB with Adobe DRM | System requirements
978-1-78328-444-3 (ISBN)
Explore clustering algorithms used with Apache Mahout

About This Book
  • Use Mahout for clustering datasets and gain useful insights
  • Explore the different clustering algorithms used in day-to-day work
  • A practical guide to creating and evaluating your own clustering models using real-world datasets

Who This Book Is For
This book is for developers who want to try out clustering on large datasets using Mahout. It is also useful for readers who have no background in Mahout but know basic programming and are familiar with the basics of machine learning and clustering. Experience with clustering techniques in another tool is helpful.

What You Will Learn
  • Explore clustering algorithms and cluster evaluation techniques
  • Learn different types of clustering and distance measuring techniques
  • Perform clustering on your data using K-Means clustering
  • Discover how canopy clustering is used as a pre-processing step for K-Means
  • Use the Fuzzy K-Means algorithm in Apache Mahout
  • Implement Streaming K-Means clustering in Mahout
  • Learn the Spectral K-Means clustering implementation in Mahout

In Detail
As more and more organizations discover the use of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities has increased. Apache Mahout caters to this need and paves the way for implementing complex machine learning algorithms to better analyse your data and gain useful insights from it.

Starting with an introduction to clustering algorithms, this book provides insight into Apache Mahout and the different algorithms it uses for clustering data. It gives a general introduction to algorithms such as K-Means, Fuzzy K-Means, and StreamingKMeans, and shows how to use Mahout to cluster your data with a particular algorithm. You will study the different types of clustering and learn how to use Apache Mahout with real-world datasets to implement and evaluate your clusters.

This book discusses cluster improvement and visualization using Mahout APIs and also explores model-based clustering and topic modelling using the Dirichlet process. Finally, you will learn how to build and deploy a model for production use.

Style and approach
This book is a hands-on guide with examples using real-world datasets. Each chapter begins by explaining the algorithm in detail and then shows how to use Mahout for that algorithm with example datasets.
  • English
  • Birmingham
  • Great Britain
ISBN-13: 978-1-78328-444-3
ISBN-10: 1783284447
Ashish Gupta has been working in the field of software development for the last 10 years. He has worked as a software developer at companies such as SAP Labs and Caterpillar. While working for a start-up that used social media to predict potential customers for new fashion apparel, he developed an interest in machine learning. Since then, he has worked on big data technologies and machine learning for different industries, including retail, finance, and insurance. He is passionate about learning new technologies and sharing that knowledge with others. He is the author of Learning Apache Mahout Classification (Packt Publishing) and has organized many boot camps for Apache Mahout and the Hadoop ecosystem.
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Understanding Clustering
  • The clustering concept
  • Application of clustering
  • Understanding distance measures
  • Understanding different clustering techniques
  • Hierarchical methods
  • The partitioning method
  • The density-based method
  • Probabilistic clustering
  • Algorithm support in Mahout
  • Clustering algorithms in Mahout
  • Installing Mahout
  • Building Mahout code using Maven
  • Setting up the development environment using Eclipse
  • Setting up Mahout for Windows users
  • Preparing data for use with clustering techniques
  • Summary
  • Chapter 2: Understanding K-means Clustering
  • Learning K-means
  • Running K-means on Mahout
  • Dataset selection
  • The clusterdump result
  • Visualizing clusters
  • Summary
  • Chapter 3: Understanding Canopy Clustering
  • Running Canopy clustering on Mahout
  • The Canopy generation phase
  • The Canopy clustering phase
  • Running Canopy clustering
  • Using the Canopy output for K-means
  • Visualizing clusters
  • Working with CSV files
  • Summary
  • Chapter 4: Understanding the Fuzzy K-means Algorithm Using Mahout
  • Learning Fuzzy K-means clustering
  • Running Fuzzy K-means on Mahout
  • Dataset
  • Creating a vector for the dataset
  • Vector reader
  • Visualizing clusters
  • Summary
  • Chapter 5: Understanding Model-based Clustering
  • Learning model-based clustering
  • Understanding Dirichlet clustering
  • Topic modeling
  • Running LDA using Mahout
  • Dataset selection
  • Steps to execute CVB (LDA)
  • Summary
  • Chapter 6: Understanding Streaming K-means
  • Learning Streaming K-means
  • The Streaming step
  • The BallKMeans step
  • Using Mahout for streaming K-means
  • Dataset selection
  • Converting CSV to a vector file
  • Running Streaming K-means
  • Summary
  • Chapter 7: Spectral Clustering
  • Understanding spectral clustering
  • Affinity (similarity) graph
  • Getting graph Laplacian from the affinity matrix
  • Eigenvectors and eigenvalues
  • The spectral clustering algorithm
  • Normalized spectral clustering
  • Mahout implementation of spectral clustering
  • Summary
  • Chapter 8: Improving Cluster Quality
  • Evaluating clusters
  • Extrinsic methods
  • Intrinsic methods
  • Using DistanceMeasure interface
  • Summary
  • Chapter 9: Creating a Cluster Model for Production
  • Preparing the dataset
  • Launching the Mahout job on the cluster
  • Performance tuning for the job
  • Summary
  • Index

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)


Computer (Windows; Mac OS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-Book Help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-Book Help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The EPUB file format is very well suited to novels and non-fiction, that is, to "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small displays. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the e-book. You must therefore prepare your reading hardware before downloading.

Further information is available in our E-Book Help.

Download (available immediately)

€28.05
incl. 19% VAT
Download / single license