Apache Spark for Machine Learning

Build and deploy high-performance big data AI solutions for large-scale clusters

Deepak Gowda (Author)

Packt Publishing Limited

1st Edition

Published on 13 January 2025

306 pages

E-Book

ePUB with Adobe-DRM

E-Book

ePUB without DRM

978-1-83546-001-6 (ISBN)

from €23.99

Available for download

Description

Develop your data science skills with Apache Spark to solve real-world problems for Fortune 500 companies using scalable algorithms on large cloud computing clusters.

Key Features
- Apply techniques to analyze big data and uncover valuable insights for machine learning
- Learn to use cloud computing clusters for training machine learning models on large datasets
- Discover practical strategies to overcome challenges in model training, deployment, and optimization
- Purchase of the print or Kindle book includes a free PDF eBook

Book Description
In the world of big data, efficiently processing and analyzing massive datasets for machine learning can be a daunting task. Written by Deepak Gowda, a data scientist with over a decade of experience and 30+ patents, this book provides a hands-on guide to mastering Spark's capabilities for efficient data processing, model building, and optimization. With Deepak's expertise across industries such as supply chain, cybersecurity, and data center infrastructure, he makes complex concepts easy to follow through detailed recipes.

This book takes you through core machine learning concepts, highlighting the advantages of Spark for big data analytics. It covers practical data preprocessing techniques, including feature extraction and transformation, supervised learning methods with detailed chapters on regression and classification, and unsupervised learning through clustering and recommendation systems. You'll also learn to identify frequent patterns in data and discover effective strategies to deploy and optimize your machine learning models. Each chapter features practical coding examples and real-world applications to equip you with the knowledge and skills needed to tackle complex machine learning tasks. By the end of this book, you'll be ready to handle big data and create advanced machine learning models with Apache Spark.

What you will learn
- Master Apache Spark for efficient, large-scale data processing and analysis
- Understand core machine learning concepts and their applications with Spark
- Implement data preprocessing techniques for feature extraction and transformation
- Explore supervised learning methods - regression and classification algorithms
- Apply unsupervised learning for clustering tasks and recommendation systems
- Discover frequent pattern mining techniques to uncover data trends

Who this book is for
This book is ideal for data scientists, ML engineers, data engineers, students, and researchers who want to deepen their knowledge of Apache Spark's tools and algorithms. It's a must-have for those struggling to scale models for real-world problems and a valuable resource for preparing for interviews at Fortune 500 companies, focusing on large dataset analysis, model training, and deployment.

Content

Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1: Introduction and Fundamentals
Chapter 1: An Overview of Machine Learning Concepts
Technical requirements
Understanding machine learning
Types of machine learning
An introduction to Apache Spark
The background and motivation of Apache Spark
Challenges with MapReduce
Components of Apache Spark
Use cases and applications of Apache Spark
Why Apache Spark for machine learning?
Algorithms in Apache Spark
Apache Spark use cases
Setting up Apache Spark
Summary
Chapter 2: Data Processing with Spark
Technical requirements
Understanding data preprocessing
Ingesting data
Filesystems
Amazon S3
Azure Blob Storage
Relational databases
NoSQL databases
Additional data sources
Cleaning and transforming data
Data cleaning
Data transformation
Aggregating data
Basic aggregations
Grouped aggregations
Windowing in Spark
Why windowing is required and its examples in Spark
How to calculate the lag
Data joining
Types of data joins
Summary
Chapter 3: Feature Extraction and Transformation
Technical requirements
Learning about feature extractors
The key aspects of feature extractors
Algorithms for feature extraction
Spark algorithms for feature extractors
Code examples for feature extractors
Working with feature transformers
The key aspects of feature transformers
Use cases and Spark algorithms for feature transformers
Spark algorithms for feature transformers
Code examples for feature transformers
Exploring feature selectors
The key aspects of feature selectors
Use cases and Spark algorithms for feature selectors
Code examples of feature selectors
Summary
Part 2: Supervised Learning
Chapter 4: Building a Regression System
Technical requirements
Learning about regression
Regression overview
Learning regression algorithms
Linear regression
Generalized linear regression
Decision tree regression
Random forest regression
Gradient-boosted tree regression
Survival regression
Factorization machine regressor
Evaluating the model's performance
Selecting the evaluation metrics
Improving the model's performance
Practical implementation
Defining a pipeline for each regression algorithm
Cross-validation and hyperparameter fine-tuning
Summary
Chapter 5: Building a Classification System
Technical requirements
Learning about classification
Classification overview
When to use the classification technique
Some use cases of classification in machine learning
Drawbacks of classification techniques
Learning about classification algorithms
Logistic regression classification
Decision tree classifier
Random forest classifier
Gradient-boosted tree classifier
Multilayer perceptron classifier
Linear SVM
The One-vs-Rest classifier (also known as One-vs-All)
Naive Bayes
Factorization machines classifier
Evaluating the model's performance
Binary classification
Multiclass classification
Algorithm-specific considerations
Selection tips
Selecting the evaluation metrics
Implementation and validation
Improving the model's performance
Code example
Summary
Part 3: Unsupervised Learning
Chapter 6: Building a Clustering System
Technical requirements
Learning about clustering
Understanding clustering
When to use the clustering technique
Some use cases of clustering in machine learning
Pitfalls of clustering techniques
Learning clustering algorithms
K-means
Latent Dirichlet allocation (LDA)
Bisecting K-means
Gaussian Mixture Model (GMM)
Power Iteration Clustering (PIC)
Evaluating the model performance
Evaluation clustering algorithms
Selecting the evaluation metrics
Improving the model performance
General strategies for all models
Model-specific strategies
Summary
Chapter 7: Building a Recommendation System
Technical requirements
An overview of recommendation systems
Understanding the purpose and importance of recommendation systems
An overview of various recommendation approaches
The need for a recommendation system
Personalization
User engagement
Business growth
Data utilization
Content discovery
Bridging supply and demand
The working mechanism of recommendation systems
Content-based recommendation systems
Collaborative filtering recommendation systems
Item-based collaborative filtering
Alternating Least Squares (ALS) - the collaborative filtering algorithm in Apache Spark
The key problems and challenges in recommendation systems
Cold start
Data sparsity
Improving the quality of recommendations
Evaluating the recommendations
Building a recommendation system using Apache Spark
Summary
Chapter 8: Mining Frequent Patterns
Technical requirements
The basic concepts of frequent patterns and the significance of discovering patterns and rules
Frequent pattern mining applications and case studies
The key challenges in frequent pattern mining
Frequent pattern mining algorithms
FP-Growth
PrefixSpan
Code examples on FPM
Developing a model using scalable frequent pattern mining algorithms
Implementation in Apache Spark
Summary
Part 4: Model Deployment
Chapter 9: Deploying a Model
Technical requirements
Importance of model deployment
Pre-deployment considerations
Exploring ML pipelines
Code example of building an ML pipeline
Model serialization and storage
Model serialization
Model storage
Model deployment strategies
Batch scoring
Configure the scheduler
RESTful API integration
Automating model deployment pipeline
Model monitoring and management
Model performance monitoring
Model updating and maintenance
Scalability and performance optimization
Resource management
Performance tuning
Summary
Index
About Packt
Other Books You May Enjoy
