
Apache Spark for Machine Learning
Description
- Learn to use cloud computing clusters for training machine learning models on large datasets
- Discover practical strategies to overcome challenges in model training, deployment, and optimization
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
In the world of big data, efficiently processing and analyzing massive datasets for machine learning can be a daunting task. Written by Deepak Gowda, a data scientist with over a decade of experience and 30+ patents, this book provides a hands-on guide to mastering Spark's capabilities for efficient data processing, model building, and optimization. With Deepak's expertise across industries such as supply chain, cybersecurity, and data center infrastructure, he makes complex concepts easy to follow through detailed recipes.
This book takes you through core machine learning concepts, highlighting the advantages of Spark for big data analytics. It covers practical data preprocessing techniques, including feature extraction and transformation, supervised learning methods with detailed chapters on regression and classification, and unsupervised learning through clustering and recommendation systems. You'll also learn to identify frequent patterns in data and discover effective strategies to deploy and optimize your machine learning models. Each chapter features practical coding examples and real-world applications to equip you with the knowledge and skills needed to tackle complex machine learning tasks. By the end of this book, you'll be ready to handle big data and create advanced machine learning models with Apache Spark.
What you will learn
- Master Apache Spark for efficient, large-scale data processing and analysis
- Understand core machine learning concepts and their applications with Spark
- Implement data preprocessing techniques for feature extraction and transformation
- Explore supervised learning methods, including regression and classification algorithms
- Apply unsupervised learning for clustering tasks and recommendation systems
- Discover frequent pattern mining techniques to uncover data trends
Who this book is for
This book is ideal for data scientists, ML engineers, data engineers, students, and researchers who want to deepen their knowledge of Apache Spark's tools and algorithms. It's a must-have for those struggling to scale models for real-world problems and a valuable resource for preparing for interviews at Fortune 500 companies, focusing on large dataset analysis, model training, and deployment.

Person
Deepak Gowda is a data scientist and AI/ML expert with over a decade of experience in leading innovative solutions across various industries, including supply chain, cybersecurity, and data center infrastructure. He holds over 30 granted patents, contributing to advancements in automation, predictive analytics, and AI-driven optimization. His work spans data engineering, machine learning, and distributed systems, focusing on building scalable and impactful products. A passionate inventor, mentor, author, and FAA-certified pilot, Deepak is also dedicated to content creation, sharing his expertise through writing, speaking, and mentoring. He continues to push the boundaries of technology, driving innovation across sectors.
Content
- Cover
- Title Page
- Copyright and Credits
- Contributors
- Table of Contents
- Preface
- Part 1: Introduction and Fundamentals
- Chapter 1: An Overview of Machine Learning Concepts
- Technical requirements
- Understanding machine learning
- Types of machine learning
- An introduction to Apache Spark
- The background and motivation of Apache Spark
- Challenges with MapReduce
- Components of Apache Spark
- Use cases and applications of Apache Spark
- Why Apache Spark for machine learning?
- Algorithms in Apache Spark
- Apache Spark use cases
- Setting up Apache Spark
- Summary
- Chapter 2: Data Processing with Spark
- Technical requirements
- Understanding data preprocessing
- Ingesting data
- Filesystems
- Amazon S3
- Azure Blob Storage
- Relational databases
- NoSQL databases
- Additional data sources
- Cleaning and transforming data
- Data cleaning
- Data transformation
- Aggregating data
- Basic aggregations
- Grouped aggregations
- Windowing in Spark
- Why windowing is required and its examples in Spark
- How to calculate the lag
- Data joining
- Types of data joins
- Summary
- Chapter 3: Feature Extraction and Transformation
- Technical requirements
- Learning about feature extractors
- The key aspects of feature extractors
- Algorithms for feature extraction
- Spark algorithms for feature extractors
- Code examples for feature extractors
- Working with feature transformers
- The key aspects of feature transformers
- Use cases and Spark algorithms for feature transformers
- Spark algorithms for feature transformers
- Code examples for feature transformers
- Exploring feature selectors
- The key aspects of feature selectors
- Use cases and Spark algorithms for feature selectors
- Code examples of feature selectors
- Summary
- Part 2: Supervised Learning
- Chapter 4: Building a Regression System
- Technical requirements
- Learning about regression
- Regression overview
- Learning regression algorithms
- Linear regression
- Generalized linear regression
- Decision tree regression
- Random forest regression
- Gradient-boosted tree regression
- Survival regression
- Factorization machine regressor
- Evaluating the model's performance
- Selecting the evaluation metrics
- Improving the model's performance
- Practical implementation
- Defining a pipeline for each regression algorithm
- Cross-validation and hyperparameter fine-tuning
- Summary
- Chapter 5: Building a Classification System
- Technical requirements
- Learning about classification
- Classification overview
- When to use the classification technique
- Some use cases of classification in machine learning
- Drawbacks of classification techniques
- Learning about classification algorithms
- Logistic regression classification
- Decision tree classifier
- Random forest classifier
- Gradient-boosted tree classifier
- Multilayer perceptron classifier
- Linear SVM
- The One-vs-Rest classifier (also known as One-vs-All)
- Naive Bayes
- Factorization machines classifier
- Evaluating the model's performance
- Binary classification
- Multiclass classification
- Algorithm-specific considerations
- Selection tips
- Selecting the evaluation metrics
- Implementation and validation
- Improving the model's performance
- Code example
- Summary
- Part 3: Unsupervised Learning
- Chapter 6: Building a Clustering System
- Technical requirements
- Learning about clustering
- Understanding clustering
- When to use the clustering technique
- Some use cases of clustering in machine learning
- Pitfalls of clustering techniques
- Learning clustering algorithms
- K-means
- Latent Dirichlet allocation (LDA)
- Bisecting K-means
- Gaussian Mixture Model (GMM)
- Power Iteration Clustering (PIC)
- Evaluating the model performance
- Evaluation clustering algorithms
- Selecting the evaluation metrics
- Improving the model performance
- General strategies for all models
- Model-specific strategies
- Summary
- Chapter 7: Building a Recommendation System
- Technical requirements
- An overview of recommendation systems
- Understanding the purpose and importance of recommendation systems
- An overview of various recommendation approaches
- The need for a recommendation system
- Personalization
- User engagement
- Business growth
- Data utilization
- Content discovery
- Bridging supply and demand
- The working mechanism of recommendation systems
- Content-based recommendation systems
- Collaborative filtering recommendation systems
- Item-based collaborative filtering
- Alternating Least Squares (ALS) - the collaborative filtering algorithm in Apache Spark
- The key problems and challenges in recommendation systems
- Cold start
- Data sparsity
- Improving the quality of recommendations
- Evaluating the recommendations
- Building a recommendation system using Apache Spark
- Summary
- Chapter 8: Mining Frequent Patterns
- Technical requirements
- The basic concepts of frequent patterns and the significance of discovering patterns and rules
- Frequent pattern mining applications and case studies
- The key challenges in frequent pattern mining
- Frequent pattern mining algorithms
- FP-Growth
- PrefixSpan
- Code examples on FPM
- Developing a model using scalable frequent pattern mining algorithms
- Implementation in Apache Spark
- Summary
- Part 4: Model Deployment
- Chapter 9: Deploying a Model
- Technical requirements
- Importance of model deployment
- Pre-deployment considerations
- Exploring ML pipelines
- Code example of building an ML pipeline
- Model serialization and storage
- Model serialization
- Model storage
- Model deployment strategies
- Batch scoring
- Configure the scheduler
- RESTful API integration
- Automating model deployment pipeline
- Model monitoring and management
- Model performance monitoring
- Model updating and maintenance
- Scalability and performance optimization
- Resource management
- Performance tuning
- Summary
- Index
- About Packt
- Other Books You May Enjoy
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: ePUB
Copy protection: without DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use a reader that can handle the file format ePUB, such as Adobe Digital Editions or FBReader – both free (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook does not use copy protection or Digital Rights Management.
For more information, see our eBook Help page.