
Scaling Machine Learning with Spark
Distributed ML with MLlib, TensorFlow, and PyTorch
Adi Polak (Author)
O'Reilly (Publisher)
Published on 21 March 2023
Book
Paperback/Softback
400 pages
978-1-0981-0682-9 (ISBN)
Description
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.
Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.
You will:
Explore machine learning, including distributed computing concepts and terminology
Manage the ML lifecycle with MLflow
Ingest data and perform basic preprocessing with Spark
Explore feature engineering, and use Spark to extract features
Train a model with MLlib and build a pipeline to reproduce it
Build a data system to combine the power of Spark with deep learning
Get a step-by-step example of working with distributed TensorFlow
Use PyTorch to scale machine learning, and explore its internal architecture
More details
Language
English
Place of publication
Sebastopol
United States
Product notice
Paperback (trade)
Unsewn / adhesive bound
Dimensions
Height: 231 mm
Width: 177 mm
Thickness: 18 mm
Weight
524 g
ISBN-13
978-1-0981-0682-9 (9781098106829)
Person
As Vice President of Developer Experience at Treeverse, Adi Polak shapes the future of data and ML technologies for hands-on builders. She also contributes to lakeFS, an open-source project that provides a Git-like interface for object stores. In her work, Adi draws on her extensive industry research and engineering experience to educate teams and help them design, architect, and build cost-effective data systems and machine learning pipelines that emphasize scalability, expertise, and business goals. Adi is a frequent presenter worldwide and the author of O'Reilly's upcoming book, "Machine Learning With Apache Spark." She is regularly invited to serve on program committees and as an advisor for conferences such as Data & AI Summit and Scale by the Bay. Previously, Adi was a senior manager for Azure at Microsoft, where she focused on building advanced analytics systems and modern architectures. When Adi isn't building data pipelines or thinking up new software architecture, you can find her on the local cultural scene or at the beach.