
The Data Science Workshop
Learn how you can build machine learning models and create your own real-world data science projects
Packt Publishing
2nd Edition
Published on 28 August 2020
Book
Paperback/Softback
824 pages
978-1-80056-692-7 (ISBN)
Description
Where there's data, there's insight. With so much data being generated, there is immense scope to extract meaningful information that'll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you'll open new career paths and opportunities.
The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You'll learn to select a relevant metric and even assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you'll get hands-on with approaches such as grid search and random search.
Next, you'll learn dimensionality reduction techniques to easily handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. In a bid to help you automatically create new features that improve your model, the book demonstrates how to use the automated feature engineering tool. You'll also understand how to use the orchestration and scheduling workflow to deploy machine learning models in batch.
By the end of this book, you'll have the skills to start working on data science projects confidently.
Key Features
Gain a full understanding of the model production and deployment process
Build your first machine learning model in just five minutes and get a hands-on machine learning experience
Understand how to deal with common challenges in data science projects
What you will learn
Explore the key differences between supervised learning and unsupervised learning
Manipulate and analyze data using scikit-learn and pandas libraries
Understand key concepts such as regression, classification, and clustering
Discover advanced techniques to improve the accuracy of your model
Understand how to speed up the process of adding new features
Simplify your machine learning workflow for production
Who this book is for
This is one of the most useful data science books for aspiring data analysts, data scientists, database engineers, and business analysts. It is aimed at those who want to kick-start their careers in data science by quickly learning data science techniques without going through all the mathematics behind machine learning algorithms. Basic knowledge of the Python programming language will help you easily grasp the concepts explained in this book.
More details
Edition
2nd Revised edition
Language
English
Place of publication
Birmingham
United Kingdom
Edition type
Revised edition
Dimensions
Height: 235 mm
Width: 191 mm
Thickness: 44 mm
Weight
1507 g
ISBN-13
978-1-80056-692-7 (9781800566927)
Copyright in bibliographic data and cover images is held by Nielsen Book Services Limited or by the publishers or by their respective licensors: all rights reserved.
Other editions
Additional editions

Anthony So | Thomas Joseph | Robert Thas John
The Data Science Workshop
Learn how you can build machine learning models and create your own real-world data science projects
E-Book
08/2020
2nd Edition
Packt Publishing
€25.49
Available for download
Persons
Anthony So is a renowned leader in data science. He has extensive experience in solving complex business problems using advanced analytics and AI in different industries including financial services, media, and telecommunications. He is currently the chief data officer of one of the most innovative fintech start-ups. He is also the author of several best-selling books on data science, machine learning, and deep learning. He has won multiple prizes at several hackathon competitions, such as Unearthed, GovHack, and Pepper Money. Anthony holds two master's degrees, one in computer science and the other in data science and innovation.
Thomas V. Joseph is a data science practitioner, researcher, trainer, mentor, and writer with more than 19 years of experience. He has extensive experience in solving business problems using machine learning toolsets across multiple industry segments.
Robert Thas John is a Google developer expert in machine learning. His day job involves working as a data engineer on the Google Cloud Platform by building, training, and deploying large-scale machine learning models. He also makes decisions about how to store and process large amounts of data. He has more than 10 years of experience in building enterprise-grade solutions and working with data. He spends his free time learning or contributing to the developer community. He frequently travels to speak at technology events or to mentor developers. He also writes a blog on data science.
Andrew David Worsley is an independent consultant and educator with expertise in the areas of machine learning, statistics, cloud computing, and artificial intelligence. He has practiced data science in several countries across a multitude of industries including retail, financial services, marketing, resources, and healthcare.
Dr. Samuel Asare is a professional engineer with enthusiasm for Python programming, research, and writing. He is highly skilled in applying data science methods to the extraction of useful insights from large data sets. He possesses solid skills in project management processes. Samuel has previously held positions, in industry and academia, as a process engineer and a lecturer of materials science and engineering respectively. Presently, he is pursuing his passion for solving industry problems, using data science methods, and writing.
Content
Table of Contents
Introduction to Data Science in Python
Regression
Binary Classification
Multiclass Classification with RandomForest
Performing Your First Cluster Analysis
How to Assess Performance
The Generalization of Machine Learning Models
Hyperparameter Tuning
Interpreting a Machine Learning Model
Analyzing a Dataset
Data Preparation
Feature Engineering
Imbalanced Datasets
Dimensionality Reduction
Ensemble Learning