The Supervised Learning Workshop

Name: The Supervised Learning Workshop | Predict outcomes from data by building your own powerful predictive models with machine learning in Python
Brand: Packt Publishing
Price: 23.49 EUR
Availability: OnlineOnly

Predict outcomes from data by building your own powerful predictive models with machine learning in Python

Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston (Authors)

Packt Publishing

2nd Edition

Published on 28 February 2020

532 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-80020-832-2 (ISBN)

€23.49 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Discover how you can supervise machine learning algorithms in Python and personalize predictive models with the help of real-world datasets

Key Features

Explore the fundamentals of supervised machine learning and its applications
Learn how to label and process data correctly using Python libraries
Gain a comprehensive overview of different machine learning algorithms used for building prediction models

Book Description

Would you like to understand how and why machine learning techniques and data analytics are spearheading enterprises globally? From analyzing bioinformatics to predicting climate change, machine learning plays an increasingly pivotal role in our society. Although the real-world applications may seem complex, this book simplifies supervised learning for beginners with a step-by-step interactive approach. Working with real-time datasets, you'll learn how supervised learning, when used with Python, can produce efficient predictive models. Starting with the fundamentals of supervised learning, you'll quickly move to understand how to automate manual tasks and the process of assessing data using Jupyter and Python libraries like pandas. Next, you'll use data exploration and visualization techniques to develop powerful supervised learning models, before understanding how to distinguish variables and represent their relationships using scatter plots, heatmaps, and box plots. After using regression and classification models on real-time datasets to predict future outcomes, you'll grasp advanced ensemble techniques such as boosting and random forests. Finally, you'll learn the importance of model evaluation in supervised learning and study metrics to evaluate regression and classification tasks. By the end of this book, you'll have the skills you need to work on your real-life supervised learning Python projects.

What you will learn

Import NumPy and pandas libraries to assess the data in a Jupyter Notebook
Discover patterns within a dataset using exploratory data analysis
Use pandas to find the summary statistics of a dataset
Improve the performance of a model with linear regression analysis
Increase the predictive accuracy with decision trees and k-nearest neighbors (KNN) models
Plot precision-recall and ROC curves to evaluate model performance

Who this book is for

If you are a beginner or a data scientist who is just getting started and looking to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of Python programming is recommended, as you'll be editing classes or functions instead of creating them from scratch.

Persons

Blaine Bateman has more than 35 years of experience working with various industries from government R&D to startups to $1B public companies. His experience focuses on analytics including machine learning and forecasting. His hands-on abilities include Python and R coding, Keras/Tensorflow, and AWS & Azure machine learning services. As a machine learning consultant, he has developed and deployed actual ML models in industry.

Ashish Ranjan Jha received his bachelor's degree in electrical engineering from IIT Roorkee (India), a master's degree in Computer Science from EPFL (Switzerland), and an MBA degree from Quantic School of Business (Washington). He has received a distinction in all 3 of his degrees. He has worked for large technology companies, including Oracle and Sony as well as the more recent tech unicorns such as Revolut, mostly focused on artificial intelligence. He currently works as a machine learning engineer. Ashish has worked on a range of products and projects, from developing an app that uses sensor data to predict the mode of transport to detecting fraud in car damage insurance claims. Besides being an author, machine learning engineer, and data scientist, he also blogs frequently on his personal blog site about the latest research and engineering topics around machine learning.

Benjamin Johnston is a senior data scientist for one of the world's leading data-driven MedTech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition to solution research and development, through to final deployment. He is currently completing his Ph.D. in ML, specializing in image processing and deep convolutional neural networks. He has more than 10 years of experience in medical device design and development, working in a variety of technical roles, and holds a first-class honors bachelor's degree in both engineering and medical science from the University of Sydney, Australia.

Ishita Mathur has worked as a data scientist for 2.5 years with product-based start-ups, working with business concerns in various domains and formulating them as technical problems that can be solved using data and machine learning. Her current work at GO-JEK involves the end-to-end development of machine learning projects, by working as part of a product team on defining, prototyping, and implementing data science models within the product. She completed her master's degree in high-performance computing with data science at the University of Edinburgh, UK, and her bachelor's degree with honors in physics at St. Stephen's College, Delhi.

Sukanya Mandal is a Data Scientist currently working with an MNC and an independent researcher. She takes pleasure in working with data, getting underneath it, and discovering hidden insights by connecting the dots. An avid believer in open-source data science principles, she is currently contributing to signacore, an open-source project on robotics and reinforcement learning. An author and a blogger, her technological interests and competence lie in the areas of Machine Learning, Deep Learning, Natural Language Processing, and the Internet of Things. Her favorite language is Python, in which she has significant experience, and she gladly contributed her expertise in reviewing this course!

Content

Table of Contents

Fundamentals of Supervised Learning Algorithms
Exploratory Data Analysis and Visualization
Linear Regression
Autoregression
Classification Techniques
Ensemble Modeling
Model Evaluation

1. Fundamentals of Supervised Learning Algorithms

Overview

This chapter introduces you to supervised learning, using Anaconda to manage coding environments, and using Jupyter notebooks to create, manage, and run code. It also covers some of the most common Python packages used in supervised learning: pandas, NumPy, Matplotlib, and seaborn. By the end of this chapter, you will be able to install and load Python libraries into your development environment for use in analysis and machine learning problems. You will also be able to load an external data source using pandas, and use a variety of methods to search, filter, and compute descriptive statistics of the data. This chapter will enable you to gauge the potential impact of various issues within the data source.

Introduction

The study and application of machine learning and artificial intelligence have recently attracted much interest and research in the technology and business communities. Advanced data analytics and machine learning techniques have shown great promise in advancing many sectors, such as personalized healthcare and self-driving cars, as well as in solving some of the world's greatest challenges, such as combating climate change (see Tackling Climate Change with Machine Learning: https://packt.live/2SXh8Jo).

This book has been designed to help you to take advantage of the unique confluence of events in the field of data science and machine learning today. Across the globe, private enterprises and governments are realizing the value and efficiency of data-driven products and services. At the same time, reduced hardware costs and open source software solutions are significantly reducing the barriers to entry of learning and applying machine learning techniques.

Here, we will focus on supervised machine learning (or supervised learning for short). We'll explain the different types of machine learning shortly, but let's begin with a quick example. The now-classic example of supervised learning is developing an algorithm to distinguish between pictures of cats and dogs. The supervised part arises from two aspects. First, we have a set of pictures for which we know the correct answers; we call such data labeled data. Second, we carry out a process in which we iteratively test our algorithm's ability to predict "cat" or "dog" given pictures, and we make corrections to the algorithm when the predictions are incorrect. This process, at a high level, is similar to teaching children. However, it generally takes a lot more data to train an algorithm than to teach a child to recognize cats and dogs! Fortunately, there are rapidly growing sources of data at our disposal.

Note the use of the words learning and train in the context of developing our algorithm. These might seem to be giving human qualities to our machines and computer programs, but they are already deeply ingrained in the machine learning (and artificial intelligence) literature, so let's use them and understand them. Training in our context here always refers to the process of providing labeled data to an algorithm and making adjustments to the algorithm to best predict the labels given the data. Supervised means that the labels for the data are provided within the training, allowing the model to learn from these labels.
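To make the idea of training concrete, here is a minimal sketch of the predict-and-correct loop described above. It uses a simple perceptron written with NumPy, which is our choice for illustration and not necessarily the method used later in the book. The two numeric features per example and all of the values are invented; real cat and dog pictures would need far richer inputs.

```python
import numpy as np

# Hypothetical labeled data: two invented numeric features per example.
# Label 0 stands for "cat" and label 1 stands for "dog".
X = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [3.0, 3.2], [2.8, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 0, 1, 1, 1])

weights = np.zeros(X.shape[1])
bias = 0.0
learning_rate = 0.1


def predict(features):
    """Return 1 ("dog") if the weighted sum is positive, otherwise 0 ("cat")."""
    return int(np.dot(weights, features) + bias > 0)


# Training: predict each label and adjust the parameters after every mistake.
for epoch in range(100):
    mistakes = 0
    for features, label in zip(X, y):
        error = label - predict(features)  # 0 when the prediction is correct
        if error != 0:
            weights += learning_rate * error * features
            bias += learning_rate * error
            mistakes += 1
    if mistakes == 0:  # every training example is now predicted correctly
        break

predictions = [predict(features) for features in X]
print(predictions)
```

The labels in y are what make this supervised: without them, the loop would have nothing to compare its predictions against and no signal for the corrections.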

Let's now understand the distinction between supervised learning and other forms of machine learning.

When to Use Supervised Learning

Generally, if you are trying to automate or replicate an existing process, the problem is a supervised learning problem. As an example, let's say you are the publisher of a magazine that reviews and ranks hairstyles from various time periods. Your readers frequently send you far more images of their favorite hairstyles for review than you can manually process. To save some time, you would like to automate the sorting of the hairstyle images you receive based on time periods, starting with hairstyles from the 1960s and 1980s, as you can see in the following figure:

Figure 1.1: Images of hairstyles from different time periods

To create your hairstyle-sorting algorithm, you start by collecting a large sample of hairstyle images and manually labeling each one with its corresponding time period. Such a dataset (known as a labeled dataset) consists of input data (hairstyle images) for which the desired output information (time period) is known and recorded. This type of problem is a classic supervised learning problem; we are trying to develop an algorithm that takes a set of inputs and learns to return the answers that we have told it are correct.
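As a rough sketch of what a labeled dataset looks like in code, the example below stores a handful of inputs alongside their known outputs in a pandas DataFrame and fits a scikit-learn classifier to them. The column names feature_1 and feature_2 and every value are hypothetical stand-ins for whatever numeric description of an image is available, and the k-nearest neighbors model is simply one convenient choice for illustration.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled dataset: each row stands for one hairstyle image,
# summarized by two invented numeric features. "decade" is the manual label.
labeled = pd.DataFrame({
    "feature_1": [0.90, 0.80, 0.85, 0.30, 0.35, 0.25],
    "feature_2": [0.40, 0.50, 0.45, 0.70, 0.80, 0.75],
    "decade": ["1980s", "1980s", "1980s", "1960s", "1960s", "1960s"],
})

X = labeled[["feature_1", "feature_2"]]  # inputs
y = labeled["decade"]                    # known, recorded outputs

# Fit a classifier that learns to return the answers we labeled as correct.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Ask the trained model to sort a new, unlabeled example.
new_image = pd.DataFrame({"feature_1": [0.82], "feature_2": [0.48]})
predicted = model.predict(new_image)[0]
print(predicted)
```

Keeping the inputs and the label in one table makes it easy to check that every example actually has a recorded answer before any training begins.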

Python Packages and Modules

Python is one of the most popular programming languages used for machine learning, and is the language used here.

While the features included in standard Python are certainly rich, the true power of Python lies in the additional libraries (also known as packages), which, thanks to open source licensing, can be easily downloaded and installed through a few simple commands. In this book, we generally assume your system has been configured using Anaconda, a Python distribution built around conda, an open source package and environment manager. Depending on your system, you can configure multiple virtual environments using Anaconda, each one configured with specific packages and even different versions of Python. Using Anaconda takes care of many of the requirements to get ready to perform machine learning, as many of the most common packages come pre-built within Anaconda. Refer to the preface for Anaconda installation instructions.

In this book, we will be using the following additional Python packages:

NumPy (pronounced Num Pie and available at https://packt.live/2w1Kn4R): NumPy (short for numerical Python) is one of the core components of scientific computing in Python. NumPy provides the foundational array data types from which a number of other data structures derive, along with linear algebra routines for vectors and matrices and key random number functionality.
SciPy (pronounced Sigh Pie and available at https://packt.live/2w5Wfmm): SciPy, along with NumPy, is a core scientific computing package. SciPy provides a number of statistical tools, signal processing tools, and other functionality, such as Fourier transforms.
pandas (available at https://packt.live/3cc4TAa): pandas is a high-performance library for loading, cleaning, analyzing, and manipulating data structures.
Matplotlib (available at https://packt.live/2TmvKBk): Matplotlib is the foundational Python library for creating graphs and plots of datasets and is also the base package on which other Python plotting libraries, such as Seaborn, are built. The Matplotlib API has been designed in alignment with the MATLAB plotting library to facilitate an easy transition to Python.
Seaborn (available at https://packt.live/2VniL4F): Seaborn is a plotting library built on top of Matplotlib, providing attractive color and line styles as well as a number of common plotting templates.
Scikit-learn (available at https://packt.live/2MC1kJ9): Scikit-learn is a Python machine learning library that provides a number of data mining, modeling, and analysis techniques in a simple API. Scikit-learn includes a number of machine learning algorithms out of the box, including classification, regression, and clustering techniques.
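As a brief illustration of the NumPy capabilities mentioned in the list above (arrays, linear algebra, and random numbers), consider the following sketch. The matrix, vector, and seed are arbitrary values chosen for this example only.

```python
import numpy as np

# Arrays are NumPy's foundational data type.
matrix = np.array([[2.0, 0.0],
                   [0.0, 4.0]])
vector = np.array([1.0, 3.0])

# Linear algebra: a matrix-vector product and the solution of
# the linear system matrix @ x = vector.
product = matrix @ vector
solution = np.linalg.solve(matrix, vector)

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(product)
print(solution)
print(samples.shape)
```

Many of the other packages in the list accept or return NumPy arrays, which is why it is worth becoming comfortable with them early on.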

These packages form the foundation of a versatile machine learning development environment, with each package contributing a key set of functionalities. As discussed, by using Anaconda, you will already have all of the required packages installed and ready for use. If you require a package that is not included in the Anaconda installation, it can be installed by entering and executing the following command in a Jupyter notebook cell (the --yes flag answers conda's confirmation prompt, which a notebook cell cannot respond to interactively):

!conda install --yes <package name>

As an example, if we wanted to install Seaborn, we'd run the following command:

!conda install --yes seaborn

To use one of these packages in a notebook, all we need to do is import it:

import matplotlib
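In practice, packages are often imported under short aliases. The sketch below uses the widely adopted np and pd aliases (a community convention, not a requirement) and prints the installed version of each package, which is a quick way to confirm that the environment is set up as expected.

```python
import numpy as np
import pandas as pd
import matplotlib

# Collect the version string of each imported package.
versions = {
    module.__name__: module.__version__
    for module in (np, pd, matplotlib)
}

for name, version in versions.items():
    print(name, version)
```

If an import fails with ModuleNotFoundError, the package is not installed in the active environment and can be added with the conda command shown earlier.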

Loading Data in Pandas

pandas has the ability to read and write a number of different file formats and data structures, including CSV, JSON, and HDF5 files, as well as SQL and Python Pickle formats. The pandas input/output documentation can be found at https://packt.live/2FiYB2O. We will continue to look into the pandas functionality by loading data via a CSV file.
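Before working with a real file, it can help to see the basic read, inspect, and write cycle on a tiny example. The sketch below reads CSV text from an in-memory buffer so that it runs without any file on disk. The column names and values are invented for illustration and are not taken from the dataset used in this chapter.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for illustration only.
csv_text = """name,age,score
Ann,34,88.5
Bob,29,92.0
Cy,41,79.5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())      # first rows of the table
print(df.describe())  # descriptive statistics for the numeric columns

# Filter rows with a boolean condition.
over_30 = df[df["age"] > 30]

# Write the same data as JSON and read it back.
json_text = df.to_json(orient="records")
round_trip = pd.read_json(io.StringIO(json_text))
print(round_trip)
```

With a file on disk, the same call takes a path instead of a buffer, for example pd.read_csv("data.csv"), where data.csv is a placeholder file name.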

Note

The dataset we will be using for this chapter is the Titanic: Machine Learning from Disaster dataset, available from https://packt.live/2wQPBkx.

Alternatively, the dataset is available on our GitHub repository via the following link: https://packt.live/2vjyPK9

The dataset contains a roll of the passengers on board the famous ship Titanic, as well as their age, survival status, and number of siblings/parents. Before we get started with loading the data into Python, it is critical that we spend some time looking over the information provided for the dataset so that we can have a thorough understanding of what it contains. Download the dataset and place it in the directory you're working in.

Looking at the description for the data, we can see that we have the following fields available:

survival: This tells us whether a given person survived (0 = No, 1 =...
