
The Unsupervised Learning Workshop
Get started with unsupervised learning algorithms and simplify your unorganized data to help make future predictions
Packt Publishing
Published on 29 July 2020
Book
Paperback/Softback
550 pages
978-1-80020-070-8 (ISBN)
Description
Learning how to apply unsupervised algorithms to unlabeled datasets from scratch can be easier than you think with this beginner's workshop, which features interesting examples and activities.
Key Features
Get familiar with the ecosystem of unsupervised algorithms
Learn interesting methods to simplify large amounts of unorganized data
Tackle real-world challenges, such as estimating the population density of a geographical area
Book Description
Do you find it difficult to understand how popular companies like WhatsApp and Amazon find valuable insights from large amounts of unorganized data? The Unsupervised Learning Workshop will give you the confidence to deal with cluttered and unlabeled datasets, using unsupervised algorithms in an easy and interactive manner.
The book starts by introducing the most popular clustering algorithms of unsupervised learning. You'll find out how hierarchical clustering differs from k-means and how to apply DBSCAN to highly complex, noisy data. Next, you'll use autoencoders for efficient data encoding.
As you progress, you'll use t-SNE models to project high-dimensional data into a lower-dimensional space for better visualization, and you'll work with topic modeling for natural language processing (NLP). In later chapters, you'll uncover key relationships between customers and businesses using Market Basket Analysis, before going on to use Hotspot Analysis to estimate the population density of an area.
By the end of this book, you'll be equipped with the skills you need to apply unsupervised algorithms to cluttered datasets and find useful patterns and insights.
What you will learn
Distinguish between hierarchical clustering and the k-means algorithm
Understand the process of finding clusters in data
Grasp techniques for reducing the dimensionality of data
Use autoencoders to encode and decode data
Extract topics from a large collection of documents using topic modeling
Create a bag-of-words model using the CountVectorizer
Who this book is for
If you are a data scientist who is just getting started and you want to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of the Python programming language is recommended, as you'll be editing classes and functions instead of creating them from scratch.
More details
Language: English
Place of publication: Birmingham, United Kingdom
Target group: Professional and scholarly
Dimensions: height 235 mm, width 191 mm, thickness 29 mm
Weight: 1014 g
ISBN-13: 978-1-80020-070-8 (9781800200708)
Copyright in bibliographic data and cover images is held by Nielsen Book Services Limited or by the publishers or by their respective licensors: all rights reserved.
Authors
Aaron Jones is a full-time senior data scientist and consultant. He has built models and data products while working in retail, media, and environmental science. Aaron is based in Seattle, Washington, and has a particular interest in clustering algorithms, natural language processing, and Bayesian statistics.
Christopher Kruger is a practicing data scientist and AI researcher. He has managed applied machine learning projects across multiple industries while mentoring junior team members on best practices. His primary focus is on pushing for both business practicality and academic rigor in every project. Chris is currently developing research in the computer vision space.
Benjamin Johnston is a senior data scientist for one of the world's leading data-driven medtech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition to solution research and development, through to final deployment. He is currently completing his PhD in machine learning, specializing in image processing and deep convolutional neural networks. He has more than 10 years' experience in medical device design and development, working in a variety of technical roles, and holds first-class honors bachelor's degrees in both engineering and medical science from the University of Sydney, Australia.
Content
Table of Contents
Introduction to Clustering
Hierarchical Clustering
Neighborhood Approaches and DBSCAN
Dimensionality Reduction Techniques and PCA
Autoencoders
t-Distributed Stochastic Neighbor Embedding
Topic Modeling
Market Basket Analysis
Hotspot Analysis