
The Unsupervised Learning Workshop
Description
- Learn interesting methods to simplify large amounts of unorganized data
- Tackle real-world challenges, such as estimating the population density of a geographical area
Book Description
Do you find it difficult to understand how popular companies like WhatsApp and Amazon find valuable insights from large amounts of unorganized data? The Unsupervised Learning Workshop will give you the confidence to deal with cluttered and unlabeled datasets, using unsupervised algorithms in an easy and interactive manner. The book starts by introducing the most popular clustering algorithms of unsupervised learning. You'll find out how hierarchical clustering differs from k-means, along with understanding how to apply DBSCAN to highly complex and noisy data. Moving ahead, you'll use autoencoders for efficient data encoding. As you progress, you'll use t-SNE models to extract high-dimensional information into a lower dimension for better visualization, in addition to working with topic modeling for implementing natural language processing (NLP). In later chapters, you'll find key relationships between customers and businesses using Market Basket Analysis, before going on to use Hotspot Analysis for estimating the population density of an area. By the end of this book, you'll be equipped with the skills you need to apply unsupervised algorithms on cluttered datasets to find useful patterns and insights.
What you will learn
- Distinguish between hierarchical clustering and the k-means algorithm
- Understand the process of finding clusters in data
- Grasp interesting techniques to reduce the size of data
- Use autoencoders to decode data
- Extract text from a large collection of documents using topic modeling
- Create a bag-of-words model using the CountVectorizer
Who this book is for
If you are a data scientist who is just getting started and want to learn how to implement machine learning algorithms to build predictive models, then this book is for you. To expedite the learning process, a solid understanding of the Python programming language is recommended, as you'll be editing classes and functions instead of creating them from scratch.
Content
- Cover
- FM
- Copyright
- Table of Contents
- Preface
- Chapter 1: Introduction to Clustering
- Introduction
- Unsupervised Learning versus Supervised Learning
- Clustering
- Identifying Clusters
- Two-Dimensional Data
- Exercise 1.01: Identifying Clusters in Data
- Introduction to k-means Clustering
- No-Math k-means Walkthrough
- K-means Clustering In-Depth Walkthrough
- Alternative Distance Metric - Manhattan Distance
- Deeper Dimensions
- Exercise 1.02: Calculating Euclidean Distance in Python
- Exercise 1.03: Forming Clusters with the Notion of Distance
- Exercise 1.04: K-means from Scratch - Part 1: Data Generation
- Exercise 1.05: K-means from Scratch - Part 2: Implementing k-means
- Clustering Performance - Silhouette Score
- Exercise 1.06: Calculating the Silhouette Score
- Activity 1.01: Implementing k-means Clustering
- Summary
- Chapter 2: Hierarchical Clustering
- Introduction
- Clustering Refresher
- The k-means Refresher
- The Organization of the Hierarchy
- Introduction to Hierarchical Clustering
- Steps to Perform Hierarchical Clustering
- An Example Walkthrough of Hierarchical Clustering
- Exercise 2.01: Building a Hierarchy
- Linkage
- Exercise 2.02: Applying Linkage Criteria
- Agglomerative versus Divisive Clustering
- Exercise 2.03: Implementing Agglomerative Clustering with scikit-learn
- Activity 2.01: Comparing k-means with Hierarchical Clustering
- k-means versus Hierarchical Clustering
- Summary
- Chapter 3: Neighborhood Approaches and DBSCAN
- Introduction
- Clusters as Neighborhoods
- Introduction to DBSCAN
- DBSCAN in Detail
- Walkthrough of the DBSCAN Algorithm
- Exercise 3.01: Evaluating the Impact of Neighborhood Radius Size
- DBSCAN Attributes - Neighborhood Radius
- Activity 3.01: Implementing DBSCAN from Scratch
- DBSCAN Attributes - Minimum Points
- Exercise 3.02: Evaluating the Impact of the Minimum Points Threshold
- Activity 3.02: Comparing DBSCAN with k-means and Hierarchical Clustering
- DBSCAN versus k-means and Hierarchical Clustering
- Summary
- Chapter 4: Dimensionality Reduction Techniques and PCA
- Introduction
- What Is Dimensionality Reduction?
- Applications of Dimensionality Reduction
- The Curse of Dimensionality
- Overview of Dimensionality Reduction Techniques
- Dimensionality Reduction
- Principal Component Analysis
- Mean
- Standard Deviation
- Covariance
- Covariance Matrix
- Exercise 4.01: Computing Mean, Standard Deviation, and Variance Using the pandas Library
- Eigenvalues and Eigenvectors
- Exercise 4.02: Computing Eigenvalues and Eigenvectors
- The Process of PCA
- Exercise 4.03: Manually Executing PCA
- Exercise 4.04: scikit-learn PCA
- Activity 4.01: Manual PCA versus scikit-learn
- Restoring the Compressed Dataset
- Exercise 4.05: Visualizing Variance Reduction with Manual PCA
- Exercise 4.06: Visualizing Variance Reduction with scikit-learn
- Exercise 4.07: Plotting 3D Plots in Matplotlib
- Activity 4.02: PCA Using the Expanded Seeds Dataset
- Summary
- Chapter 5: Autoencoders
- Introduction
- Fundamentals of Artificial Neural Networks
- The Neuron
- The Sigmoid Function
- Rectified Linear Unit (ReLU)
- Exercise 5.01: Modeling the Neurons of an Artificial Neural Network
- Exercise 5.02: Modeling Neurons with the ReLU Activation Function
- Neural Networks: Architecture Definition
- Exercise 5.03: Defining a Keras Model
- Neural Networks: Training
- Exercise 5.04: Training a Keras Neural Network Model
- Activity 5.01: The MNIST Neural Network
- Autoencoders
- Exercise 5.05: Simple Autoencoder
- Activity 5.02: Simple MNIST Autoencoder
- Exercise 5.06: Multi-Layer Autoencoder
- Convolutional Neural Networks
- Exercise 5.07: Convolutional Autoencoder
- Activity 5.03: MNIST Convolutional Autoencoder
- Summary
- Chapter 6: t-Distributed Stochastic Neighbor Embedding
- Introduction
- The MNIST Dataset
- Stochastic Neighbor Embedding (SNE)
- t-Distributed SNE
- Exercise 6.01: t-SNE MNIST
- Activity 6.01: Wine t-SNE
- Interpreting t-SNE Plots
- Perplexity
- Exercise 6.02: t-SNE MNIST and Perplexity
- Activity 6.02: t-SNE Wine and Perplexity
- Iterations
- Exercise 6.03: t-SNE MNIST and Iterations
- Activity 6.03: t-SNE Wine and Iterations
- Final Thoughts on Visualizations
- Summary
- Chapter 7: Topic Modeling
- Introduction
- Topic Models
- Exercise 7.01: Setting up the Environment
- A High-Level Overview of Topic Models
- Business Applications
- Exercise 7.02: Data Loading
- Cleaning Text Data
- Data Cleaning Techniques
- Exercise 7.03: Cleaning Data Step by Step
- Exercise 7.04: Complete Data Cleaning
- Activity 7.01: Loading and Cleaning Twitter Data
- Latent Dirichlet Allocation
- Variational Inference
- Bag of Words
- Exercise 7.05: Creating a Bag-of-Words Model Using the Count Vectorizer
- Perplexity
- Exercise 7.06: Selecting the Number of Topics
- Exercise 7.07: Running LDA
- Visualization
- Exercise 7.08: Visualizing LDA
- Exercise 7.09: Trying Four Topics
- Activity 7.02: LDA and Health Tweets
- Exercise 7.10: Creating a Bag-of-Words Model Using TF-IDF
- Non-Negative Matrix Factorization
- The Frobenius Norm
- The Multiplicative Update Algorithm
- Exercise 7.11: Non-negative Matrix Factorization
- Exercise 7.12: Visualizing NMF
- Activity 7.03: Non-negative Matrix Factorization
- Summary
- Chapter 8: Market Basket Analysis
- Introduction
- Market Basket Analysis
- Use Cases
- Important Probabilistic Metrics
- Exercise 8.01: Creating Sample Transaction Data
- Support
- Confidence
- Lift and Leverage
- Conviction
- Exercise 8.02: Computing Metrics
- Characteristics of Transaction Data
- Exercise 8.03: Loading Data
- Data Cleaning and Formatting
- Exercise 8.04: Data Cleaning and Formatting
- Data Encoding
- Exercise 8.05: Data Encoding
- Activity 8.01: Loading and Preparing Full Online Retail Data
- The Apriori Algorithm
- Computational Fixes
- Exercise 8.06: Executing the Apriori Algorithm
- Activity 8.02: Running the Apriori Algorithm on the Complete Online Retail Dataset
- Association Rules
- Exercise 8.07: Deriving Association Rules
- Activity 8.03: Finding the Association Rules on the Complete Online Retail Dataset
- Summary
- Chapter 9: Hotspot Analysis
- Introduction
- Spatial Statistics
- Probability Density Functions
- Using Hotspot Analysis in Business
- Kernel Density Estimation
- The Bandwidth Value
- Exercise 9.01: The Effect of the Bandwidth Value
- Selecting the Optimal Bandwidth
- Exercise 9.02: Selecting the Optimal Bandwidth Using Grid Search
- Kernel Functions
- Exercise 9.03: The Effect of the Kernel Function
- Kernel Density Estimation Derivation
- Exercise 9.04: Simulating the Derivation of Kernel Density Estimation
- Activity 9.01: Estimating Density in One Dimension
- Hotspot Analysis
- Exercise 9.05: Loading Data and Modeling with Seaborn
- Exercise 9.06: Working with Basemaps
- Activity 9.02: Analyzing Crime in London
- Summary
- Appendix
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.