
Practical Weak Supervision
Description
Most data scientists and engineers today rely on quality labeled data to train machine learning models. But building a training set manually is time-consuming and expensive, leaving many companies with unfinished ML projects. There's a more practical approach. In this book, Wee Hyong Tok, Amit Bahree, and Senja Filipi show you how to create products using weakly supervised learning models.
You'll learn how to build natural language processing and computer vision projects using weakly labeled datasets created with Snorkel, a spin-off from the Stanford AI Lab. Because so many companies have pursued ML projects that never go beyond their labs, this book also shows you how to ship the deep learning models you build.
- Get up to speed on the field of weak supervision, including ways to use it as part of the data science process
- Use Snorkel AI for weak supervision and data programming
- Get code examples for using Snorkel to label text and image datasets
- Use a weakly labeled dataset for text and image classification
- Learn practical considerations for using Snorkel with large datasets and using Spark clusters to scale labeling

Contents
- Intro
- Copyright
- Table of Contents
- Foreword by Xuedong Huang
- Foreword by Alex Ratner
- Preface
- Who Should Read This Book
- Navigating This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Chapter 1. Introduction to Weak Supervision
- What Is Weak Supervision?
- Real-World Weak Supervision with Snorkel
- Approaches to Weak Supervision
- Incomplete Supervision
- Inexact Supervision
- Inaccurate Supervision
- Data Programming
- Getting Training Data
- How Data Programming Is Helping Accelerate Software 2.0
- Summary
- Chapter 2. Diving into Data Programming with Snorkel
- Snorkel, a Data Programming Framework
- Getting Started with Labeling Functions
- Applying the Labels to the Datasets
- Analyzing the Labeling Performance
- Using a Validation Set
- Reaching Labeling Consensus with LabelModel
- Intuition Behind LabelModel
- LabelModel Parameter Estimation
- Strategies to Improve the Labeling Functions
- Data Augmentation with Snorkel Transformers
- Data Augmentation Through Word Removal
- Snorkel Preprocessors
- Data Augmentation Through GPT-2 Prediction
- Data Augmentation Through Translation
- Applying the Transformation Functions to the Dataset
- Summary
- Chapter 3. Labeling in Action
- Labeling a Text Dataset: Identifying Fake News
- Exploring the Fake News Detection (FakeNewsNet) Dataset
- Importing Snorkel and Setting Up Representative Constants
- Fact-Checking Sites
- Is the Speaker a "Liar"?
- Twitter Profile and Botometer Score
- Generating Agreements Between Weak Classifiers
- Labeling an Images Dataset: Determining Indoor Versus Outdoor Images
- Creating a Dataset of Images from Bing
- Defining and Training Weak Classifiers in TensorFlow
- Training the Various Classifiers
- Weak Classifiers out of Image Tags
- Deploying the Computer Vision Service
- Interacting with the Computer Vision Service
- Preparing the DataFrame
- Learning a LabelModel
- Summary
- Chapter 4. Using the Snorkel-Labeled Dataset for Text Classification
- Getting Started with Natural Language Processing (NLP)
- Transformers
- Hard Versus Probabilistic Labels
- Using ktrain for Performing Text Classification
- Data Preparation
- Dealing with an Imbalanced Dataset
- Training the Model
- Using the Text Classification Model for Prediction
- Finding a Good Learning Rate
- Using Hugging Face and Transformers
- Loading the Relevant Python Packages
- Dataset Preparation
- Checking Whether GPU Hardware Is Available
- Performing Tokenization
- Model Training
- Testing the Fine-Tuned Model
- Summary
- Chapter 5. Using the Snorkel-Labeled Dataset for Image Classification
- Visual Object Recognition Overview
- Representing Image Features
- Transfer Learning for Computer Vision
- Using PyTorch for Image Classification
- Loading the Indoor/Outdoor Dataset
- Utility Functions
- Visualizing the Training Data
- Fine-Tuning the Pretrained Model
- Summary
- Chapter 6. Scalability and Distributed Training
- The Need for Scalability
- Distributed Training
- Apache Spark: An Introduction
- Spark Application Design
- Using Azure Databricks to Scale
- Cluster Setup for Weak Supervision
- Fake News Detection Dataset on Databricks
- Labeling Functions for Snorkel
- Setting Up Dependencies
- Loading the Data
- Fact-Checking Sites
- Transfer Learning Using the LIAR Dataset
- Weak Classifiers: Generating Agreement
- Type Conversions Needed for Spark Runtime
- Summary
- Index
- About the Authors
- Colophon
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
Supported devices:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions before downloading (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). However, on the small screens of e-readers and smartphones, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise your reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.