Natural Language Annotation for Machine Learning

Name: Natural Language Annotation for Machine Learning | A Guide to Corpus-Building for Applications
Brand: O'Reilly
Price: 27.99 EUR
Availability: OnlineOnly

A Guide to Corpus-Building for Applications

James Pustejovsky (Author)

O'Reilly (Publisher)

Published on 11 October 2012

342 pages

E-Book

ePUB with Adobe-DRM

978-1-4493-5976-8 (ISBN)

€27.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Copyright
Table of Contents
Preface
Natural Language Annotation for Machine Learning
Audience
Organization of This Book
Software Requirements
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
James Adds:
Amber Adds:
Chapter 1. The Basics
The Importance of Language Annotation
The Layers of Linguistic Description
What Is Natural Language Processing?
A Brief History of Corpus Linguistics
What Is a Corpus?
Early Use of Corpora
Corpora Today
Kinds of Annotation
Language Data and Machine Learning
Classification
Clustering
Structured Pattern Induction
The Annotation Development Cycle
Model the Phenomenon
Annotate with the Specification
Train and Test the Algorithms over the Corpus
Evaluate the Results
Revise the Model and Algorithms
Summary
Chapter 2. Defining Your Goal and Dataset
Defining Your Goal
The Statement of Purpose
Refining Your Goal: Informativity Versus Correctness
Background Research
Language Resources
Organizations and Conferences
NLP Challenges
Assembling Your Dataset
The Ideal Corpus: Representative and Balanced
Collecting Data from the Internet
Eliciting Data from People
The Size of Your Corpus
Existing Corpora
Distributions Within Corpora
Summary
Chapter 3. Corpus Analytics
Basic Probability for Corpus Analytics
Joint Probability Distributions
Bayes Rule
Counting Occurrences
Zipf's Law
N-grams
Language Models
Summary
Chapter 4. Building Your Model and Specification
Some Example Models and Specs
Film Genre Classification
Adding Named Entities
Semantic Roles
Adopting (or Not Adopting) Existing Models
Creating Your Own Model and Specification: Generality Versus Specificity
Using Existing Models and Specifications
Using Models Without Specifications
Different Kinds of Standards
ISO Standards
Community-Driven Standards
Other Standards Affecting Annotation
Summary
Chapter 5. Applying and Adopting Annotation Standards
Metadata Annotation: Document Classification
Unique Labels: Movie Reviews
Multiple Labels: Film Genres
Text Extent Annotation: Named Entities
Inline Annotation
Stand-off Annotation by Tokens
Stand-off Annotation by Character Location
Linked Extent Annotation: Semantic Roles
ISO Standards and You
Summary
Chapter 6. Annotation and Adjudication
The Infrastructure of an Annotation Project
Specification Versus Guidelines
Be Prepared to Revise
Preparing Your Data for Annotation
Metadata
Preprocessed Data
Splitting Up the Files for Annotation
Writing the Annotation Guidelines
Example 1: Single Labels-Movie Reviews
Example 2: Multiple Labels-Film Genres
Example 3: Extent Annotations-Named Entities
Example 4: Link Tags-Semantic Roles
Annotators
Choosing an Annotation Environment
Evaluating the Annotations
Cohen's Kappa (κ)
Fleiss's Kappa (κ)
Interpreting Kappa Coefficients
Calculating κ in Other Contexts
Creating the Gold Standard (Adjudication)
Summary
Chapter 7. Training: Machine Learning
What Is Learning?
Defining Our Learning Task
Classifier Algorithms
Decision Tree Learning
Gender Identification
Naïve Bayes Learning
Maximum Entropy Classifiers
Other Classifiers to Know About
Sequence Induction Algorithms
Clustering and Unsupervised Learning
Semi-Supervised Learning
Matching Annotation to Algorithms
Summary
Chapter 8. Testing and Evaluation
Testing Your Algorithm
Evaluating Your Algorithm
Confusion Matrices
Calculating Evaluation Scores
Interpreting Evaluation Scores
Problems That Can Affect Evaluation
Dataset Is Too Small
Algorithm Fits the Development Data Too Well
Too Much Information in the Annotation
Final Testing Scores
Summary
Chapter 9. Revising and Reporting
Revising Your Project
Corpus Distributions and Content
Model and Specification
Annotation
Training and Testing
Reporting About Your Work
About Your Corpus
About Your Model and Specifications
About Your Annotation Task and Annotators
About Your ML Algorithm
About Your Revisions
Summary
Chapter 10. Annotation: TimeML
The Goal of TimeML
Related Research
Building the Corpus
Model: Preliminary Specifications
Times
Signals
Events
Links
Annotation: First Attempts
Model: The TimeML Specification Used in TimeBank
Time Expressions
Events
Signals
Links
Confidence
Annotation: The Creation of TimeBank
TimeML Becomes ISO-TimeML
Modeling the Future: Directions for TimeML
Narrative Containers
Expanding TimeML to Other Domains
Event Structures
Summary
Chapter 11. Automatic Annotation: Generating TimeML
The TARSQI Components
GUTime: Temporal Marker Identification
EVITA: Event Recognition and Classification
GUTenLINK
Slinket
SputLink
Machine Learning in the TARSQI Components
Improvements to the TTK
Structural Changes
Improvements to Temporal Entity Recognition: BTime
Temporal Relation Identification
Temporal Relation Validation
Temporal Relation Visualization
TimeML Challenges: TempEval-2
TempEval-2: System Summaries
Overview of Results
Future of the TTK
New Input Formats
Narrative Containers/Narrative Times
Medical Documents
Cross-Document Analysis
Summary
Chapter 12. Afterword: The Future of Annotation
Crowdsourcing Annotation
Amazon's Mechanical Turk
Games with a Purpose (GWAP)
User-Generated Content
Handling Big Data
Boosting
Active Learning
Semi-Supervised Learning
NLP Online and in the Cloud
Distributed Computing
Shared Language Resources
Shared Language Applications
And Finally...
Appendix A. List of Available Corpora and Specifications
Corpora
Specifications, Guidelines, and Other Resources
Representation Standards
Appendix B. List of Software Resources
Annotation and Adjudication Software
Multipurpose Tools
Corpus Creation and Exploration Tools
Manual Annotation Tools
Automated Annotation Tools
Machine Learning Resources
Appendix C. MAE User Guide
Installing and Running MAE
Loading Tasks and Files
Loading a Task
Loading a File
Annotating Entities
Annotating Links
Deleting Tags
Saving Files
Defining Your Own Task
Task Name
Elements (a.k.a. Tags)
Attributes
Frequently Asked Questions
Appendix D. MAI User Guide
Installing and Running MAI
Loading Tasks and Files
Loading a Task
Loading Files
Adjudicating
The MAI Window
Adjudicating a Tag
Extent Tags
Link Tags
Nonconsuming Tags
Adding New Tags
Deleting Tags
Saving Files
Appendix E. Bibliography
References for Using Amazon's Mechanical Turk/Crowdsourcing
Index
About the Authors
