Text Analysis Pipelines

Name: Text Analysis Pipelines | Towards Ad-hoc Large-Scale Text Mining
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

Towards Ad-hoc Large-Scale Text Mining

Henning Wachsmuth (Author)

Springer (Publisher)

Published on 2 December 2015

XX, 302 pages

E-Book

PDF with digital watermarking

978-3-319-25741-9 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Symbols
Contents
1 Introduction
1.1 Information Search in Times of Big Data
1.1.1 Text Mining to the Rescue
1.2 A Need for Efficient and Robust Text Analysis Pipelines
1.2.1 Basic Text Analysis Scenario
1.2.2 Shortcomings of Traditional Text Analysis Pipelines
1.2.3 Problems Approached in This Book
1.3 Towards Intelligent Pipeline Design and Execution
1.3.1 Central Research Question and Method
1.3.2 An Artificial Intelligence Approach
1.4 Contributions and Outline of This Book
1.4.1 New Findings in Ad-Hoc Large-Scale Text Mining
1.4.2 Contributions to the Concerned Research Fields
1.4.3 Structure of the Remaining Chapters
1.4.4 Published Research Within This Book
2 Text Analysis Pipelines
2.1 Foundations of Text Mining
2.1.1 Text Mining
2.1.2 Information Retrieval
2.1.3 Natural Language Processing
2.1.4 Data Mining
2.1.5 Development and Evaluation
2.2 Text Analysis Tasks, Processes, and Pipelines
2.2.1 Text Analysis Tasks
2.2.2 Text Analysis Processes
2.2.3 Text Analysis Pipelines
2.3 Case Studies in This Book
2.3.1 InfexBA -- Information Extraction for Business Applications
2.3.2 ArguAna -- Argumentation Analysis in Customer Opinions
2.3.3 Other Evaluated Text Analysis Tasks
2.4 State of the Art in Ad-Hoc Large-Scale Text Mining
2.4.1 Text Analysis Approaches
2.4.2 Design of Text Analysis Approaches
2.4.3 Efficiency of Text Analysis Approaches
2.4.4 Robustness of Text Analysis Approaches
3 Pipeline Design
3.1 Ideal Construction and Execution for Ad-Hoc Text Mining
3.1.1 The Optimality of Text Analysis Pipelines
3.1.2 Paradigms of Designing Optimal Text Analysis Pipelines
3.1.3 Case Study of Ideal Construction and Execution
3.1.4 Discussion of Ideal Construction and Execution
3.2 A Process-Oriented View of Text Analysis
3.2.1 Text Analysis as an Annotation Task
3.2.2 Modeling the Information to Be Annotated
3.2.3 Modeling the Quality to Be Achieved by the Annotation
3.2.4 Modeling the Analysis to Be Performed for Annotation
3.2.5 Defining an Annotation Task Ontology
3.2.6 Discussion of the Process-Oriented View
3.3 Ad-Hoc Construction via Partial Order Planning
3.3.1 Modeling Algorithm Selection as a Planning Problem
3.3.2 Selecting the Algorithms of a Partially Ordered Pipeline
3.3.3 Linearizing the Partially Ordered Pipeline
3.3.4 Properties of the Proposed Approach
3.3.5 An Expert System for Ad-Hoc Construction
3.3.6 Evaluation of Ad-Hoc Construction
3.3.7 Discussion of Ad-Hoc Construction
3.4 An Information-Oriented View of Text Analysis
3.4.1 Text Analysis as a Filtering Task
3.4.2 Defining the Relevance of Portions of Text
3.4.3 Specifying a Degree of Filtering for Each Relation Type
3.4.4 Modeling Dependencies of the Relevant Information Types
3.4.5 Discussion of the Information-Oriented View
3.5 Optimal Execution via Truth Maintenance
3.5.1 Modeling Input Control as a Truth Maintenance Problem
3.5.2 Filtering the Relevant Portions of Text
3.5.3 Determining the Relevant Portions of Text
3.5.4 Properties of the Proposed Approach
3.5.5 A Software Framework for Optimal Execution
3.5.6 Evaluation of Optimal Execution
3.5.7 Discussion of Optimal Execution
3.6 Trading Efficiency for Effectiveness in Ad-Hoc Text Mining
3.6.1 Integration with Passage Retrieval
3.6.2 Integration with Text Filtering
3.6.3 Implications for Pipeline Efficiency
4 Pipeline Efficiency
4.1 Ideal Scheduling for Large-Scale Text Mining
4.1.1 The Efficiency Potential of Pipeline Scheduling
4.1.2 Computing Optimal Schedules with Dynamic Programming
4.1.3 Properties of the Proposed Solution
4.1.4 Case Study of Ideal Scheduling
4.1.5 Discussion of Ideal Scheduling
4.2 The Impact of Relevant Information in Input Texts
4.2.1 Formal Specification of the Impact
4.2.2 Experimental Analysis of the Impact
4.2.3 Practical Relevance of the Impact
4.2.4 Implications of the Impact
4.3 Optimized Scheduling via Informed Search
4.3.1 Modeling Pipeline Scheduling as a Search Problem
4.3.2 Scheduling Text Analysis Algorithms with k-best A* Search
4.3.3 Properties of the Proposed Approach
4.3.4 Evaluation of Optimized Scheduling
4.3.5 Discussion of Optimized Scheduling
4.4 The Impact of the Heterogeneity of Input Texts
4.4.1 Experimental Analysis of the Impact
4.4.2 Quantification of the Impact
4.4.3 Practical Relevance of the Impact
4.4.4 Implications of the Impact
4.5 Adaptive Scheduling via Self-supervised Online Learning
4.5.1 Modeling Pipeline Scheduling as a Classification Problem
4.5.2 Learning to Predict Run-Times Self-supervised and Online
4.5.3 Adapting a Pipeline's Schedule to the Input Text
4.5.4 Properties of the Proposed Approach
4.5.5 Evaluation of Adaptive Scheduling
4.5.6 Discussion of Adaptive Scheduling
4.6 Parallelizing Execution in Large-Scale Text Mining
4.6.1 Effects of Parallelizing Pipeline Execution
4.6.2 Parallelization of Text Analyses
4.6.3 Parallelization of Text Analysis Pipelines
4.6.4 Implications for Pipeline Robustness
5 Pipeline Robustness
5.1 Ideal Domain Independence for High-Quality Text Mining
5.1.1 The Domain Dependence Problem in Text Analysis
5.1.2 Requirements of Achieving Pipeline Domain Independence
5.1.3 Domain-Independent Features of Argumentative Texts
5.2 A Structure-Oriented View of Text Analysis
5.2.1 Text Analysis as a Structure Classification Task
5.2.2 Modeling the Argumentation and Content of a Text
5.2.3 Modeling the Argumentation Structure of a Text
5.2.4 Defining a Structure Classification Task Ontology
5.2.5 Discussion of the Structure-Oriented View
5.3 The Impact of the Overall Structure of Input Texts
5.3.1 Experimental Analysis of Content and Style Features
5.3.2 Statistical Analysis of the Impact of Task-Specific Structure
5.3.3 Statistical Analysis of the Impact of General Structure
5.3.4 Implications of the Invariance and Impact
5.4 Features for Domain Independence via Supervised Clustering
5.4.1 Approaching Classification as a Relatedness Problem
5.4.2 Learning Overall Structures with Supervised Clustering
5.4.3 Using the Overall Structures as Features for Classification
5.4.4 Properties of the Proposed Features
5.4.5 Evaluation of Features for Domain Independence
5.4.6 Discussion of Features for Domain Independence
5.5 Explaining Results in High-Quality Text Mining
5.5.1 Intelligible Text Analysis through Explanations
5.5.2 Explanation of Arbitrary Text Analysis Processes
5.5.3 Explanation of the Class of an Argumentative Text
5.5.4 Implications for Ad-Hoc Large-Scale Text Mining
6 Conclusion
6.1 Contributions and Open Problems
6.1.1 Enabling Ad-Hoc Text Analysis
6.1.2 Optimally Analyzing Text
6.1.3 Optimizing Analysis Efficiency
6.1.4 Robustly Classifying Text
6.2 Implications and Outlook
6.2.1 Towards Ad-Hoc Large-Scale Text Mining
6.2.2 Outside the Box
Appendix A Text Analysis Algorithms
A.1 Analyses and Algorithms
A.1.1 Classification of Text
A.1.2 Entity Recognition
A.1.3 Normalization and Resolution
A.1.4 Parsing
A.1.5 Relation Extraction and Event Detection
A.1.6 Segmentation
A.1.7 Tagging
A.2 Evaluation Results
A.2.1 Efficiency Results
A.2.2 Effectiveness Results
Appendix B Software
B.1 An Expert System for Ad-hoc Pipeline Construction
B.1.1 Getting Started
B.1.2 Using the Expert System
B.1.3 Exploring the Source Code of the System
B.2 A Software Framework for Optimal Pipeline Execution
B.2.1 Getting Started
B.2.2 Using the Framework
B.2.3 Exploring the Source Code of the Framework
B.3 A Web Application for Sentiment Scoring and Explanation
B.3.1 Getting Started
B.3.2 Using the Application
B.3.3 Exploring the Source Code of the Application
B.3.4 Acknowledgments
B.4 Source Code of All Experiments and Case Studies
B.4.1 Software
B.4.2 Text Corpora
B.4.3 Experiments and Case Studies
Appendix C Text Corpora
C.1 The Revenue Corpus
C.1.1 Compilation
C.1.2 Annotation
C.1.3 Files
C.1.4 Acknowledgments
C.2 The ArguAna TripAdvisor Corpus
C.2.1 Compilation
C.2.2 Annotation
C.2.3 Files
C.2.4 Acknowledgments
C.3 The LFA-11 Corpus
C.3.1 Compilation
C.3.2 Annotation
C.3.3 Files
C.3.4 Acknowledgments
C.4 Used Existing Text Corpora
C.4.1 CoNLL-2003 Dataset (English and German)
C.4.2 Sentiment Scale Dataset (and Related Datasets)
C.4.3 Brown Corpus
C.4.4 Wikipedia Sample
References
Index

Schweitzer Fachinformationen