
Identity of Long-tail Entities in Text
Description
Computational systems developed to establish identity in text often struggle with long-tail cases. This book investigates how Natural Language Processing (NLP) techniques for establishing the identity of long-tail entities (which are all infrequent in communication, hardly represented in knowledge bases, and potentially very ambiguous) can be improved through the use of background knowledge. Topics covered include: distinguishing tail entities from head entities; assessing whether current evaluation datasets and metrics are representative of long-tail cases; improving evaluation of long-tail cases; accessing and enriching knowledge on long-tail entities in the Linked Open Data cloud; and investigating the added value of background knowledge ("profiling") models for establishing the identity of NIL entities.
Providing novel insights into an under-explored and difficult NLP challenge, the book will be of interest to all those working in the field of entity identification in text.
Content
- Intro
- Title Page
- Contents
- Acronyms
- 1 Introduction
- 1.1 Background: Identity in the digital era
- 1.2 Challenge: Entity Linking in the long tail
- 1.3 Research questions
- 1.4 Approach and structure of the thesis
- 1.4.1 Describing and observing the head and the tail
- 1.4.2 Analyzing the evaluation bias on the long tail
- 1.4.3 Improving the evaluation bias on the long tail
- 1.4.4 Enabling access to knowledge about long-tail entities beyond DBpedia
- 1.4.5 The role of knowledge in establishing identity of long-tail entities
- 1.5 Summary of findings
- 1.6 Software and data
- 2 Describing and Observing the Head and the Tail of Entity Linking
- 2.1 Introduction
- 2.2 Related work
- 2.3 Approach
- 2.3.1 The head-tail phenomena of the entity linking task
- 2.3.2 Hypotheses on the head-tail phenomena of the entity linking task
- 2.3.3 Datasets and systems
- 2.3.4 Evaluation
- 2.4 Analysis of data properties
- 2.4.1 Frequency distribution of forms and instances in datasets
- 2.4.2 PageRank distribution of instances in datasets
- 2.4.3 Ambiguity distribution of forms
- 2.4.4 Variance distribution of instances
- 2.4.5 Interaction between frequency, PageRank, and ambiguity/variance
- 2.4.6 Frequency distribution for a single form or an instance
- 2.5 Analysis of system performance and data properties
- 2.5.1 Correlating system performance with form ambiguity
- 2.5.2 Correlating system performance with form frequency, instance frequency, and PageRank
- 2.5.3 Correlating system performance with ambiguity and frequency of forms jointly
- 2.5.4 Correlating system performance with frequency of instances for ambiguous forms
- 2.6 Summary of findings
- 2.7 Recommended actions
- 2.8 Conclusions
- 3 Analyzing the Evaluation bias on the Long Tail of Disambiguation & Reference
- 3.1 Introduction
- 3.2 Temporal aspect of the disambiguation task
- 3.3 Related work
- 3.4 Preliminary study of EL evaluation datasets
- 3.4.1 Datasets
- 3.4.2 Dataset characteristics
- 3.4.3 Distributions of instances and surface forms
- 3.4.4 Discussion and roadmap
- 3.5 Semiotic generation and context model
- 3.6 Methodology
- 3.6.1 Metrics
- 3.6.2 Tasks
- 3.6.3 Datasets
- 3.7 Analysis
- 3.8 Proposal for improving evaluation
- 3.9 Conclusions
- 4 Improving the Evaluation bias on the Long Tail of Disambiguation & Reference
- 4.1 Introduction
- 4.2 Motivation & target communities
- 4.2.1 Disambiguation & reference
- 4.2.2 Reading Comprehension & Question Answering
- 4.2.3 Moving away from semantic overfitting
- 4.3 Task requirements
- 4.4 Methods for creating an event-based task
- 4.4.1 State of text-to-data datasets
- 4.4.2 From data to text
- 4.5 Data & resources
- 4.5.1 Structured data
- 4.5.2 Example document
- 4.5.3 Licensing & availability
- 4.6 Task design
- 4.6.1 Subtasks
- 4.6.2 Question template
- 4.6.3 Question creation
- 4.6.4 Data partitioning
- 4.7 Mention annotation
- 4.7.1 Annotation task and guidelines
- 4.7.2 Annotation environment
- 4.7.3 Annotation process
- 4.7.4 Corpus description
- 4.8 Evaluation
- 4.8.1 Criteria
- 4.8.2 Baselines
- 4.9 Participants
- 4.10 Results
- 4.10.1 Incident-level evaluation
- 4.10.2 Document-level evaluation
- 4.10.3 Mention-level evaluation
- 4.11 Discussion
- 4.12 Conclusions
- 5 Enabling Access to Knowledge on the Long-Tail Entities beyond DBpedia
- 5.1 Introduction
- 5.2 Problem description
- 5.2.1 Requirements
- 5.2.2 Current state-of-the-art
- 5.3 Related work
- 5.4 Access to entities at LOD scale with LOD Lab
- 5.4.1 LOD Lab
- 5.4.2 APIs and tools
- 5.5 LOTUS
- 5.5.1 Model
- 5.5.2 Language tags
- 5.5.3 Linguistic entry point to the LOD Cloud
- 5.5.4 Retrieval
- 5.6 Implementation
- 5.6.1 System architecture
- 5.6.2 Implementation of the matching and ranking algorithms
- 5.6.3 Distributed architecture
- 5.6.4 API
- 5.6.5 Examples
- 5.7 Performance statistics and flexibility of retrieval
- 5.7.1 Performance statistics
- 5.7.2 Flexibility of retrieval
- 5.8 Finding entities beyond DBpedia
- 5.8.1 AIDA-YAGO2
- 5.8.2 Local monuments guided walks
- 5.8.3 Scientific journals
- 5.9 Discussion and conclusions
- 6 The Role of Knowledge in Establishing Identity of Long-Tail Entities
- 6.1 Introduction
- 6.2 Related work
- 6.2.1 Entity Linking and NIL clustering
- 6.2.2 Attribute extraction
- 6.2.3 Knowledge Base Completion (KBC)
- 6.2.4 Other knowledge completion variants
- 6.3 Task and hypotheses
- 6.3.1 The NIL clustering task
- 6.3.2 Research question and hypotheses
- 6.4 Profiling
- 6.4.1 Aspects of profiles
- 6.4.2 Examples
- 6.4.3 Definition of a profile
- 6.4.4 Neural methods for profiling
- 6.5 Experimental setup
- 6.5.1 End-to-end pipeline
- 6.5.2 Data
- 6.5.3 Evaluation
- 6.5.4 Automatic attribute extraction
- 6.5.5 Reasoners
- 6.6 Extrinsic evaluation
- 6.6.1 Using explicit information to establish identity
- 6.6.2 Profiling implicit information
- 6.6.3 Analysis of ambiguity
- 6.7 Intrinsic analysis of the profiler
- 6.7.1 Comparison against factual data
- 6.7.2 Comparison against human expectations
- 6.8 Discussion and limitations
- 6.8.1 Summary of the results
- 6.8.2 Harmonizing knowledge between text and knowledge bases
- 6.8.3 Limitations of profiling by NNs
- 6.9 Conclusions and future work
- 7 Conclusion
- 7.1 Summarizing our results
- 7.1.1 Describing and observing the head and the tail of Entity Linking
- 7.1.2 Analyzing the evaluation bias on the long tail
- 7.1.3 Improving the evaluation on the long tail
- 7.1.4 Enabling access to knowledge on the long-tail entities
- 7.1.5 The role of knowledge in establishing identity of long-tail entities
- 7.2 Lessons learned
- 7.2.1 Observations
- 7.2.2 Recommendations
- 7.3 Future research directions
- 7.3.1 Engineering of systems
- 7.3.2 Novel tasks
- 7.3.3 A broader vision for the long tail
- Bibliography
- Colophon
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are cumbersome to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.