Identity of Long-tail Entities in Text

Name: Identity of Long-tail Entities in Text
Brand: IOS Press, US
Price: 97.99 EUR
Availability: Online only

Filip Ilievski (Editor)

1st Edition

Published on 15 November 2019

220 pages

E-Book

PDF with digital watermarking

978-1-64368-043-9 (ISBN)

€97.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Title Page
Contents
Acronyms
1 Introduction
1.1 Background: Identity in the digital era
1.2 Challenge: Entity Linking in the long tail
1.3 Research questions
1.4 Approach and structure of the thesis
1.4.1 Describing and observing the head and the tail
1.4.2 Analyzing the evaluation bias on the long tail
1.4.3 Improving the evaluation bias on the long tail
1.4.4 Enabling access to knowledge about long-tail entities beyond DBpedia
1.4.5 The role of knowledge in establishing identity of long-tail entities
1.5 Summary of findings
1.6 Software and data
2 Describing and Observing the Head and the Tail of Entity Linking
2.1 Introduction
2.2 Related work
2.3 Approach
2.3.1 The head-tail phenomena of the entity linking task
2.3.2 Hypotheses on the head-tail phenomena of the entity linking task
2.3.3 Datasets and systems
2.3.4 Evaluation
2.4 Analysis of data properties
2.4.1 Frequency distribution of forms and instances in datasets
2.4.2 PageRank distribution of instances in datasets
2.4.3 Ambiguity distribution of forms
2.4.4 Variance distribution of instances
2.4.5 Interaction between frequency, PageRank, and ambiguity/variance
2.4.6 Frequency distribution for a single form or an instance
2.5 Analysis of system performance and data properties
2.5.1 Correlating system performance with form ambiguity
2.5.2 Correlating system performance with form frequency, instance frequency, and PageRank
2.5.3 Correlating system performance with ambiguity and frequency of forms jointly
2.5.4 Correlating system performance with frequency of instances for ambiguous forms
2.6 Summary of findings
2.7 Recommended actions
2.8 Conclusions
3 Analyzing the Evaluation bias on the Long Tail of Disambiguation & Reference
3.1 Introduction
3.2 Temporal aspect of the disambiguation task
3.3 Related work
3.4 Preliminary study of EL evaluation datasets
3.4.1 Datasets
3.4.2 Dataset characteristics
3.4.3 Distributions of instances and surface forms
3.4.4 Discussion and roadmap
3.5 Semiotic generation and context model
3.6 Methodology
3.6.1 Metrics
3.6.2 Tasks
3.6.3 Datasets
3.7 Analysis
3.8 Proposal for improving evaluation
3.9 Conclusions
4 Improving the Evaluation bias on the Long Tail of Disambiguation & Reference
4.1 Introduction
4.2 Motivation & target communities
4.2.1 Disambiguation & reference
4.2.2 Reading Comprehension & Question Answering
4.2.3 Moving away from semantic overfitting
4.3 Task requirements
4.4 Methods for creating an event-based task
4.4.1 State of text-to-data datasets
4.4.2 From data to text
4.5 Data & resources
4.5.1 Structured data
4.5.2 Example document
4.5.3 Licensing & availability
4.6 Task design
4.6.1 Subtasks
4.6.2 Question template
4.6.3 Question creation
4.6.4 Data partitioning
4.7 Mention annotation
4.7.1 Annotation task and guidelines
4.7.2 Annotation environment
4.7.3 Annotation process
4.7.4 Corpus description
4.8 Evaluation
4.8.1 Criteria
4.8.2 Baselines
4.9 Participants
4.10 Results
4.10.1 Incident-level evaluation
4.10.2 Document-level evaluation
4.10.3 Mention-level evaluation
4.11 Discussion
4.12 Conclusions
5 Enabling Access to Knowledge on the Long-Tail Entities beyond DBpedia
5.1 Introduction
5.2 Problem description
5.2.1 Requirements
5.2.2 Current state-of-the-art
5.3 Related work
5.4 Access to entities at LOD scale with LOD Lab
5.4.1 LOD Lab
5.4.2 APIs and tools
5.5 LOTUS
5.5.1 Model
5.5.2 Language tags
5.5.3 Linguistic entry point to the LOD Cloud
5.5.4 Retrieval
5.6 Implementation
5.6.1 System architecture
5.6.2 Implementation of the matching and ranking algorithms
5.6.3 Distributed architecture
5.6.4 API
5.6.5 Examples
5.7 Performance statistics and flexibility of retrieval
5.7.1 Performance statistics
5.7.2 Flexibility of retrieval
5.8 Finding entities beyond DBpedia
5.8.1 AIDA-YAGO2
5.8.2 Local monuments guided walks
5.8.3 Scientific journals
5.9 Discussion and conclusions
6 The Role of Knowledge in Establishing Identity of Long-Tail Entities
6.1 Introduction
6.2 Related work
6.2.1 Entity Linking and NIL clustering
6.2.2 Attribute extraction
6.2.3 Knowledge Base Completion (KBC)
6.2.4 Other knowledge completion variants
6.3 Task and hypotheses
6.3.1 The NIL clustering task
6.3.2 Research question and hypotheses
6.4 Profiling
6.4.1 Aspects of profiles
6.4.2 Examples
6.4.3 Definition of a profile
6.4.4 Neural methods for profiling
6.5 Experimental setup
6.5.1 End-to-end pipeline
6.5.2 Data
6.5.3 Evaluation
6.5.4 Automatic attribute extraction
6.5.5 Reasoners
6.6 Extrinsic evaluation
6.6.1 Using explicit information to establish identity
6.6.2 Profiling implicit information
6.6.3 Analysis of ambiguity
6.7 Intrinsic analysis of the profiler
6.7.1 Comparison against factual data
6.7.2 Comparison against human expectations
6.8 Discussion and limitations
6.8.1 Summary of the results
6.8.2 Harmonizing knowledge between text and knowledge bases
6.8.3 Limitations of profiling by NNs
6.9 Conclusions and future work
7 Conclusion
7.1 Summarizing our results
7.1.1 Describing and observing the head and the tail of Entity Linking
7.1.2 Analyzing the evaluation bias on the long tail
7.1.3 Improving the evaluation on the long tail
7.1.4 Enabling access to knowledge on the long-tail entities
7.1.5 The role of knowledge in establishing identity of long-tail entities
7.2 Lessons learned
7.2.1 Observations
7.2.2 Recommendations
7.3 Future research directions
7.3.1 Engineering of systems
7.3.2 Novel tasks
7.3.3 A broader vision for the long tail
Bibliography
Colophon
