Human Language Technology. Challenges for Computer Science and Linguistics

Name: Human Language Technology. Challenges for Computer Science and Linguistics | 4th Language and Technology Conference, LTC 2009, Poznań, Poland, November 6-8, 2009, Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: Online only

Zygmunt Vetulani (Editor)

Springer (Publisher)

Published on 22 March 2011

XIX, 578 pages

E-book

PDF with watermark DRM

978-3-642-20095-3 (ISBN)

€53.49 incl. 7% VAT

System requirements

for PDF with watermark DRM

E-book, single-user licence

Available as download

Contents

Title Page
Preface
Organization
Table of Contents
Speech Processing
Data-Driven Approaches to Objective Evaluation of Phoneme Alignment Systems
Introduction
Experiments
Data
Front-End Processing
HMM Training
Differences between Systems
Non-parametric Ranking of Variances
Parametric Bayesian Models
Results
Non-parametric Ranking Results
Parametric Bayesian Results
Discussion
Conclusions
References
Phonetically Transcribed Speech Corpus Designed for Context Based European Portuguese TTS
Introduction
Methodology
Description
The Speech Corpus by Graphemes
Phonetic Transcription by Rules
Phonetic Transcription with Vocalic Reduction
The Vocalic Reduction Influence
Conclusions
References
Robust Speech Recognition in the Car Environment
Introduction
Background
Spectral Subtraction
Weighted Finite State Transducers in Speech Recognition
Adaptation of the Acoustic Model HMM
Experimental Conditions
Drivers Japanese Speech Corpus in Car Environment
WFST Network Construction
Evaluation of the Baseline Model
Evaluation of Nonlinear Spectral Subtraction
Speaker Adaptation of Acoustic Model
Conclusions
References
Corpus Design for a Unit Selection TtS System with Application to Bulgarian
Introduction
Unit Selection Text to Speech
Unit Selection Module
Spoken Corpus for TTS
Corpus Design Strategies
Utterance Selection Methods
The Proposed Corpus Selection Method
Results
Experimental Evaluation
Discussion
References
Automatic Identification of Phonetic Similarity Based on Underspecification
Introduction
Speech Recognition System and Corpus
Experiment
Results
General Information
Analysis of Underspecified System
Phone [f]
Common Phonetic Properties of [f] and [th]
Phonetic Neighbourhood
Frequency of Occurrence
Discussion
Conclusion
References
Error Detection in Broadcast News ASR Using Markov Chains
Introduction
Features for Error Detection
Models for Error Detection
Maximum Entropy Models
Markov Chains
Gaussian Mixture Models
Corpus
Experiments
Evaluation
Automatic Transcription
Error Detection Results
Result Analysis
Impact of the Transition Probability Matrix
Summary and Future Work
References
Pronunciation and Writing Variants in an Under-Resourced Language: The Case of Luxembourgish Mobile N-Deletion
Introduction
The Study of Written and Spoken Variants
Text Normalization for Written Variants
Pronunciation Modeling of Spoken Variants
Effects of Luxembourgish MND on Written and Pronunciation Variants
The Current Study
Data Collection
Characterizing Potential Mobile -N Sites
MND in Transcriptions
MND and Word List Coverage
Summary and Prospects
References
Morpheme-Based and Factored Language Modeling for Amharic Speech Recognition
Introduction
Language Modeling
Factored Language Modeling
The Morphology of Amharic
The Baseline Speech Recognition System
Speech and Text Corpus
The Acoustic and Language Models
Performance of the Baseline System
Morpheme-Based and Factored Language Models
Morpheme-Based Language Models
Amharic Factored Language Models
Lattice Rescoring Experiment
Lattice Rescoring with Morpheme-Based Language Models
Lattice Rescoring with Factored Language Models
Conclusion
References
The Corpus Analysis Toolkit - Analysing Multilevel Annotations
Introduction
Linguistic Annotation
Multilevel Annotations
Tiers, Intervals, and Points
Inter-tier Analysis
Inter-annotation Analysis
The Corpus Analysis Toolkit
An Integrated Analytical Framework
Supported Formats
Internal Representation
Toolkit Inventory
Analysing a Corpus
Corpus Overview
Acquiring General Corpus Information
Interval-Based Corpus Analysis
Investigation of Temporal Inclusion
Extracting Multilevel Representations
The Toolkit and Emerging Standards
Future Work
Concluding Remarks
References
Computational Morphology/Lexicography
Time Durations of Phonemes in Polish Language for Speech and Speaker Recognition
Introduction
Phoneme Segmentation
Experimental Data
Statistics Collection
Results
Conclusions
References
Polysemous Verb Classification Using Subcategorization Acquisition and Graph-Based Clustering
Introduction
Related Work
Japanese Verb Description
Selectional Preferences
Link Analysis
Distributional Similarity
Clustering Method
Experiments
Data and Evaluation
Results
Conclusion
References
Estimating the Proximity between Languages by Their Commonality in Vocabulary Structures
Introduction
Basics of the Comparative Method
Linguistic Specifications
Formal Specifications
Recent Works on Vocabulary Structure
Analogy in Morphology
A Measure of Similarity between Vocabulary Structures
Experiments and Results
Languages and Purpose of the Experiments
Experiments with Swadesh Lists
Experiments with a Multilingual Lexicon Extracted from the Acquis Communautaire
Conclusion
References
Toposlaw - A Lexicographic Framework for Multi-word Units
Introduction
Objects in Toposlaw
Morphology and Variants of a Name
Morphological Description of Components
Inflection Graphs
Graph Management
Filtering Graphs
Tracing Paths in a Graph
New Graphs
Dictionary Management
Conclusions and Perspectives
References
Parsing
Parsing CFGs and PCFGs with a Chomsky-Schützenberger Representation
Introduction
Preliminaries
Notation
The Chomsky-Schützenberger Theorem
An Encoding for CFGs
Parsing with C-S Representations
The Algorithm
Analysis
Weights and PCFGs
An Example
Conclusion and Further Work
References
Syntactic Analysis Using Finite Patterns: A New Parsing System for Czech
Introduction
Current Approaches to Czech Language Parsing
SET - Syntactic Analysis as Pattern Matching Linking Rules
The Parsing Algorithm
The Pattern Definitions
The SET Parser
Experiments and Preliminary Results
Testing Data
Conclusions
References
Using SRX Standard for Sentence Segmentation
Introduction
SRX Standard
Disambiguation Strategies
Results for English and Polish
Conclusions
References
Using Lexicon-Grammar Tables for French Verbs in a Large-Coverage Parser
Introduction
The Verbal Lexicon lglex
The Lefff Syntactic Lexicon and the Alexina Format
Conversion of the Verbal Lexicon lglex into a Lexicon in the Alexina Format
Sketch of the Conversion Process
Resulting Lexicon
Integration in the frmg Parser
Evaluation and Discussion
Conclusion and Future Work
References
Computational Semantics
Effect of Overt Pronoun Resolution in Topic Tracking
Introduction
Related Work
Pronoun Resolution
Pre-processing
Identification of Antecedent
Tracking Based on Term Weighting and Adaptation
Experiments
Anaphora Resolution
Topic Tracking
Conclusion
References
Sentiment Intensity: Is It a Good Summary Indicator?
Introduction
Related Work
Sentiment Intensity and Summarisation
Sentiment Annotated Corpus
Sentiment Analysis
Summarisation Algorithm Based on Sentiment Intensity
Hypothesis Test for Sentiment Intensity Usefulness
Experimental Results
Sentiment Analysis
Summarisation
Conclusions
References
Syntactic Tree Kernels for Event-Time Temporal Relation Learning
Introduction
Previous Works
Pattern Based Methods
Rule Based Methods
Anchor Based Methods
Syntactic Tree Kernels in SVM
Simple Event-Time Kernel
Tree Kernels
Composite Kernels
Corpus Description
Experiments
Conclusion
References
The WSD Development Environment
Introduction
WSD Development Environment
Corpora
Feature Generation
Feature Selection
Machine Learning Algorithms
Runtime
Reports
Experiments
Conclusions
References
Semantic Analyzer in the Thetos-3 System
Semantic Processing - General Premises
Predicate-Argument Structure
Semantic Interpretation of SGs
Semantic Relations
Facial Expressions and Gestures
Requests
Conclusion
References
Unsupervised and Open Ontology-Based Semantic Analysis
Introduction
Motivation, Theory and Practice
The a-Grammar, a Pattern-Based Grammar
A Tree-Like Representation Conversion
A Compositional Analysis
Examples
Ontology-Based Semantic Analysis
Evaluation
Logical Form Evaluation
DRS Evaluation
Conclusion and Further Work
References
Entailment
Non Compositional Semantics Using Rewriting
Introduction
Semantic Role Labelling
Building and Using Semantic Representations
From Labelled Dependency Structures to FOL Formulae
Illustrating Example
Checking Entailment
Evaluation
Conclusion
References
Defining Specialized Entailment Engines Using Natural Logic Relations
Introduction
Extended Model of Natural Logic
Transformation-Based TE and Specialized Entailment Engines
Entailment Rules and Atomic Edits
Combination Based on Natural Logics
Order of Composition
Example of Application of the Proposed Framework to RTE Pairs
Conclusions
References
Dialogue Modeling and Processing
Czech Senior COMPANION: Wizard of Oz Data Collection and Expressive Speech Corpus Recording and Annotation
Introduction
Data Collection Process
Dialogue Corpus Characteristics
Design and Recording of the Expressive Speech Corpus
Annotation Using Communicative Functions
Conclusions and Future Work
References
Abstractive Summarization of Voice Communications
Introduction
Related Work
Paper Outline
Automatic Argumentative Analysis
Argumentative Structure - Issues and Theories
Computing Argumentative Annotations
The A3 Algorithm
Experimental Results
Abstract Summarization of Conversations
Conclusions
Future Work
References
Natural Language Based Communication between Human Users and the Emergency Center: POLINT-112-SMS
Credits
Introduction
Challenging Aspects of the Project
User Modeling
Linguistic Challenge of SMS Processing
Knowledge Representation and Reasoning
System Development Methodology
Elements of the Logical/Physical Model. System Architecture
Language Coverage Related Issues
Project Resources
PolNet
Verbo-Nominal Collocations Dictionary
Concluding Remarks
References
Dialogue Organization in Polint-112-SMS
Introduction
System Architecture
Dialogue Organization
The "Philosophy" of Dialogue in the System
Responsibilities of the Dialogue Maintenance Module
Dialogue-Oriented Features of the Situation Analysis Module
Evaluation
User Surveys
Known Problems
References
Digital Language Resources
Valuable Language Resources and Applications Supporting the Use of Basque
Introduction
Strategy to Develop HLT in Basque
Useful Applications and Resources
Spelling Checker/Corrector
Lemmatization-Based On-Line Dictionaries
Lemmatization-Based Search Machine
Transfer-Based Machine Translation System
EDBL: Lexical Database for Basque
BasWN: Basque WordNet
EPEC: Syntactically Annotated Text Corpus
ZTC: Morphosyntactically Annotated Text Corpus
Conclusions
References
Clues to Compare Languages for Morphosyntactic Analysis: A Study Run on Parallel Corpora and Morphosyntactic Lexicons
Introduction
State of the Art
Description of Resources
Experiment and Analysis of the Results
Corpus-Based Study
Using Morphosyntactic Lexicons
Conclusions and Further Work
References
Corpus Clouds - Facilitating Text Analysis by Means of Visualizations
Introduction
Corpus Clouds
Corpus Query Tool
Corpus Inquiry Tasks and the Aim of Corpus Clouds
Design Overview
Challenges with the Visualizations
Some Design Principles
Evaluation and Future Work
Conclusion
References
Acquiring Bilingual Lexica from Keyword Listings
Introduction
Collecting the Corpus
Procedure for the Keywords Extraction
Scan and Split
Alignment
Evaluation
Alignment with GIZA as a Baseline
Recall from Documents
Precision
Conclusions
References
Annotating Sanskrit Corpus: Adapting IL-POSTS
Introduction
POS Tagging in Sanskrit
Sanskrit Morphology
MSRI Hierarchical Tagset Schema
Adaptations for Sanskrit
Proposed IL-POSTS for Sanskrit
POS Results and Current Status
Conclusion
References
Effective Authoring Procedure for E-learning Courses' Development in Philological Curriculum Based on LOs Ideology
Introduction
Theoretical Background
Authoring Practice and Guiding Principles
Course Prototype Structures
Course Design Requirements
Results
References
Acquisition of Spatial Relations from an Experimental Corpus
Introduction
Description of the Problem
Experiment
Results
Type 1
Type 2
Results and the Participants' Profiles
Conclusion
References
Which XML Standards for Multilevel Corpus Annotation?
Introduction
Requirements
Standards and Best Practices
ISO TC37 / SC4
TEI
XCES
TIGER-XML
PAULA
Discussion
Standards in NKJP
Metadata, Primary Data and Structure
Segmentation
Morphosyntax
Syntactic Words
Named Entities and Syntactic Groups
Word Senses
Conclusion
References
Corpus Academicum Lithuanicum: Design Criteria, Methodology, Application
Introduction
The Use of Language Corpora
Digitalised Resources of the Lithuanian Language
The Building of Corpus Academicum Lithuanicum
Corpus Design
Representativeness
Encoding of Textual Data
Automatic Encoding
Perspectives
References
WordNet
The EM-Based Wordnet Synsets Annotation of NP/PP Heads
Introduction
Data Resources
Semantic Annotation
The EM Selection Algorithm
Related Works
The Experiment
Manually Annotated Data for an Evaluation of the Algorithm
Efficiency of the Algorithm
Evaluation of the Algorithm
Conclusions
References
Unsupervised Word Sense Disambiguation with Lexical Chains and Graph-Based Context Formalization
Introduction
Lexical Chains
The WSD Algorithm
Evaluations
Conclusions
References
An Access Layer to PolNet - Polish WordNet
Introduction
Access Layer Architecture
WQuery Language
Data Types
Basic Syntax
Typical Queries Appearing in POLINT-112-SMS
Obtaining Word Meanings
Creating and Composing Frames
Refreshing PolNet Cache
Discussion
Conclusion
References
Document Processing
OTTO: A Tool for Diplomatic Transcription of Historical Texts
Introduction
Requirements of Transcription Tools
Characteristics of Historical Texts
Meta-information: Header and Comments
Requirements of Transcription Tools
Related Work
OTTO
Conclusion and Future Work
References
Automatic Author Attribution for Short Text Documents
Introduction
Authorship Analysis Classification
Features
Algorithm
Corpus
Experiments
Conclusions
References
BioExcom: Detection and Categorization of Speculative Sentences in Biomedical Literature
Introduction
Task
Goal
Definition of Biological Speculation in Articles
Importance of Speculative Sentences in Biological Literature
Categorization into Prior and New Speculation
Automatic Annotation of Speculative Sentences by Contextual Exploration Processing
The Contextual Exploration Processing
Computational Architecture of the CE Engine and Overview of Text Treatment
The Linguistic Markers of Speculation in Biological Sentences
Categorization of Speculative Sentences
BioExcom Implementation
Evaluation
Evaluation Methodology
Results of the Evaluation
Perspectives
References
Experimenting with Automatic Text Summarisation for Arabic
Introduction
Related Work
Summarisers for Arabic: AQBTSS and ACBTSS
ACBTSS Concepts
Experimental Design
The Document Collection
The Evaluation Scale
The Subjects
Additional Experiments with Sakhr
Results
AQBTSS versus ACBTSS
Sakhr Summarisation System
Discussion of Results
Conclusions and Future Work
References
Enhancing Opinion Extraction by Automatically Annotated Lexical Resources
Introduction
Related Work
The Opinion Extraction System
The Learning and Classification System
Lexical Resources for OM
Experiments
The WWC Opinion Markup Language
The Datasets: MPQA and I-CAB Opinion
Evaluation Models and Measures
The Experiments
Results and Conclusions
The Results
Statistical Significance Tests
Inter-Annotator Agreement
Conclusions
References
Technical Trend Analysis by Analyzing Research Papers' Titles
Introduction
System Behavior
Related Work
Utilization of Research Papers' Structures
Automatic Generation of Survey Articles and Technical Trend Maps
Analysis of Research Papers' Titles
Analyzing the Structure of Japanese Titles
Analyzing the Structure of English Titles
Experiments
Experimental Method
Experimental Results
Discussion
Conclusion
References
Information Processing (IR, IE, other)
Extracting and Visualizing Quotations from News Wires
Introduction
Related Work
Overall Architecture
Pre-processing with SXPipe
Named Entities Recognition
Verbatims Extraction
Parsing and Post-processing
Anaphora Resolution
Quotation Extraction
Web Interface for Visualization
Conclusions and Perspectives
References
Using Wikipedia to Improve Precision of Contextual Advertising
Introduction
Problem Statement
Contributions
Organization
Related Work
Keyword Matching
Semantic Advertising
Keyword Extraction
Wikipedia Matching
Finding Similar Articles
Dimension Reduction
Combining Ranking Functions
Experiments
Data and Methodology
Average Precision
Results for the Ambiguous Dataset
Performance Gain and t-Interval
Conclusion
References
Unsupervised Extraction of Keywords from News Archives
Introduction
Related Work
Belga News Agency Archive
Automatic Extraction of Keywords
TextRank
Chi-Square Test
Information Radius
Raw Frequency
Evaluation
Conclusions
References
Machine Translation
Automatic Evaluation of Texts by Using Paraphrases
Introduction
Related Work
Automatic Evaluation of Texts
Text Evaluation Using Paraphrases
The Benefits of Paraphrases in Text Evaluation
Data
Paraphrases in Text Evaluation
An Automatic Method of Text Evaluation Using Paraphrases
Procedure for Text Evaluation
Paraphrase Methods
Experiments
Experimental Settings
Experimental Results
Discussion
Conclusions
References
Packing It All Up in Search for a Language Independent MT Quality Measure Tool - Part Two
Introduction
Research Setting
Results
Original Results with CompLearn
Results with WMT08 and WMT10 Data
Discussion and Conclusions
References
Author Index
