
Human Language Technology. Challenges for Computer Science and Linguistics
Content
- Title Page
- Preface
- Organization
- Table of Contents
- Speech Processing
- Data-Driven Approaches to Objective Evaluation of Phoneme Alignment Systems
- Introduction
- Experiments
- Data
- Front-End Processing
- HMM Training
- Differences between Systems
- Non-parametric Ranking of Variances
- Parametric Bayesian Models
- Results
- Non-parametric Ranking Results
- Parametric Bayesian Results
- Discussion
- Conclusions
- References
- Phonetically Transcribed Speech Corpus Designed for Context Based European Portuguese TTS
- Introduction
- Methodology
- Description
- The Speech Corpus by Graphemes
- Phonetic Transcription by Rules
- Phonetic Transcription with Vocalic Reduction
- The Vocalic Reduction Influence
- Conclusions
- References
- Robust Speech Recognition in the Car Environment
- Introduction
- Background
- Spectral Subtraction
- Weighted Finite State Transducers in Speech Recognition
- Adaptation of the Acoustic Model HMM
- Experimental Conditions
- Drivers Japanese Speech Corpus in Car Environment
- WFST Network Construction
- Evaluation of the Baseline Model
- Evaluation of Nonlinear Spectral Subtraction
- Speaker Adaptation of Acoustic Model
- Conclusions
- References
- Corpus Design for a Unit Selection TtS System with Application to Bulgarian
- Introduction
- Unit Selection Text to Speech
- Unit Selection Module
- Spoken Corpus for TTS
- Corpus Design Strategies
- Utterance Selection Methods
- The Proposed Corpus Selection Method
- Results
- Experimental Evaluation
- Discussion
- References
- Automatic Identification of Phonetic Similarity Based on Underspecification
- Introduction
- Speech Recognition System and Corpus
- Experiment
- Results
- General Information
- Analysis of Underspecified System
- Phone [f]
- Common Phonetic Properties of [f] and [th]
- Phonetic Neighbourhood
- Frequency of Occurrence
- Discussion
- Conclusion
- References
- Error Detection in Broadcast News ASR Using Markov Chains
- Introduction
- Features for Error Detection
- Models for Error Detection
- Maximum Entropy Models
- Markov Chains
- Gaussian Mixture Models
- Corpus
- Experiments
- Evaluation
- Automatic Transcription
- Error Detection Results
- Result Analysis
- Impact of the Transition Probability Matrix
- Summary and Future Work
- References
- Pronunciation and Writing Variants in an Under-Resourced Language: The Case of Luxembourgish Mobile N-Deletion
- Introduction
- The Study of Written and Spoken Variants
- Text Normalization for Written Variants
- Pronunciation Modeling of Spoken Variants
- Effects of Luxembourgish MND on Written and Pronunciation Variants
- The Current Study
- Data Collection
- Characterizing Potential Mobile -N Sites
- MND in Transcriptions
- MND and Word List Coverage
- Summary and Prospects
- References
- Morpheme-Based and Factored Language Modeling for Amharic Speech Recognition
- Introduction
- Language Modeling
- Factored Language Modeling
- The Morphology of Amharic
- The Baseline Speech Recognition System
- Speech and Text Corpus
- The Acoustic and Language Models
- Performance of the Baseline System
- Morpheme-Based and Factored Language Models
- Morpheme-Based Language Models
- Amharic Factored Language Models
- Lattice Rescoring Experiment
- Lattice Rescoring with Morpheme-Based Language Models
- Lattice Rescoring with Factored Language Models
- Conclusion
- References
- The Corpus Analysis Toolkit - Analysing Multilevel Annotations
- Introduction
- Linguistic Annotation
- Multilevel Annotations
- Tiers, Intervals, and Points
- Inter-tier Analysis
- Inter-annotation Analysis
- The Corpus Analysis Toolkit
- An Integrated Analytical Framework
- Supported Formats
- Internal Representation
- Toolkit Inventory
- Analysing a Corpus
- Corpus Overview
- Acquiring General Corpus Information
- Interval-Based Corpus Analysis
- Investigation of Temporal Inclusion
- Extracting Multilevel Representations
- The Toolkit and Emerging Standards
- Future Work
- Concluding Remarks
- References
- Computational Morphology/Lexicography
- Time Durations of Phonemes in Polish Language for Speech and Speaker Recognition
- Introduction
- Phoneme Segmentation
- Experimental Data
- Statistics Collection
- Results
- Conclusions
- References
- Polysemous Verb Classification Using Subcategorization Acquisition and Graph-Based Clustering
- Introduction
- Related Work
- Japanese Verb Description
- Selectional Preferences
- Link Analysis
- Distributional Similarity
- Clustering Method
- Experiments
- Data and Evaluation
- Results
- Conclusion
- References
- Estimating the Proximity between Languages by Their Commonality in Vocabulary Structures
- Introduction
- Basics of the Comparative Method
- Linguistic Specifications
- Formal Specifications
- Recent Works on Vocabulary Structure
- Analogy in Morphology
- A Measure of Similarity between Vocabulary Structures
- Experiments and Results
- Languages and Purpose of the Experiments
- Experiments with Swadesh Lists
- Experiments with a Multilingual Lexicon Extracted from the Acquis Communautaire
- Conclusion
- References
- Toposlaw - A Lexicographic Framework for Multi-word Units
- Introduction
- Objects in Toposlaw
- Morphology and Variants of a Name
- Morphological Description of Components
- Inflection Graphs
- Graph Management
- Filtering Graphs
- Tracing Paths in a Graph
- New Graphs
- Dictionary Management
- Conclusions and Perspectives
- References
- Parsing
- Parsing CFGs and PCFGs with a Chomsky-Schützenberger Representation
- Introduction
- Preliminaries
- Notation
- The Chomsky-Schützenberger Theorem
- An Encoding for CFGs
- Parsing with C-S Representations
- The Algorithm
- Analysis
- Weights and PCFGs
- An Example
- Conclusion and Further Work
- References
- Syntactic Analysis Using Finite Patterns: A New Parsing System for Czech
- Introduction
- Current Approaches to Czech Language Parsing
- SET - Syntactic Analysis as Pattern Matching Linking Rules
- The Parsing Algorithm
- The Pattern Definitions
- The SET Parser
- Experiments and Preliminary Results
- Testing Data
- Conclusions
- References
- Using SRX Standard for Sentence Segmentation
- Introduction
- SRX Standard
- Disambiguation Strategies
- Results for English and Polish
- Conclusions
- References
- Using Lexicon-Grammar Tables for French Verbs in a Large-Coverage Parser
- Introduction
- The Verbal Lexicon lglex
- The Lefff Syntactic Lexicon and the Alexina Format
- Conversion of the Verbal Lexicon lglex into a Lexicon in the Alexina Format
- Sketch of the Conversion Process
- Resulting Lexicon
- Integration in the frmg Parser
- Evaluation and Discussion
- Conclusion and Future Work
- References
- Computational Semantics
- Effect of Overt Pronoun Resolution in Topic Tracking
- Introduction
- Related Work
- Pronoun Resolution
- Pre-processing
- Identification of Antecedent
- Tracking Based on Term Weighting and Adaptation
- Experiments
- Anaphora Resolution
- Topic Tracking
- Conclusion
- References
- Sentiment Intensity: Is It a Good Summary Indicator?
- Introduction
- Related Work
- Sentiment Intensity and Summarisation
- Sentiment Annotated Corpus
- Sentiment Analysis
- Summarisation Algorithm Based on Sentiment Intensity
- Hypothesis Test for Sentiment Intensity Usefulness
- Experimental Results
- Sentiment Analysis
- Summarisation
- Conclusions
- References
- Syntactic Tree Kernels for Event-Time Temporal Relation Learning
- Introduction
- Previous Works
- Pattern Based Methods
- Rule Based Methods
- Anchor Based Methods
- Syntactic Tree Kernels in SVM
- Simple Event-Time Kernel
- Tree Kernels
- Composite Kernels
- Corpus Description
- Experiments
- Conclusion
- References
- The WSD Development Environment
- Introduction
- WSD Development Environment
- Corpora
- Feature Generation
- Feature Selection
- Machine Learning Algorithms
- Runtime
- Reports
- Experiments
- Conclusions
- References
- Semantic Analyzer in the Thetos-3 System
- Semantic Processing - General Premises
- Predicate-Argument Structure
- Semantic Interpretation of SGs
- Semantic Relations
- Facial Expressions and Gestures
- Requests
- Conclusion
- References
- Unsupervised and Open Ontology-Based Semantic Analysis
- Introduction
- Motivation, Theory and Practice
- The a-Grammar, a Pattern-Based Grammar
- A Tree-Like Representation Conversion
- A Compositional Analysis
- Examples
- Ontology-Based Semantic Analysis
- Evaluation
- Logical Form Evaluation
- DRS Evaluation
- Conclusion and Further Work
- References
- Entailment
- Non Compositional Semantics Using Rewriting
- Introduction
- Semantic Role Labelling
- Building and Using Semantic Representations
- From Labelled Dependency Structures to FOL Formulae
- Illustrating Example
- Checking Entailment
- Evaluation
- Conclusion
- References
- Defining Specialized Entailment Engines Using Natural Logic Relations
- Introduction
- Extended Model of Natural Logic
- Transformation-Based TE and Specialized Entailment Engines
- Entailment Rules and Atomic Edits
- Combination Based on Natural Logics
- Order of Composition
- Example of Application of the Proposed Framework to RTE Pairs
- Conclusions
- References
- Dialogue Modeling and Processing
- Czech Senior COMPANION: Wizard of Oz Data Collection and Expressive Speech Corpus Recording and Annotation
- Introduction
- Data Collection Process
- Dialogue Corpus Characteristics
- Design and Recording of the Expressive Speech Corpus
- Annotation Using Communicative Functions
- Conclusions and Future Work
- References
- Abstractive Summarization of Voice Communications
- Introduction
- Related Work
- Paper Outline
- Automatic Argumentative Analysis
- Argumentative Structure - Issues and Theories
- Computing Argumentative Annotations
- The A3 Algorithm
- Experimental Results
- Abstract Summarization of Conversations
- Conclusions
- Future Work
- References
- Natural Language Based Communication between Human Users and the Emergency Center: POLINT-112-SMS
- Credits
- Introduction
- Challenging Aspects of the Project
- User Modeling
- Linguistic Challenge of SMS Processing
- Knowledge Representation and Reasoning
- System Development Methodology
- Elements of the Logical/Physical Model. System Architecture
- Language Coverage Related Issues
- Project Resources
- PolNet
- Verbo-Nominal Collocations Dictionary
- Concluding Remarks
- References
- Dialogue Organization in Polint-112-SMS
- Introduction
- System Architecture
- Dialogue Organization
- The "Philosophy" of Dialogue in the System
- Responsibilities of the Dialogue Maintenance Module
- Dialogue-Oriented Features of the Situation Analysis Module
- Evaluation
- User Surveys
- Known Problems
- References
- Digital Language Resources
- Valuable Language Resources and Applications Supporting the Use of Basque
- Introduction
- Strategy to Develop HLT in Basque
- Useful Applications and Resources
- Spelling Checker/Corrector
- Lemmatization-Based On-Line Dictionaries
- Lemmatization-Based Search Machine
- Transfer-Based Machine Translation System
- EDBL: Lexical Database for Basque
- BasWN: Basque WordNet
- EPEC: Syntactically Annotated Text Corpus
- ZTC: Morphosyntactically Annotated Text Corpus
- Conclusions
- References
- Clues to Compare Languages for Morphosyntactic Analysis: A Study Run on Parallel Corpora and Morphosyntactic Lexicons
- Introduction
- State of the Art
- Description of Resources
- Experiment and Analysis of the Results
- Corpus-Based Study
- Using Morphosyntactic Lexicons
- Conclusions and Further Work
- References
- Corpus Clouds - Facilitating Text Analysis by Means of Visualizations
- Introduction
- Corpus Clouds
- Corpus Query Tool
- Corpus Inquiry Tasks and the Aim of Corpus Clouds
- Design Overview
- Challenges with the Visualizations
- Some Design Principles
- Evaluation and Future Work
- Conclusion
- References
- Acquiring Bilingual Lexica from Keyword Listings
- Introduction
- Collecting the Corpus
- Procedure for the Keywords Extraction
- Scan and Split
- Alignment
- Evaluation
- Alignment with GIZA as a Baseline
- Recall from Documents
- Precision
- Conclusions
- References
- Annotating Sanskrit Corpus: Adapting IL-POSTS
- Introduction
- POS Tagging in Sanskrit
- Sanskrit Morphology
- MSRI Hierarchical Tagset Schema
- Adaptations for Sanskrit
- Proposed IL-POSTS for Sanskrit
- POS Results and Current Status
- Conclusion
- References
- Effective Authoring Procedure for E-learning Courses' Development in Philological Curriculum Based on LOs Ideology
- Introduction
- Theoretical Background
- Authoring Practice and Guiding Principles
- Course Prototype Structures
- Course Design Requirements
- Results
- References
- Acquisition of Spatial Relations from an Experimental Corpus
- Introduction
- Description of the Problem
- Experiment
- Results
- Type 1
- Type 2
- Results and the Participants' Profiles
- Conclusion
- References
- Which XML Standards for Multilevel Corpus Annotation?
- Introduction
- Requirements
- Standards and Best Practices
- ISO TC37 / SC4
- TEI
- XCES
- TIGER-XML
- PAULA
- Discussion
- Standards in NKJP
- Metadata, Primary Data and Structure
- Segmentation
- Morphosyntax
- Syntactic Words
- Named Entities and Syntactic Groups
- Word Senses
- Conclusion
- References
- Corpus Academicum Lithuanicum: Design Criteria, Methodology, Application
- Introduction
- The Use of Language Corpora
- Digitalised Resources of the Lithuanian Language
- The Building of Corpus Academicum Lithuanicum
- Corpus Design
- Representativeness
- Encoding of Textual Data
- Automatic Encoding
- Perspectives
- References
- WordNet
- The EM-Based Wordnet Synsets Annotation of NP/PP Heads
- Introduction
- Data Resources
- Semantic Annotation
- The EM Selection Algorithm
- Related Works
- The Experiment
- Manually Annotated Data for an Evaluation of the Algorithm
- Efficiency of the Algorithm
- Evaluation of the Algorithm
- Conclusions
- References
- Unsupervised Word Sense Disambiguation with Lexical Chains and Graph-Based Context Formalization
- Introduction
- Lexical Chains
- The WSD Algorithm
- Evaluations
- Conclusions
- References
- An Access Layer to PolNet - Polish WordNet
- Introduction
- Access Layer Architecture
- WQuery Language
- Data Types
- Basic Syntax
- Typical Queries Appearing in POLINT-112-SMS
- Obtaining Word Meanings
- Creating and Composing Frames
- Refreshing PolNet Cache
- Discussion
- Conclusion
- References
- Document Processing
- OTTO: A Tool for Diplomatic Transcription of Historical Texts
- Introduction
- Requirements of Transcription Tools
- Characteristics of Historical Texts
- Meta-information: Header and Comments
- Requirements of Transcription Tools
- Related Work
- OTTO
- Conclusion and Future Work
- References
- Automatic Author Attribution for Short Text Documents
- Introduction
- Authorship Analysis Classification
- Features
- Algorithm
- Corpus
- Experiments
- Conclusions
- References
- BioExcom: Detection and Categorization of Speculative Sentences in Biomedical Literature
- Introduction
- Task
- Goal
- Definition of Biological Speculation in Articles
- Importance of Speculative Sentences in Biological Literature
- Categorization into Prior and New Speculation
- Automatic Annotation of Speculative Sentences by Contextual Exploration Processing
- The Contextual Exploration Processing
- Computational Architecture of the CE Engine and Overview of Text Treatment
- The Linguistic Markers of Speculation in Biological Sentences
- Categorization of Speculative Sentences
- BioExcom Implementation
- Evaluation
- Evaluation Methodology
- Results of the Evaluation
- Perspectives
- References
- Experimenting with Automatic Text Summarisation for Arabic
- Introduction
- Related Work
- Summarisers for Arabic: AQBTSS and ACBTSS
- ACBTSS Concepts
- Experimental Design
- The Document Collection
- The Evaluation Scale
- The Subjects
- Additional Experiments with Sakhr
- Results
- AQBTSS versus ACBTSS
- Sakhr Summarisation System
- Discussion of Results
- Conclusions and Future Work
- References
- Enhancing Opinion Extraction by Automatically Annotated Lexical Resources
- Introduction
- Related Work
- The Opinion Extraction System
- The Learning and Classification System
- Lexical Resources for OM
- Experiments
- The WWC Opinion Markup Language
- The Datasets: MPQA and I-CAB Opinion
- Evaluation Models and Measures
- The Experiments
- Results and Conclusions
- The Results
- Statistical Significance Tests
- Inter-Annotator Agreement
- Conclusions
- References
- Technical Trend Analysis by Analyzing Research Papers' Titles
- Introduction
- System Behavior
- Related Work
- Utilization of Research Papers' Structures
- Automatic Generation of Survey Articles and Technical Trend Maps
- Analysis of Research Papers' Titles
- Analyzing the Structure of Japanese Titles
- Analyzing the Structure of English Titles
- Experiments
- Experimental Method
- Experimental Results
- Discussion
- Conclusion
- References
- Information Processing (IR, IE, other)
- Extracting and Visualizing Quotations from News Wires
- Introduction
- Related Work
- Overall Architecture
- Pre-processing with SXPipe
- Named Entities Recognition
- Verbatims Extraction
- Parsing and Post-processing
- Anaphora Resolution
- Quotation Extraction
- Web Interface for Visualization
- Conclusions and Perspectives
- References
- Using Wikipedia to Improve Precision of Contextual Advertising
- Introduction
- Problem Statement
- Contributions
- Organizations
- Related Work
- Keyword Matching
- Semantic Advertising
- Keyword Extraction
- Wikipedia Matching
- Finding Similar Articles
- Dimension Reduction
- Combining Ranking Functions
- Experiments
- Data and Methodology
- Average Precision
- Results for the Ambiguous Dataset
- Performance Gain and t-Interval
- Conclusion
- References
- Unsupervised Extraction of Keywords from News Archives
- Introduction
- Related Work
- Belga News Agency Archive
- Automatic Extraction of Keywords
- TextRank
- Chi-Square Test
- Information Radius
- Raw Frequency
- Evaluation
- Conclusions
- References
- Machine Translation
- Automatic Evaluation of Texts by Using Paraphrases
- Introduction
- Related Work
- Automatic Evaluation of Texts
- Text Evaluation Using Paraphrases
- The Benefits of Paraphrases in Text Evaluation
- Data
- Paraphrases in Text Evaluation
- An Automatic Method of Text Evaluation Using Paraphrases
- Procedure for Text Evaluation
- Paraphrase Methods
- Experiments
- Experimental Settings
- Experimental Results
- Discussion
- Conclusions
- References
- Packing It All Up in Search for a Language Independent MT Quality Measure Tool - Part Two
- Introduction
- Research Setting
- Results
- Original Results with Complearn
- Results with WMT08 and WMT10 Data
- Discussion and Conclusions
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle only with limitations).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.