
Speech and Computer

Content
- Intro
- Preface
- Organization
- About Athens
- Contents
- Invited Talks
- Multimodal Human-Robot Interaction from the Perspective of a Speech Scientist
- 1 Introduction
- 2 Major Differences Between HCI and HRI
- 3 Different Types of Robots and Resulting Implications for Interaction Schemes
- 4 Basic Interaction Schemes in Human-Robot-Communication
- 5 Conclusions
- References
- A Decade of Discriminative Language Modeling for Automatic Speech Recognition
- 1 Introduction
- 2 Features
- 2.1 Linguistic Features
- 2.2 Statistically Derived Features
- 2.3 Acoustic Features
- 3 Algorithms
- 4 Training Approaches
- 4.1 Supervised Training
- 4.2 Semi-supervised Training
- 4.3 Unsupervised Training
- 4.4 Summary of Experiments on Training Approaches
- 5 Conclusion
- References
- Conference Papers
- A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis
- 1 Introduction
- 2 The Kazakh Language
- 3 Speech Synthesis and Transcription for Kazakh
- 3.1 Dictionary and POS Tagging
- 3.2 Building Transcription Rules and Synthesizing Speech
- 4 Automatic Speech Recognition for Kazakh
- 4.1 The Speech Database
- 4.2 Acoustic Models
- 4.3 Experiments
- 5 Conclusions and Future Work
- References
- A Comparative Study of Speech Processing in Microphone Arrays with Multichannel Alignment and Zelinski Post-Filtering
- 1 Introduction
- 2 Experiments and Results
- 2.1 MA Directivity Patterns
- 2.2 Incoherent and Coherent Noise Reduction Level
- 2.3 Spectrograms of the Processed Speech Signal
- 2.4 Signal-to-Deviation Ratio
- 3 Conclusions
- References
- A Comparison of RNN LM and FLM for Russian Speech Recognition
- 1 Introduction
- 2 Related Works
- 3 Recurrent Neural Network Language Model Topology
- 4 Creation of Language Models for Russian ASR
- 4.1 Creation of the Baseline Language Models
- 4.2 Creation of Recurrent Neural Network Language Models
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Experiments on Rescoring N-Best Lists Using FLM
- 5.3 Experiments on Rescoring N-Best Lists Using RNN LM
- 6 Conclusion
- References
- A Frequency Domain Adaptive Decorrelating Algorithm for Speech Enhancement
- 1 Introduction
- 2 Mixing Model
- 3 Proposed Frequency Domain (FD-SAD) Algorithm
- 4 Simulations, Results, and Analysis
- 4.1 System Mismatch (SM) Evaluation
- 4.2 Segmental SNR (SegSNR) Evaluation
- 5 Conclusion
- References
- Acoustic Markers of Emotional State "Aggression"
- 1 Introduction
- 2 Method and Procedure
- 3 Conclusion
- 3.1 Prospects of Investigation
- References
- Algorithms for Low Bit-Rate Coding with Adaptation to Statistical Characteristics of Speech Signal
- 1 Introduction
- 2 Related Works
- 2.1 Structural Scheme of the Hybrid MELP/CELP Coder
- 2.2 Experimental Study of the Developed Adaptive Hybrid MELP/CELP Coder
- 3 Conclusion
- References
- Analysing Human-Human Negotiations with the Aim to Develop a Dialogue System
- 1 Introduction
- 2 Empirical Material and Used Software
- 3 Analysis of Human-Human Dialogues: Argument-Based Negotiation
- 3.1 Arguments and Negotiation in Telemarketing Calls
- 3.2 Negotiation in Travel Dialogues
- 3.3 Arguments and Negotiation in Everyday Dialogues
- 4 Discussion
- 5 Conclusion
- References
- Analysis of Facial Motion Capture Data for Visual Speech Synthesis
- 1 Introduction
- 2 Speech Data and Collection
- 3 Methods
- 3.1 Interpretation of Speech Data by Animation Model
- 3.2 Approximation of Speech Data
- 4 Evaluation
- 4.1 Objective Evaluation
- 4.2 Verification by Animation Model
- 5 Conclusions
- References
- Auditory-Perceptual Recognition of the Emotional State of Aggression
- 1 Introduction
- 2 Method, Procedure, and Results
- 3 Conclusions
- 4 Discussion
- 5 Prospects of Investigation
- References
- Automatic Classification and Prediction of Attitudes: Audio-Visual Analysis of Video Blogs
- 1 Introduction
- 2 Methodology
- 2.1 The Vlog Corpus
- 2.2 Attitude Annotation
- 2.3 Multimodal Feature Extraction
- 3 Attitude Classification Model
- 3.1 Feature Analysis
- 3.2 Prediction by Prosodic and Visual Features
- 4 Discussion
- 5 Conclusion
- References
- Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach
- 1 Introduction
- 2 Related Work
- 3 System Description
- 3.1 Training Data
- 3.2 Language Modeling
- 3.3 Acoustic Models
- 3.4 Test Data and Decoding
- 4 Results
- 4.1 Broadcast Conversation
- 4.2 Decoding with Advanced Language Models
- 5 Conclusions and Future Work
- References
- Automatic Estimation of Web Bloggers' Age Using Regression Models
- 1 Introduction
- 2 Background Work
- 3 Proposed Age Estimation of Web Bloggers Using Regression Models
- 4 Experimental Setup and Results
- 5 Conclusion
- References
- Automatic Preprocessing Technique for Detection of Corrupted Speech Signal Fragments for the Purpose of Speaker Recognition
- 1 Introduction
- 2 Preprocessing Technique
- 3 Preprocessing Technique
- 3.1 Click Detector
- 3.2 Overloading Detector
- 3.3 Clipping Detector
- 3.4 Tones Detector
- 3.5 Music Detector
- 3.6 Voice Activity Detector
- 4 Experimental and Results
- 5 Conclusions
- References
- Automatic Sound Recognition of Urban Environment Events
- 1 Introduction
- 2 System Description
- 3 Experimental Setup
- 3.1 Audio Data Description
- 3.2 Feature Extraction
- 3.3 Classification
- 4 Experimental Results
- 5 Conclusions
- References
- Automatically Trained TTS for Effective Attacks to Anti-spoofing System
- 1 Introduction
- 2 Anti-spoofing System
- 3 Spoofing Attack Modelling
- 4 Experiments
- 4.1 TTS Training Database
- 4.2 Evaluation Results
- 5 Conclusion
- References
- EmoChildRu: Emotional Child Russian Speech Corpus
- 1 Introduction
- 2 Emotional Child Russian Speech Corpus - EmoChildRu
- 2.1 Data Collection
- 2.2 Corpus and Software Structure
- 3 Data Analysis
- 4 Experimental Results
- 4.1 Human Recognition of Emotional States
- 4.2 Automatic Classification of Emotional States
- 5 Discussion
- 6 Conclusions
- References
- Cognitive Mechanism of Semantic Content Decoding of Spoken Discourse in Noise
- 1 Introduction
- 2 Method and Experiment
- 3 Discussion
- 3.1 MRA Text Assessment Method
- 4 Conclusion
- References
- Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System
- 1 Introduction
- 2 System Overview
- 2.1 The Lexical Classifier
- 2.2 The Prosodic Classifier
- 2.3 The Combined Model
- 2.4 Second Pass for Question Mark Detection
- 3 Experimental Setup
- 3.1 The Datasets
- 3.2 ASR Setup
- 4 Results and Discussion
- 5 Conclusions and Future Research
- References
- Construction of a Modern Greek Grammar Checker Through Mnemosyne Formalism
- 1 Introduction
- 2 Particularities of Modern Greek Language
- 3 Lexical Ambiguity in Modern Greek
- 4 Features of the Grammar Checker
- 5 Implementation of Software
- 6 "Kanon" Formalism
- 7 Evaluation
- References
- Contribution to the Design of an Expressive Speech Synthesis System for the Arabic Language
- 1 Introduction
- 2 System Description
- 2.1 Orthographic-to-Phonetic Transcription
- 2.2 Diphone Database
- 2.3 Diphone Concatenation
- 2.4 Voice Transformation
- 3 Experiments and Results
- 4 Conclusion and Future Works
- References
- Deep Neural Network Based Continuous Speech Recognition for Serbian Using the Kaldi Toolkit
- 1 Introduction
- 2 GMM-HMM Recipe
- 3 DNN Recipe
- 4 Data Preparation
- 5 Experimental Results
- 6 Conclusion
- References
- DNN-Based Speech Synthesis: Importance of Input Features and Training Data
- 1 Introduction
- 2 Framework
- 2.1 Database and Input/Output Features
- 2.2 DNN Setup
- 2.3 HMM Setup
- 2.4 Synthesis
- 3 Results
- 3.1 Objective Evaluation
- 3.2 Subjective Evaluation
- 4 Conclusions
- References
- Emotion State Manifestation in Voice Features: Chimpanzees, Human Infants, Children, Adults
- 1 Introduction
- 2 Method
- 3 Results
- 3.1 Experiment 1
- 3.2 Experiment 2
- 3.3 Experiment 3
- 4 Conclusion and Discussion
- References
- Estimation of Vowel Spectra Near Vocal Chords with Restoration of a Clipped Speech Signal
- 1 Problem Statement
- 2 Output Signals
- 3 Input Signals
- 4 Transfer Function of the Vocal Tract
- 5 Perception Tests
- 6 Algorithm for Restoration of Clipped Signals
- 7 Conclusion
- References
- Fast Algorithm for Precise Estimation of Fundamental Frequency on Short Time Intervals
- 1 Problem Statement
- 2 Model and Cost Function
- 3 Minimum of the Cost Function
- 4 The Basic Algebraic Transformations
- 5 Unbiased Criterion
- 6 Example
- 7 Evaluation of the Algorithm
- 8 Conclusion
- References
- Gender Classification of Web Authors Using Feature Selection and Language Models
- 1 Introduction
- 2 Proposed Gender Identification Methodology
- 2.1 Feature Extraction
- 2.2 Feature Selection
- 2.3 Classification
- 3 Experimental Setup and Evaluation
- 4 Conclusion
- References
- Improving Acoustic Models for Russian Spontaneous Speech Recognition
- 1 Introduction
- 2 Applying the SWB Recipe to Russian Data
- 3 Lowering Sensitivity to Acoustic Variability
- 4 Speaker-Dependent Bottleneck Features
- 5 Experiments
- 6 Conclusion
- References
- Information Sources of Word Semantics Methods
- 1 Introduction
- 2 The Sources of Information
- 2.1 Semantic Networks
- 2.2 Global Context
- 2.3 Local Context
- 3 Experiments
- 3.1 Evaluation Corpora
- 3.2 Training
- 3.3 Results
- 4 Discussion
- 5 Related Work
- 6 Conclusion
- References
- Invariant Components of Speech Signals: Analysis and Visualization
- 1 Introduction
- 2 Method and Experiment
- 2.1 Produced and Perceived Invariants
- 2.2 Diagnostics of Invariant Sound Elements
- 3 Conclusion
- References
- Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings
- 1 Introduction
- 2 Language Model Adaptation to a Specific User
- 2.1 User-Specific Text Data
- 2.2 Language Model Adaptation
- 2.3 Adjusting Interpolation Weights
- 3 Speech Recognition Overview
- 4 Experiments
- 4.1 Language Model Adaptation to a Specific User
- 4.2 Combination of Hypotheses from Multiple Recognition Setups
- 5 Conclusion
- References
- Macro Episodes of Russian Everyday Oral Communication: Towards Pragmatic Annotation of the ORD Speech Corpus
- 1 The ORD Speech Corpus
- 2 Annotation of Communication Situations in the ORD Corpus
- 2.1 Setting/Scene of Communication
- 2.2 Speaker's Social Roles
- 2.3 General Types of Everyday Oral Communication and Other Circumstances
- 3 Distribution of Communication Situations in the ORD Corpus
- 4 Functional Activity of Words in Different Communication Situations
- 5 Conclusion
- References
- Missing Feature Kernel and Nonparametric Window Subband Power Distribution for Robust Sound Event Classification
- 1 Introduction
- 2 Non-parametric Windows SPD
- 2.1 Subband Power Distribution
- 2.2 Nonparametric Windows SPD
- 2.3 NW-SPD Missing Feature Mask
- 3 Missing Feature Kernel Classification
- 4 Experiments
- 5 Conclusions
- References
- Multi-factor Method for Detection of Filled Pauses and Lengthenings in Russian Spontaneous Speech
- 1 Introduction
- 2 Material
- 3 Method for Automatic FPs Detection
- 4 Conclusions
- References
- Multimodal Presentation of Bulgarian Child Language
- 1 Introduction
- 2 Bulgarian Corpus of Child Speech Data
- 3 The Cooperation of TALKBANK from North America with CLARIN
- 4 Conclusion
- References
- On Deep and Shallow Neural Networks in Speech Recognition from Speech Spectrum
- 1 Introduction
- 2 Neural-Network-based Acoustic Models
- 3 Neural-Network-based Feature Extraction
- 4 Signal Processing Methods in NN-based Acoustic Models
- 5 Experiments and Results
- 6 Conclusion and Future Work
- References
- Opinion Recognition on Movie Reviews by Combining Classifiers
- 1 Introduction
- 2 Background
- 3 Proposed Method for Automatic Recognition of Opinion
- 4 Experimental Setup and Evaluation
- 5 Conclusions
- References
- Optimization of Pitch Tracking and Quantization
- 1 Introduction
- 2 A Pitch Determination Algorithm
- 2.1 Comparison of Noise Immunity of Different Pitch Determination Algorithms
- 2.2 Optimal Quantizer for Pitch
- 3 Conclusion
- References
- PLDA Speaker Verification with Limited Speech Data
- 1 Introduction
- 2 Speaker Verification Using PLDA
- 2.1 I-vector Extraction
- 2.2 PLDA Modeling
- 2.3 PLDA Scoring
- 3 Experimental Configuration
- 4 Experimental Results
- 4.1 Analysis of Performance on Data from Speakers Used in the Development
- 4.2 Analysis of Performance on Limited Speech Data
- 5 Conclusion
- References
- Real-Time Context Aware Audio Augmented Reality
- 1 Introduction
- 1.1 Related Works
- 1.2 Originality of This Work
- 2 Room's Dimensions and Large Object Clustering
- 3 Context-Aware Augmented Reality Audio Rendering
- 3.1 Attenuation and Relative Velocity - Doppler Effect
- 3.2 Dimensions and Position of Real Objects
- 3.3 Type of Environment
- 4 Experimental Procedure and Results
- 5 Conclusions
- References
- Recurrent Neural Networks for Hypotheses Re-Scoring
- 1 Introduction
- 2 Recurrent Neural Network Language Model
- 3 Recurrent Neural Networks and Inflective Languages
- 3.1 Experiments
- 3.2 Results
- 3.3 Discussion
- References
- Review of the Opus Codec in a WebRTC Scenario for Audio and Speech Communication
- 1 Introduction
- 2 Previous Studies on Opus Codec
- 3 Opus Codec in a Web-Based Real-Time Communication
- 3.1 WebRTC Principles
- 3.2 Characteristics of Google Chrome Implementation
- 4 Results and Discussion
- 4.1 Runtime Functioning of Opus Codec
- 4.2 Manipulation of Codec Parameters
- 4.3 Audio Performance
- 5 Conclusion
- References
- Semantic Multilingual Differences of Terminological Definitions Regarding the Concept "Artificial Intelligence"
- 1 Introduction
- 2 Method and Procedure
- 3 Results and Discussion
- 4 Conclusion and Prospects of Investigation
- References
- SNR Estimation Based on Adaptive Signal Decomposition for Quality Evaluation of Speech Enhancement Algorithms
- 1 Introduction
- 2 Problem Formulation
- 3 The Proposed Method
- 4 Experiments and Results
- 5 Conclusions
- References
- Sociolinguistic Factors in Text-Based Sentence Boundary Detection
- 1 Introduction
- 2 Data and Method Description
- 2.1 Corpus
- 2.2 Texts for Analysis
- 2.3 Expert Manual Annotation
- 2.4 Prosodic Annotation
- 3 Data Analysis
- 3.1 Difference Between Types of Text
- 3.2 BCS, Pause and Gender
- 3.3 BCS, Pause and Age
- 3.4 BCS, Pause and Profession
- 4 Discussion and Conclusions
- References
- Sparsity Analysis and Compensation for i-Vector Based Speaker Verification
- 1 Introduction
- 2 Total Factor Space and i-Vector
- 3 Phonetic Sparsity Analysis
- 3.1 Baum-Welch Statistics and Adapted Gaussian Mean Vectors
- 3.2 Deviation of First Order Baum-Welch Statistics on Sparse Training Data
- 4 Adapted First Order Baum-Welch Statistics Analysis
- 5 Experiments
- 6 Conclusion
- References
- Speaker Identification Using Semi-supervised Learning
- 1 Introduction
- 2 Speaker Identification Using Machine Learning
- 3 Semi-supervised Techniques
- 4 Proposed Algorithm
- 5 Experiments
- 6 Conclusion
- References
- Speaker Verification Using Spectral and Durational Segmental Characteristics
- 1 Introduction
- 2 Speaker Verification Methods
- 2.1 Formant Method
- 2.2 Phone Durations Method
- 2.3 Pitch Method
- 3 Experiments
- 3.1 Database
- 3.2 Experiment - Speaker Verification
- 3.3 Experiment - Informative Phones
- 4 Conclusion
- References
- Speech Enhancement in Quasi-Periodic Noises Using Improved Spectral Subtraction Based on Adaptive Sampling
- 1 Introduction
- 2 Method Outline
- 2.1 Acquisition of Rotation Rate
- 2.2 Time-Warping
- 2.3 Noise Reduction
- 3 Experimental Result
- 3.1 Fundamental Frequency Estimation Accuracy
- 3.2 Performance Analysis
- 4 Conclusion
- References
- Sub-word Language Modeling for Russian LVCSR
- 1 Introduction
- 2 Methodology
- 2.1 Sub-lexical Units
- 2.2 Text Data Collection and Normalization
- 2.3 Phonetic Transcription
- 3 Experimental Setup
- 4 Evaluation
- 5 Conclusions and Future Work
- References
- Temporal Organization of Phrase-final Words as a Function of Pitch Movement Type
- 1 Introduction
- 2 Material
- 3 Method
- 4 Stressed Vowels
- 4.1 IP-final Position
- 4.2 Utterance-final Position
- 5 Post-stressed Vowels
- 5.1 IP-final Position
- 5.2 Utterance-final Position
- 6 Final Consonants
- 7 Conclusions
- References
- The "One Day of Speech" Corpus: Phonetic and Syntactic Studies of Everyday Spoken Russian
- 1 Introduction: The ORD Corpus
- 2 Phonetic Studies
- 2.1 Temporal Studies
- 2.2 Study of Reduction. Phonetic Realization of Words and Affixes
- 2.3 Studying the "Weak Points" in Speech Perception and Production
- 2.4 Russian Speech Rhythm Studies
- 2.5 Hesitation Phenomena
- 3 Syntactic Studies
- 4 Some Directions for Further Research
- References
- The Multi-level Approach to Speech Corpora Annotation for Automatic Speech Recognition
- 1 Introduction
- 2 Multi-level Method of Speech Material Annotation
- 2.1 General Requirements
- 2.2 Specific Marks
- 2.3 Software Tools
- 2.4 Example
- 3 Tag Sorting for ASR
- 4 Conclusion
- References
- The Role of Prosody in the Perception of Synthesized and Natural Speech
- 1 Introduction
- 2 Speech Comprehension: Problems
- 3 Methodology: Experiments and Results
- 3.1 Results
- 4 Discussion and Conclusions
- References
- The Singular Estimation Pitch Tracker
- 1 Introduction
- 2 Singular Estimation Fundamental Pitch Frequency
- 3 Experiment
- 4 Conclusion
- References
- Voice Conversion Between Synthesized Bilingual Voices Using Line Spectral Frequencies
- 1 Introduction
- 2 Formant Space
- 3 Frequency Warping
- 3.1 Weighted Frequency Interpolation
- 3.2 Line Spectral Frequency Warping
- 4 Experiments
- 4.1 Environments
- 4.2 Results and Discussion
- 5 Conclusion
- References
- Voicing-Based Classified Split Vector Quantizer for Efficient Coding of AMR-WB ISF Parameters
- 1 Introduction
- 2 Classified Split Vector Quantization
- 3 Efficient Coding of Wideband ISF Parameters
- 4 Conclusion
- References
- Vulnerability of Voice Verification System with STC Anti-spoofing Detector to Different Methods of Spoofing Attacks
- 1 Introduction
- 2 Voice Verification System with Anti-spoofing
- 2.1 Voice Verification Module
- 2.2 Spoofing Detection Module
- 2.3 Fusion Decision Module
- 3 Experiments with Different Types of Spoofing
- 4 Conclusions
- References
- WebTransc - A WWW Interface for Speech Corpora Production and Processing
- 1 Introduction
- 2 WebTransc
- 3 Technical Details
- 3.1 Security
- 4 Data Preparation and Import
- 5 Summary
- References
- Word-External Reduction in Spontaneous Russian
- 1 Introduction
- 2 Material
- 3 Method
- 4 Analysis of External Reductions
- 4.1 "Word Contraction"
- 4.2 Absence of W1 Final Fragment (More Than One Sound)
- 4.3 Absence of External Sound of a Word
- 4.4 Sound Contraction
- 4.5 Three Word Contraction
- 5 Discussion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays each book page identically on any hardware, which makes it suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.