Speech and Computer

Name: Speech and Computer | 17th International Conference, SPECOM 2015, Athens, Greece, September 20-24, 2015, Proceedings
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

Andrey Ronzhin, Rodmonga Potapova, Nikos Fakotakis (Editors)

Springer (Publisher)

Published on 3 September 2015

XVI, 506 pages

E-Book

PDF with digital watermarking

978-3-319-23132-7 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
About Athens
Contents
Invited Talks
Multimodal Human-Robot Interaction from the Perspective of a Speech Scientist
1 Introduction
2 Major Differences Between HCI and HRI
3 Different Types of Robots and Resulting Implications for Interaction Schemes
4 Basic Interaction Schemes in Human-Robot-Communication
5 Conclusions
References
A Decade of Discriminative Language Modeling for Automatic Speech Recognition
1 Introduction
2 Features
2.1 Linguistic Features
2.2 Statistically Derived Features
2.3 Acoustic Features
3 Algorithms
4 Training Approaches
4.1 Supervised Training
4.2 Semi-supervised Training
4.3 Unsupervised Training
4.4 Summary of Experiments on Training Approaches
5 Conclusion
References
Conference Papers
A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis
1 Introduction
2 The Kazakh Language
3 Speech Synthesis and Transcription for Kazakh
3.1 Dictionary and POS Tagging
3.2 Building Transcription Rules and Synthesizing Speech
4 Automatic Speech Recognition for Kazakh
4.1 The Speech Database
4.2 Acoustic Models
4.3 Experiments
5 Conclusions and Future Work
References
A Comparative Study of Speech Processing in Microphone Arrays with Multichannel Alignment and Zelinski Post-Filtering
1 Introduction
2 Experiments and Results
2.1 MA Directivity Patterns
2.2 Incoherent and Coherent Noise Reduction Level
2.3 Spectrograms of the Processed Speech Signal
2.4 Signal-to-Deviation Ratio
3 Conclusions
References
A Comparison of RNN LM and FLM for Russian Speech Recognition
1 Introduction
2 Related Works
3 Recurrent Neural Network Language Model Topology
4 Creation of Language Models for Russian ASR
4.1 Creation of the Baseline Language Models
4.2 Creation of Recurrent Neural Network Language Models
5 Experiments
5.1 Experimental Setup
5.2 Experiments on Rescoring N-Best Lists Using FLM
5.3 Experiments on Rescoring N-Best Lists Using RNN LM
6 Conclusion
References
A Frequency Domain Adaptive Decorrelating Algorithm for Speech Enhancement
1 Introduction
2 Mixing Model
3 Proposed Frequency Domain (FD-SAD) Algorithm
4 Simulations, Results, and Analysis
4.1 System Mismatch (SM) Evaluation
4.2 Segmental SNR (SegSNR) Evaluation
5 Conclusion
References
Acoustic Markers of Emotional State "Aggression"
1 Introduction
2 Method and Procedure
3 Conclusion
3.1 Prospects of Investigation
References
Algorithms for Low Bit-Rate Coding with Adaptation to Statistical Characteristics of Speech Signal
1 Introduction
2 Related Works
2.1 Structural Scheme of the Hybrid MELP/CELP Coder
2.2 Experimental Study of the Developed Adaptive Hybrid MELP/CELP Coder
3 Conclusion
References
Analysing Human-Human Negotiations with the Aim to Develop a Dialogue System
1 Introduction
2 Empirical Material and Used Software
3 Analysis of Human-Human Dialogues: Argument-Based Negotiation
3.1 Arguments and Negotiation in Telemarketing Calls
3.2 Negotiation in Travel Dialogues
3.3 Arguments and Negotiation in Everyday Dialogues
4 Discussion
5 Conclusion
References
Analysis of Facial Motion Capture Data for Visual Speech Synthesis
1 Introduction
2 Speech Data and Collection
3 Methods
3.1 Interpretation of Speech Data by Animation Model
3.2 Approximation of Speech Data
4 Evaluation
4.1 Objective Evaluation
4.2 Verification by Animation Model
5 Conclusions
References
Auditory-Perceptual Recognition of the Emotional State of Aggression
1 Introduction
2 Method, Procedure, and Results
3 Conclusions
4 Discussion
5 Prospects of Investigation
References
Automatic Classification and Prediction of Attitudes: Audio-Visual Analysis of Video Blogs
1 Introduction
2 Methodology
2.1 The Vlog Corpus
2.2 Attitude Annotation
2.3 Multimodal Feature Extraction
3 Attitude Classification Model
3.1 Feature Analysis
3.2 Prediction by Prosodic and Visual Features
4 Discussion
5 Conclusion
References
Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach
1 Introduction
2 Related Work
3 System Description
3.1 Training Data
3.2 Language Modeling
3.3 Acoustic Models
3.4 Test Data and Decoding
4 Results
4.1 Broadcast Conversation
4.2 Decoding with Advanced Language Models
5 Conclusions and Future Work
References
Automatic Estimation of Web Bloggers' Age Using Regression Models
1 Introduction
2 Background Work
3 Proposed Age Estimation of Web Bloggers Using Regression Models
4 Experimental Setup and Results
5 Conclusion
References
Automatic Preprocessing Technique for Detection of Corrupted Speech Signal Fragments for the Purpose of Speaker Recognition
1 Introduction
2 Preprocessing Technique
3 Preprocessing Technique
3.1 Click Detector
3.2 Overloading Detector
3.3 Clipping Detector
3.4 Tones Detector
3.5 Music Detector
3.6 Voice Activity Detector
4 Experimental and Results
5 Conclusions
References
Automatic Sound Recognition of Urban Environment Events
1 Introduction
2 System Description
3 Experimental Setup
3.1 Audio Data Description
3.2 Feature Extraction
3.3 Classification
4 Experimental Results
5 Conclusions
References
Automatically Trained TTS for Effective Attacks to Anti-spoofing System
1 Introduction
2 Anti-spoofing System
3 Spoofing Attack Modelling
4 Experiments
4.1 TTS Training Database
4.2 Evaluation Results
5 Conclusion
References
EmoChildRu: Emotional Child Russian Speech Corpus
1 Introduction
2 Emotional Child Russian Speech Corpus - EmoChildRu
2.1 Data Collection
2.2 Corpus and Software Structure
3 Data Analysis
4 Experimental Results
4.1 Human Recognition of Emotional States
4.2 Automatic Classification of Emotional States
5 Discussion
6 Conclusions
References
Cognitive Mechanism of Semantic Content Decoding of Spoken Discourse in Noise
1 Introduction
2 Method and Experiment
3 Discussion
3.1 MRA Text Assessment Method
4 Conclusion
References
Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System
1 Introduction
2 System Overview
2.1 The Lexical Classifier
2.2 The Prosodic Classifier
2.3 The Combined Model
2.4 Second Pass for Question Mark Detection
3 Experimental Setup
3.1 The Datasets
3.2 ASR Setup
4 Results and Discussion
5 Conclusions and Future Research
References
Construction of a Modern Greek Grammar Checker Through Mnemosyne Formalism
1 Introduction
2 Particularities of Modern Greek Language
3 Lexical Ambiguity in Modern Greek
4 Features of the Grammar Checker
5 Implementation of Software
6 "Kanon" Formalism
7 Evaluation
References
Contribution to the Design of an Expressive Speech Synthesis System for the Arabic Language
1 Introduction
2 System Description
2.1 Orthographic-to-Phonetic Transcription
2.2 Diphone Database
2.3 Diphone Concatenation
2.4 Voice Transformation
3 Experiments and Results
4 Conclusion and Future Works
References
Deep Neural Network Based Continuous Speech Recognition for Serbian Using the Kaldi Toolkit
1 Introduction
2 GMM-HMM Recipe
3 DNN Recipe
4 Data Preparation
5 Experimental Results
6 Conclusion
References
DNN-Based Speech Synthesis: Importance of Input Features and Training Data
1 Introduction
2 Framework
2.1 Database and Input/Output Features
2.2 DNN Setup
2.3 HMM Setup
2.4 Synthesis
3 Results
3.1 Objective Evaluation
3.2 Subjective Evaluation
4 Conclusions
References
Emotion State Manifestation in Voice Features: Chimpanzees, Human Infants, Children, Adults
1 Introduction
2 Method
3 Results
3.1 Experiment 1
3.2 Experiment 2
3.3 Experiment 3
4 Conclusion and Discussion
References
Estimation of Vowel Spectra Near Vocal Chords with Restoration of a Clipped Speech Signal
1 Problem Statement
2 Output Signals
3 Input Signals
4 Transfer Function of the Vocal Tract
5 Perception Tests
6 Algorithm for Restoration of Clipped Signals
7 Conclusion
References
Fast Algorithm for Precise Estimation of Fundamental Frequency on Short Time Intervals
1 Problem Statement
2 Model and Cost Function
3 Minimum of the Cost Function
4 The Basic Algebraic Transformations
5 Unbiased Criterion
6 Example
7 Evaluation of the Algorithm
8 Conclusion
References
Gender Classification of Web Authors Using Feature Selection and Language Models
1 Introduction
2 Proposed Gender Identification Methodology
2.1 Feature Extraction
2.2 Feature Selection
2.3 Classification
3 Experimental Setup and Evaluation
4 Conclusion
References
Improving Acoustic Models for Russian Spontaneous Speech Recognition
1 Introduction
2 Applying the SWB Recipe to Russian Data
3 Lowering Sensitivity to Acoustic Variability
4 Speaker-Dependent Bottleneck Features
5 Experiments
6 Conclusion
References
Information Sources of Word Semantics Methods
1 Introduction
2 The Sources of Information
2.1 Semantic Networks
2.2 Global Context
2.3 Local Context
3 Experiments
3.1 Evaluation Corpora
3.2 Training
3.3 Results
4 Discussion
5 Related Work
6 Conclusion
References
Invariant Components of Speech Signals: Analysis and Visualization
1 Introduction
2 Method and Experiment
2.1 Produced and Perceived Invariants
2.2 Diagnostics of Invariant Sound Elements
3 Conclusion
References
Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings
1 Introduction
2 Language Model Adaptation to a Specific User
2.1 User-Specific Text Data
2.2 Language Model Adaptation
2.3 Adjusting Interpolation Weights
3 Speech Recognition Overview
4 Experiments
4.1 Language Model Adaptation to a Specific User
4.2 Combination of Hypotheses from Multiple Recognition Setups
5 Conclusion
References
Macro Episodes of Russian Everyday Oral Communication: Towards Pragmatic Annotation of the ORD Speech Corpus
1 The ORD Speech Corpus
2 Annotation of Communication Situations in the ORD Corpus
2.1 Setting/Scene of Communication
2.2 Speaker's Social Roles
2.3 General Types of Everyday Oral Communication and Other Circumstances
3 Distribution of Communication Situations in the ORD Corpus
4 Functional Activity of Words in Different Communication Situations
5 Conclusion
References
Missing Feature Kernel and Nonparametric Window Subband Power Distribution for Robust Sound Event Classification
1 Introduction
2 Non-parametric Windows SPD
2.1 Subband Power Distribution
2.2 Nonparametric Windows SPD
2.3 NW-SPD Missing Feature Mask
3 Missing Feature Kernel Classification
4 Experiments
5 Conclusions
References
Multi-factor Method for Detection of Filled Pauses and Lengthenings in Russian Spontaneous Speech
1 Introduction
2 Material
3 Method for Automatic FPs Detection
4 Conclusions
References
Multimodal Presentation of Bulgarian Child Language
1 Introduction
2 Bulgarian Corpus of Child Speech Data
3 The Cooperation of TALKBANK from North America with CLARIN
4 Conclusion
References
On Deep and Shallow Neural Networks in Speech Recognition from Speech Spectrum
1 Introduction
2 Neural-Network-based Acoustic Models
3 Neural-Network-based Feature Extraction
4 Signal Processing Methods in NN-based Acoustic Models
5 Experiments and Results
6 Conclusion and Future Work
References
Opinion Recognition on Movie Reviews by Combining Classifiers
1 Introduction
2 Background
3 Proposed Method for Automatic Recognition of Opinion
4 Experimental Setup and Evaluation
5 Conclusions
References
Optimization of Pitch Tracking and Quantization
1 Introduction
2 A Pitch Determination Algorithm
2.1 Comparison of Noise Immunity of Different Pitch Determination Algorithms
2.2 Optimal Quantizer for Pitch
3 Conclusion
References
PLDA Speaker Verification with Limited Speech Data
1 Introduction
2 Speaker Verification Using PLDA
2.1 I-vector Extraction
2.2 PLDA Modeling
2.3 PLDA Scoring
3 Experimental Configuration
4 Experimental Results
4.1 Analysis of Performance on Data from Speakers Used in the Development
4.2 Analysis of Performance on Limited Speech Data
5 Conclusion
References
Real-Time Context Aware Audio Augmented Reality
1 Introduction
1.1 Related Works
1.2 Originality of This Work
2 Room's Dimensions and Large Object Clustering
3 Context-Aware Augmented Reality Audio Rendering
3.1 Attenuation and Relative Velocity - Doppler Effect
3.2 Dimensions and Position of Real Objects
3.3 Type of Environment
4 Experimental Procedure and Results
5 Conclusions
References
Recurrent Neural Networks for Hypotheses Re-Scoring
1 Introduction
2 Recurrent Neural Network Language Model
3 Recurrent Neural Networks and Inflective Languages
3.1 Experiments
3.2 Results
3.3 Discussion
References
Review of the Opus Codec in a WebRTC Scenario for Audio and Speech Communication
1 Introduction
2 Previous Studies on Opus Codec
3 Opus Codec in a Web-Based Real-Time Communication
3.1 WebRTC Principles
3.2 Characteristics of Google Chrome Implementation
4 Results and Discussion
4.1 Runtime Functioning of Opus Codec
4.2 Manipulation of Codec Parameters
4.3 Audio Performance
5 Conclusion
References
Semantic Multilingual Differences of Terminological Definitions Regarding the Concept "Artificial Intelligence"
1 Introduction
2 Method and Procedure
3 Results and Discussion
4 Conclusion and Prospects of Investigation
References
SNR Estimation Based on Adaptive Signal Decomposition for Quality Evaluation of Speech Enhancement Algorithms
1 Introduction
2 Problem Formulation
3 The Proposed Method
4 Experiments and Results
5 Conclusions
References
Sociolinguistic Factors in Text-Based Sentence Boundary Detection
1 Introduction
2 Data and Method Description
2.1 Corpus
2.2 Texts for Analysis
2.3 Expert Manual Annotation
2.4 Prosodic Annotation
3 Data Analysis
3.1 Difference Between Types of Text
3.2 BCS, Pause and Gender
3.3 BCS, Pause and Age
3.4 BCS, Pause and Profession
4 Discussion and Conclusions
References
Sparsity Analysis and Compensation for i-Vector Based Speaker Verification
1 Introduction
2 Total Factor Space and i-Vector
3 Phonetic Sparsity Analysis
3.1 Baum-Welch Statistics and Adapted Gaussian Mean Vectors
3.2 Deviation of First Order Baum-Welch Statistics on Sparse Training Data
4 Adapted First Order Baum-Welch Statistics Analysis
5 Experiments
6 Conclusion
References
Speaker Identification Using Semi-supervised Learning
1 Introduction
2 Speaker Identification Using Machine Learning
3 Semi-supervised Techniques
4 Proposed Algorithm
5 Experiments
6 Conclusion
References
Speaker Verification Using Spectral and Durational Segmental Characteristics
1 Introduction
2 Speaker Verification Methods
2.1 Formant Method
2.2 Phone Durations Method
2.3 Pitch Method
3 Experiments
3.1 Database
3.2 Experiment - Speaker Verification
3.3 Experiment - Informative Phones
4 Conclusion
References
Speech Enhancement in Quasi-Periodic Noises Using Improved Spectral Subtraction Based on Adaptive Sampling
1 Introduction
2 Method Outline
2.1 Acquisition of Rotation Rate
2.2 Time-Warping
2.3 Noise Reduction
3 Experimental Result
3.1 Fundamental Frequency Estimation Accuracy
3.2 Performance Analysis
4 Conclusion
References
Sub-word Language Modeling for Russian LVCSR
1 Introduction
2 Methodology
2.1 Sub-lexical Units
2.2 Text Data Collection and Normalization
2.3 Phonetic Transcription
3 Experimental Setup
4 Evaluation
5 Conclusions and Future Work
References
Temporal Organization of Phrase-final Words as a Function of Pitch Movement Type
1 Introduction
2 Material
3 Method
4 Stressed Vowels
4.1 IP-final Position
4.2 Utterance-final Position
5 Post-stressed Vowels
5.1 IP-final Position
5.2 Utterance-final Position
6 Final Consonants
7 Conclusions
References
The "One Day of Speech" Corpus: Phonetic and Syntactic Studies of Everyday Spoken Russian
1 Introduction: The ORD Corpus
2 Phonetic Studies
2.1 Temporal Studies
2.2 Study of Reduction. Phonetic Realization of Words and Affixes
2.3 Studying the "Weak Points" in Speech Perception and Production
2.4 Russian Speech Rhythm Studies
2.5 Hesitation Phenomena
3 Syntactic Studies
4 Some Directions for Further Research
References
The Multi-level Approach to Speech Corpora Annotation for Automatic Speech Recognition
1 Introduction
2 Multi-level Method of Speech Material Annotation
2.1 General Requirements
2.2 Specific Marks
2.3 Software Tools
2.4 Example
3 Tag Sorting for ASR
4 Conclusion
References
The Role of Prosody in the Perception of Synthesized and Natural Speech
1 Introduction
2 Speech Comprehension: Problems
3 Methodology: Experiments and Results
3.1 Results
4 Discussion and Conclusions
References
The Singular Estimation Pitch Tracker
1 Introduction
2 Singular Estimation Fundamental Pitch Frequency
3 Experiment
4 Conclusion
References
Voice Conversion Between Synthesized Bilingual Voices Using Line Spectral Frequencies
1 Introduction
2 Formant Space
3 Frequency Warping
3.1 Weighted Frequency Interpolation
3.2 Line Spectral Frequency Warping
4 Experiments
4.1 Environments
4.2 Results and Discussion
5 Conclusion
References
Voicing-Based Classified Split Vector Quantizer for Efficient Coding of AMR-WB ISF Parameters
1 Introduction
2 Classified Split Vector Quantization
3 Efficient Coding of Wideband ISF Parameters
4 Conclusion
References
Vulnerability of Voice Verification System with STC Anti-spoofing Detector to Different Methods of Spoofing Attacks
1 Introduction
2 Voice Verification System with Anti-spoofing
2.1 Voice Verification Module
2.2 Spoofing Detection Module
2.3 Fusion Decision Module
3 Experiments with Different Types of Spoofing
4 Conclusions
References
WebTransc - A WWW Interface for Speech Corpora Production and Processing
1 Introduction
2 WebTransc
3 Technical Details
3.1 Security
4 Data Preparation and Import
5 Summary
References
Word-External Reduction in Spontaneous Russian
1 Introduction
2 Material
3 Method
4 Analysis of External Reductions
4.1 "Word Contraction"
4.2 Absence of W1 Final Fragment (More Than One Sound)
4.3 Absence of External Sound of a Word
4.4 Sound Contraction
4.5 Three Word Contraction
5 Discussion
References
Author Index
