Man-Machine Speech Communication

Name: Man-Machine Speech Communication | 20th National Conference, NCMMSC 2025, Zhenjiang, China, October 16-19, 2025, Proceedings
Brand: Springer
Price: 90.94 EUR
Availability: OnlineOnly

20th National Conference, NCMMSC 2025, Zhenjiang, China, October 16-19, 2025, Proceedings

Jia Jia, Zhiyong Wu, Lijian Gao, Gongping Huang, Ya Li (Editors)

Springer (Publisher)

Published on 1 January 2026

XIV, 539 pages

E-Book

PDF with digital watermarking

978-981-95-5382-2 (ISBN)

€90.94 incl. 7% VAT

E-Book Single Licence

Available for download

Content

.- Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios.

.- Multilevel and Granular L2 Pronunciation Assessment Using Stress-Based Suprasegmental Features and Proficiency Adaptation.

.- CDMGTU-Net: A Causal Dual-Branch Multi-Channel Speech Enhancement Network with Multi-Scale Gated Feature Fusion.

.- A Two-Stage Band-Split Mamba-2 Network For Music Source Separation.

.- Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text.

.- MambaVoc: State Space Models for High-Fidelity Audio Synthesis.

.- StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.

.- Automatic Speech Evaluation Method Leveraging Deep Feature Fusion.

.- Curriculum Reinforcement Learning for Robust Low-Resource Chinese Dialect Speech Recognition.

.- An Acoustic Study on Intonation Production of English Learners from Guanzhong Region in Shaanxi Province.

.- Improving Anomalous Sound Detection with Top-M Pseudo-Labeling.

.- Dementia Detection via Speech Temporal Sequences with Shifted Windows.

.- CL-EDiff: Cross-lingual emotional TTS system based on diffusion model.

.- When AI Speaks, Do We Follow? Phonetic Entrainment in Human-AI Dialogues.

.- Aishell1Mix: Towards Robust Mandarin Speech Separation with Scalable Audio Language Models.

.- Study of the Low-Rank Minimum Variance Distortionless Response Beamformer for Speech Enhancement.

.- Exploring Gender Bias in Alzheimer's Disease Detection: Insights from Mandarin and Greek Speech Perception.

.- UniDaugMamba: A Unimodal Data-augmented Mamba for Speech-Based Depression Detection.

.- Serial-Parallel Dual-Path Architecture for Speaking Style Recognition.

.- Knowledge Augmented Finetuning Matters in Both RAG and Agent Based Dialog Systems.

.- NC-KWS: Few-Shot Class-Incremental Keyword Spotting Based on Neural Collapse.

.- ZSEmo-MTVITS: A Zero-Shot Cross-Lingual Emotional Speech Synthesis Model for Mandarin and Tibetan Based on VITS.

.- CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025.

.- Accent Familiarity and Phonological Weighting in Spoken-Word Recognition.

.- Audio Deepfake Detection via Dual Branch Classifier with Self-Supervised Pre-Trained Model.

.- A Multi-Subspace Attention Approach for Robust Speech Spoofing Detection in Silence-Trimming Conditions.

.- Temporally Consistent Teeth Restoration for Talking Heads.

.- EEG as a Biometric Identifier: The Impact of Electrode Arrangement, Brain Areas, and Frequency Bands.

.- The Phonetic Modification and Facial Movements Made During Mandarin Vowel and Tone Production in Noise.

.- Exploring Audio-Visual Fusion for Sound Event Localization and Detection with BEATs.

.- On Multi-Input Multi-Frame MVDR Filter for Speech Enhancement with Heterophasic Presentation.

.- Adaptive Multi-source Fusion for Uyghur ASR Error Correction.

.- The determinants of Chinese lexical stress.

.- Introducing Discriminative Speaker Embeddings for Voice Timbre Attribute Detection.

.- TSELM: Target Speaker Extraction using Discrete Tokens and Language Models.

.- A Timbre Attribute Discrimination System Fusing Pre-trained Speaker Feature Extractors with Gender Prior Features.

.- Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features.

.- A Hierarchical Fusion Modeling from Perception to Prediction with Personalized Features for Multimodal Depression Detection.

.- Revisiting Target Signal Definitions in Distortionless Superdirective Beamforming for Reverberant Speech Enhancement.

.- HiStyle: Hierarchical Style Embedding Prediction for Text-Prompt-Guided Controllable Speech Synthesis.
