
Man-Machine Speech Communication
Description
This book constitutes the refereed proceedings of the 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025, held in Zhenjiang, China, during October 16-19, 2025.
The 40 papers included in these proceedings were carefully reviewed and selected from 157 submissions. The conference will feature special events such as a Young Scholars Forum, Student Forum, Industry Forum, and Product and Technology Exhibition. Beyond the main program, the conference will also include public outreach activities, grant-writing workshops, and several special sessions.

Content
.- Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios.
.- Multilevel and Granular L2 Pronunciation Assessment Using Stress-Based Suprasegmental Features and Proficiency Adaptation.
.- CDMGTU-Net: A Causal Dual-Branch Multi-Channel Speech Enhancement Network with Multi-Scale Gated Feature Fusion.
.- A Two-Stage Band-Split Mamba-2 Network For Music Source Separation.
.- Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text.
.- MambaVoc: State Space Models for High-Fidelity Audio Synthesis.
.- StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding.
.- Automatic Speech Evaluation Method Leveraging Deep Feature Fusion.
.- Curriculum Reinforcement Learning for Robust Low-Resource Chinese Dialect Speech Recognition.
.- An Acoustic Study on Intonation Production of English Learners from Guanzhong Region in Shaanxi Province.
.- Improving Anomalous Sound Detection with Top-M Pseudo-Labeling.
.- Dementia Detection via Speech Temporal Sequences with Shifted Windows.
.- CL-EDiff: Cross-lingual emotional TTS system based on diffusion model.
.- When AI Speaks, Do We Follow? Phonetic Entrainment in Human-AI Dialogues.
.- Aishell1Mix: Towards Robust Mandarin Speech Separation with Scalable Audio Language Models.
.- Study of the Low-Rank Minimum Variance Distortionless Response Beamformer for Speech Enhancement.
.- Exploring Gender Bias in Alzheimer's Disease Detection: Insights from Mandarin and Greek Speech Perception.
.- UniDaugMamba: A Unimodal Data-augmented Mamba for Speech-Based Depression Detection.
.- Serial-Parallel Dual-Path Architecture for Speaking Style Recognition.
.- Knowledge Augmented Finetuning Matters in Both RAG and Agent Based Dialog Systems.
.- NC-KWS: Few-Shot Class-Incremental Keyword Spotting Based on Neural Collapse.
.- ZSEmo-MTVITS: A Zero-Shot Cross-Lingual Emotional Speech Synthesis Model for Mandarin and Tibetan Based on VITS.
.- CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025.
.- Accent Familiarity and Phonological Weighting in Spoken-Word Recognition.
.- Audio Deepfake Detection via Dual Branch Classifier with Self-Supervised Pre-Trained Model.
.- A Multi-Subspace Attention Approach for Robust Speech Spoofing Detection in Silence-Trimming Conditions.
.- Temporally Consistent Teeth Restoration for Talking Heads.
.- EEG as a Biometric Identifier: The Impact of Electrode Arrangement, Brain Areas, and Frequency Bands.
.- The Phonetic Modification and Facial Movements Made During Mandarin Vowel and Tone Production in Noise.
.- Exploring Audio-Visual Fusion for Sound Event Localization and Detection with BEATs.
.- On Multi-Input Multi-Frame MVDR Filter for Speech Enhancement with Heterophasic Presentation.
.- Adaptive Multi-source Fusion for Uyghur ASR Error Correction.
.- The determinants of Chinese lexical stress.
.- Introducing Discriminative Speaker Embeddings for Voice Timbre Attribute Detection.
.- TSELM: Target Speaker Extraction using Discrete Tokens and Language Models.
.- A Timbre Attribute Discrimination System Fusing Pre-trained Speaker Feature Extractors with Gender Prior Features.
.- Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features.
.- A Hierarchical Fusion Modeling from Perception to Prediction with Personalized Features for Multimodal Depression Detection.
.- Revisiting Target Signal Definitions in Distortionless Superdirective Beamforming for Reverberant Speech Enhancement.
.- HiStyle: Hierarchical Style Embedding Prediction for Text-Prompt-Guided Controllable Speech Synthesis.
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle only with limitations).
The PDF format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). However, PDFs are awkward to read on the small screens of e-readers and smartphones because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions preventing illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.