International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal


Whisper-Aware Spectro-Transformer U-Net for Emotion-Preserving Multilingual Speech Enhancement

Author(s) Mr. Raghu M, Dr. M N Nachappa
Country India
Abstract We present the Whisper-Aware Spectro-Transformer U-Net (WAST-U-Net), a multilingual, emotion-preserving speech enhancement model optimized for automatic speech recognition (ASR). Extending the U-Former backbone, our architecture integrates Transformer blocks at the skip connections, emotion and language embeddings at the bottleneck, and a novel Whisper-WER loss that directly optimizes ASR intelligibility. Unlike traditional models that prioritize noise suppression at the cost of expressiveness, WAST-U-Net enhances speech while preserving speaker emotion and linguistic identity. Evaluated on VoiceBank-DEMAND and a Kannada-English code-mixed dataset, our model achieves state-of-the-art performance in PESQ, STOI, SI-SNR, Whisper-WER, and emotion accuracy. Ablation studies confirm that each component contributes to these gains and that the components are complementary. This framework sets a new benchmark for multilingual, emotionally intelligent speech enhancement and paves the way for accessible ASR in noisy, real-world environments.
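The abstract does not specify how Whisper-WER is computed or how it is turned into a training objective. As a hedged illustration only, the sketch below shows the standard word error rate (WER) that such a metric builds on: the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. It assumes the hypothesis would come from running Whisper on the enhanced audio (not shown) and that transcripts are tokenized on whitespace. The function name `word_error_rate` is hypothetical, and the paper's mechanism for using WER as a loss during training is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: standard WER, not the authors' implementation.

    WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance.
    """
    # Assumption: whitespace tokenization; no case or punctuation normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Whitespace splitting also handles code-mixed Kannada-English text written with spaces between words, but a real evaluation would typically normalize casing and punctuation before scoring. Note that WER can exceed 1.0 when the hypothesis contains many insertions.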
Keywords Speech enhancement, U-Net, Transformer, Whisper, multilingual speech, emotion-aware systems, ASR optimization, log-mel spectrogram, Whisper-WER loss, low-resource languages
Field Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In Volume 8, Issue 1, January-February 2026
Published On 2026-02-04
DOI https://doi.org/10.36948/ijfmr.2026.v08i01.67991
