Publication Details

CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification

PENG, J.; MOŠNER, L.; ZHANG, L.; PLCHOT, O.; STAFYLAKIS, T.; BURGET, L.; ČERNOCKÝ, J. CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification. Proceedings of ICASSP 2025. Hyderabad: IEEE Biometric Council, 2025. p. 1-5. ISBN: 979-8-3503-6874-1.
Czech title
CA-MHFA: Kontextově orientovaný extraktor informace o mluvčím pro ověřování mluvčího na základě samoučení
Type
conference paper
Language
English
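Authors
Junyi Peng, Ladislav Mošner, Lin Zhang, Oldřich Plchot, Themos Stafylakis, Lukáš Burget, Jan Černocký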
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10889058
Keywords

Self-supervised learning, speaker verification, speaker extractor, pooling mechanism, speech classification

Abstract

Self-supervised learning (SSL) models for speaker verification (SV) have gained significant attention in recent years. However, existing SSL-based SV systems often struggle to capture local temporal dependencies and generalize across different tasks. In this paper, we propose context-aware multi-head factorized attentive pooling (CA-MHFA), a lightweight framework that incorporates contextual information from surrounding frames. CA-MHFA leverages grouped, learnable queries to effectively model contextual dependencies while maintaining efficiency by sharing keys and values across groups. Experimental results on the VoxCeleb dataset show that CA-MHFA achieves EERs of 0.42%, 0.48%, and 0.96% on Vox1-O, Vox1-E, and Vox1-H, respectively, outperforming complex models like WavLM-TDNN with fewer parameters and faster convergence. Additionally, CA-MHFA demonstrates strong generalization across multiple SSL models and tasks, including emotion recognition and anti-spoofing, highlighting its robustness and versatility.
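
As a rough illustration of the pooling idea described in the abstract, the following pure-Python sketch shows one way grouped learnable queries could attend over frame-level keys while also looking at neighbouring frames, with every group sharing the same keys and values. It is not the authors' implementation: the function name `context_aware_pool`, the zero-padding at utterance boundaries, the summation of per-offset query-key scores, and the toy dimensions are all assumptions made for illustration, and the paper should be consulted for the actual formulation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def context_aware_pool(keys, values, group_queries):
    """Pool frame-level values into one utterance-level vector.

    keys:          T x Dk list of key vectors (shared by all groups)
    values:        T x Dv list of value vectors (shared by all groups)
    group_queries: G x C x Dk, where each group holds C query vectors,
                   one per frame offset in a context window of odd size C
                   (ASSUMPTION: offsets are centred on the current frame).
    Returns a list of length G * Dv (the per-group pooled vectors, concatenated).
    """
    num_frames = len(keys)
    dim_v = len(values[0])
    pooled = []
    for queries in group_queries:
        radius = len(queries) // 2
        scores = []
        for t in range(num_frames):
            score = 0.0
            for j, query in enumerate(queries):
                idx = t + j - radius
                # ASSUMPTION: frames outside the utterance contribute zero.
                if 0 <= idx < num_frames:
                    score += dot(query, keys[idx])
            scores.append(score)
        weights = softmax(scores)  # attention over frames for this group
        for d in range(dim_v):
            pooled.append(sum(w * v[d] for w, v in zip(weights, values)))
    return pooled


# Toy usage with random numbers standing in for SSL-derived keys and values.
random.seed(0)
T, DK, DV, G, C = 6, 4, 3, 2, 3
keys = [[random.gauss(0, 1) for _ in range(DK)] for _ in range(T)]
values = [[random.gauss(0, 1) for _ in range(DV)] for _ in range(T)]
queries = [[[random.gauss(0, 1) for _ in range(DK)] for _ in range(C)]
           for _ in range(G)]
embedding = context_aware_pool(keys, values, queries)
print(len(embedding))  # G * DV = 6
```

In this sketch the keys and values are computed once and reused by every group, so adding a group only adds its own small set of query vectors; that is the sense in which sharing keeps the pooling layer lightweight here.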

Published
2025
Pages
1–5
Proceedings
Proceedings of ICASSP 2025
Conference
ICASSP 2025, International Conference on Acoustics, Speech, and Signal Processing, Hyderabad, IN
ISBN
979-8-3503-6874-1
Publisher
IEEE Biometric Council
Place
Hyderabad
DOI
10.1109/ICASSP49660.2025.10889058
BibTeX
@inproceedings{BUT198050,
  author="Junyi {Peng} and Ladislav {Mošner} and Lin {Zhang} and Oldřich {Plchot} and Themos {Stafylakis} and Lukáš {Burget} and Jan {Černocký}",
  title="CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification",
  booktitle="Proceedings of ICASSP 2025",
  year="2025",
  pages="1--5",
  publisher="IEEE Biometric Council",
  address="Hyderabad",
  doi="10.1109/ICASSP49660.2025.10889058",
  isbn="979-8-3503-6874-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10889058"
}