Result Details

SHDA: Sinkhorn Domain Attention for Cross-Domain Audio Anti-Spoofing

ZHANG, R.; WEI, J.; LU, X.; ZHANG, L.; JIN, D.; XU, J.; LU, W. SHDA: Sinkhorn Domain Attention for Cross-Domain Audio Anti-Spoofing. IEEE Transactions on Information Forensics and Security, 2025, no. 20, p. 6474-6489.
Type
journal article
Language
English
Authors
Zhang Ruiteng
Wei Jianguo
Lu Xugang
Zhang Lin, Ph.D.
Jin Di
Xu Junhai
Lu Wenhuan
Abstract

Audio anti-spoofing algorithms struggle with fake samples from unseen spoofing techniques, even when trained with diverse data sets or data augmentation strategies. Unsupervised domain adaptation (UDA) algorithms have the potential to mitigate this challenge. Typically, UDA assumes that the source and target domains are distinct distributions with clear boundaries and seeks to align model representations between them. However, in anti-spoofing, various spoofing algorithms could cause the distributions of the generated samples to overlap, resulting in unclear domain boundaries. This hinders UDA algorithms from effectively measuring and aligning domain discrepancies. Moreover, forcibly aligning samples with significant discrepancies could diminish the model's discriminative capability. To solve this problem, we propose a domain attention algorithm with optimal transport (OT), termed Sinkhorn Domain Attention (SHDA). Unlike traditional attention mechanisms, SHDA identifies the optimal transfer plan by analyzing the global probability differences among cross-domain samples. Specifically, we first extract audio representations from various domains to compute the overall cost matrix between the source and target domains. Next, we employ Sinkhorn's iteration to calculate the OT coupling matrix, where cross-domain samples with minor differences receive higher transfer weights, while those with substantial differences receive lower weights. Finally, we use the coupling and cost matrices to compute the adaptation loss, effectively transferring the anti-spoofing model from multiple sources to the target domain. We conducted eight cross-domain experiments using eleven well-known anti-spoofing corpora. The results indicate that our label-free SHDA surpassed the state-of-the-art model by 40%.
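The three steps named in the abstract (cost matrix, Sinkhorn iteration, coupling-weighted loss) can be illustrated with a minimal sketch. This is not the authors' implementation: the squared Euclidean cost, the uniform marginals, the regularisation strength `epsilon`, the iteration count, and all function names are assumptions chosen for illustration, and the embeddings are toy values rather than real audio representations.

```python
import math


def cost_matrix(source, target):
    """Pairwise squared Euclidean cost (an assumed choice of cost function)."""
    return [[sum((a - b) ** 2 for a, b in zip(s, t)) for t in target]
            for s in source]


def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularised OT coupling via Sinkhorn's iteration.

    Assumes uniform marginals over source and target samples. Very large
    cost / epsilon ratios can underflow the kernel; this sketch does not
    guard against that.
    """
    n, m = len(cost), len(cost[0])
    r = [1.0 / n] * n  # source marginal (assumed uniform)
    c = [1.0 / m] * m  # target marginal (assumed uniform)
    kernel = [[math.exp(-cij / epsilon) for cij in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def adaptation_loss(coupling, cost):
    """Sum of coupling-weighted costs, i.e. the transport cost <P, C>."""
    return sum(p * c
               for p_row, c_row in zip(coupling, cost)
               for p, c in zip(p_row, c_row))


# Toy embeddings: each source sample has one close and one distant target.
source = [[0.0, 0.0], [1.0, 1.0]]
target = [[0.0, 0.1], [1.0, 0.9]]

cost = cost_matrix(source, target)
coupling = sinkhorn(cost)
loss = adaptation_loss(coupling, cost)
print(coupling)
print(loss)
```

In this toy case the close cross-domain pairs receive nearly all of the transport mass and the distant pairs receive almost none, which mirrors the weighting behaviour the abstract describes. In a training setup the loss would be computed on model embeddings with an automatic-differentiation framework so that gradients reach the feature extractor.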

Keywords

Feature extraction, Adaptation models, Training, Couplings, Data models, Costs, Performance evaluation, Data mining, Data augmentation, Computational modeling, Audio anti-spoofing, cross-domain, unsupervised domain adaptation, optimal transport, domain attention

URL
https://ieeexplore.ieee.org/abstract/document/11024052
Published
2025
Pages
6474–6489
Journal
IEEE Transactions on Information Forensics and Security, no. 20, ISSN 1556-6013
DOI
10.1109/TIFS.2025.3576576
UT WoS
001521429100006
BibTeX
@article{BUT199982,
  author="Ruiteng {Zhang} and Jianguo {Wei} and Xugang {Lu} and Lin {Zhang} and Di {Jin} and Junhai {Xu} and Wenhuan {Lu}",
  title="SHDA: Sinkhorn Domain Attention for Cross-Domain Audio Anti-Spoofing",
  journal="IEEE Transactions on Information Forensics and Security",
  year="2025",
  number="20",
  pages="6474--6489",
  doi="10.1109/TIFS.2025.3576576",
  issn="1556-6013",
  url="https://ieeexplore.ieee.org/abstract/document/11024052"
}
Projects
Contemporary methods for processing, analysis, and visualization of multimedia and 3D data, BUT, BUT internal projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, running