Result Details

REAL-T: Real Conversational Mixtures for Target Speaker Extraction

LI, S.; WANG, S.; HAN, J.; ZHANG, K.; WANG, W.; LI, H. REAL-T: Real Conversational Mixtures for Target Speaker Extraction. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. Interspeech. Rotterdam, The Netherlands: International Speech Communication Association, 2025. p. 1923-1927.
Type
conference paper
Language
English
Authors
Li Shaole
Wang Shuai
Han Jiangyu, DCGM (FIT)
Zhang Ke
Wang Wupeng
Li Haizhou
Abstract

Current target speaker extraction (TSE) systems achieve remarkable performance on synthetic datasets like LibriMix and WSJMix. However, their effectiveness in real conversational scenarios, where the cocktail party problem is most prevalent, remains largely unexplored. In this paper, we conduct a comprehensive analysis of several speaker diarization datasets and introduce REAL-T, the first conversation-centric dataset specifically designed for TSE in real-world conditions. Our evaluations reveal significant performance degradation of existing TSE models on this dataset, highlighting the unaddressed complexity of real-world speech extraction. To facilitate controlled benchmarking, we define two subsets: BASE and PRIMARY, ensuring more manageable yet challenging evaluation settings.

Keywords

conversational | dataset | REAL-T | Real-world | target speaker extraction

URL
https://www.isca-archive.org/interspeech_2025/li25da_interspeech.pdf
Published
2025
Pages
1923–1927
Journal
Interspeech, ISSN
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
Conference
Interspeech
Publisher
International Speech Communication Association
Place
Rotterdam, The Netherlands
DOI
10.21437/Interspeech.2025-2662
EID Scopus
BibTeX
@inproceedings{BUT199411,
  author="Shaole {Li} and Shuai {Wang} and Jiangyu {Han} and Ke {Zhang} and Wupeng {Wang} and Haizhou {Li}",
  title="REAL-T: Real Conversational Mixtures for Target Speaker Extraction",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association Interspeech",
  year="2025",
  journal="Interspeech",
  pages="1923--1927",
  publisher="International Speech Communication Association",
  address="Rotterdam, The Netherlands",
  doi="10.21437/Interspeech.2025-2662",
  url="https://www.isca-archive.org/interspeech_2025/li25da_interspeech.pdf"
}
Projects
Contemporary methods for processing, analysis and visualization of multimedia and 3D data, BUT, BUT internal projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, running
Departments