Result details

REAL-T: Real Conversational Mixtures for Target Speaker Extraction

LI, S.; WANG, S.; HAN, J.; ZHANG, K.; WANG, W.; LI, H. REAL-T: Real Conversational Mixtures for Target Speaker Extraction. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. Interspeech. Rotterdam, The Netherlands: International Speech Communication Association, 2025. p. 1923-1927.
Type
Article in conference proceedings
Language
English
Authors
Li Shaole
Wang Shuai
Han Jiangyu, UPGM (FIT)
Zhang Ke
Wang Wupeng
Li Haizhou
Abstract

Current target speaker extraction (TSE) systems achieve remarkable performance on synthetic datasets like LibriMix and WSJMix. However, their effectiveness in real conversational scenarios, where the cocktail party problem is most prevalent, remains largely unexplored. In this paper, we conduct a comprehensive analysis of several speaker diarization datasets and introduce REAL-T, the first conversation-centric dataset specifically designed for TSE in real-world conditions. Our evaluations reveal significant performance degradation of existing TSE models on this dataset, highlighting the unaddressed complexity of real-world speech extraction. To facilitate controlled benchmarking, we define two subsets: BASE and PRIMARY, ensuring more manageable yet challenging evaluation settings.

Keywords

conversational | dataset | REAL-T | Real-world | target speaker extraction

URL
https://www.isca-archive.org/interspeech_2025/li25da_interspeech.pdf
Year
2025
Pages
1923–1927
Journal
Interspeech, ISSN
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
Conference
Interspeech
Publisher
International Speech Communication Association
Location
Rotterdam, The Netherlands
DOI
10.21437/Interspeech.2025-2662
EID Scopus
BibTeX
@inproceedings{BUT199411,
  author="Shaole {Li} and Shuai {Wang} and Jiangyu {Han} and Ke {Zhang} and Wupeng {Wang} and Haizhou {Li}",
  title="REAL-T: Real Conversational Mixtures for Target Speaker Extraction",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association Interspeech",
  year="2025",
  journal="Interspeech",
  pages="1923--1927",
  publisher="International Speech Communication Association",
  address="Rotterdam, The Netherlands",
  doi="10.21437/Interspeech.2025-2662",
  url="https://www.isca-archive.org/interspeech_2025/li25da_interspeech.pdf"
}
Projects
Contemporary methods of processing, analysis and visualization of multimedia and 3D data, VUT, VUT internal projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, in progress
Department