REAL-T: Real Conversational Mixtures for Target Speaker Extraction

LI, S.; WANG, S.; HAN, J.; ZHANG, K.; WANG, W.; LI, H. REAL-T: Real Conversational Mixtures for Target Speaker Extraction. In Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech. Rotterdam, The Netherlands: International Speech Communication Association, 2025. p. 1923–1927.

Type

conference paper

Language

English

Authors

Li Shaole
Wang Shuai
Han Jiangyu, DCGM (FIT)
Zhang Ke
Wang Wupeng
Li Haizhou

Abstract

Current target speaker extraction (TSE) systems achieve remarkable performance on synthetic datasets like LibriMix and WSJMix. However, their effectiveness in real conversational scenarios, where the cocktail party problem is most prevalent, remains largely unexplored. In this paper, we conduct a comprehensive analysis of several speaker diarization datasets and introduce REAL-T, the first conversation-centric dataset specifically designed for TSE in real-world conditions. Our evaluations reveal significant performance degradation of existing TSE models on this dataset, highlighting the unaddressed complexity of real-world speech extraction. To facilitate controlled benchmarking, we define two subsets: BASE and PRIMARY, ensuring more manageable yet challenging evaluation settings.

Keywords

conversational | dataset | REAL-T | Real-world | target speaker extraction

URL

https://www.isca-archive.org/interspeech_2025/li25da_interspeech.pdf

Published

2025

Pages

1923–1927

Journal

Interspeech, ISSN

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

Conference

Interspeech

Publisher

International Speech Communication Association

Place

Rotterdam, The Netherlands

DOI

10.21437/Interspeech.2025-2662

EID Scopus

2-s2.0-105020086006

BibTeX

@inproceedings{BUT199411,
  author="Shaole {Li} and Shuai {Wang} and Jiangyu {Han} and Ke {Zhang} and Wupeng {Wang} and Haizhou {Li}",
  title="REAL-T: Real Conversational Mixtures for Target Speaker Extraction",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association Interspeech",
  year="2025",
  journal="Interspeech",
  pages="1923--1927",
  publisher="International Speech Communication Association",
  address="Rotterdam, The Netherlands",
  doi="10.21437/Interspeech.2025-2662",
  url="https://www.isca-archive.org/interspeech_2025/li25da_interspeech.pdf"
}

Projects

Contemporary Methods for Processing, Analysis and Visualization of Multimedia and 3D Data, BUT, BUT Internal Projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, running

Departments

Department of Computer Graphics and Multimedia (DCGM)