Publication Details

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

PENG, J.; ASHIHARA, T.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models. Proceedings of ICASSP 2025. Hyderabad: IEEE Biometric Council, 2025. p. 1-5. ISBN: 979-8-3503-6874-1.
Czech title
TS-SUPERB: Sada dat a experimentů ověření zpracování řeči cílového mluvčího pomocí modelů řeči získaných samoučením
Type
conference paper
Language
English
Authors
Peng Junyi (DCGM)
ASHIHARA, T.
Delcroix Marc
OCHIAI, T.
Plchot Oldřich, Ing., Ph.D. (DCGM)
ARAKI, S.
Černocký Jan, prof. Dr. Ing. (DCGM)
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574
Keywords

Self-supervised learning, target-speaker speech processing, speech recognition,
speech enhancement, voice activity detection

Abstract

Self-supervised learning (SSL) models have significantly advanced speech
processing tasks, and several benchmarks have been proposed to validate their
effectiveness. However, previous benchmarks have primarily focused on
single-speaker scenarios, with less exploration of target-speaker tasks in noisy,
multi-talker conditions, a more challenging yet practical case. In this paper, we
introduce the Target-Speaker Speech Processing Universal Performance Benchmark
(TS-SUPERB), which includes four widely recognized target-speaker processing
tasks that require identifying the target speaker and extracting information from
the speech mixture. In our benchmark, the speaker embedding extracted from
enrollment speech is used as a clue to condition downstream models. The benchmark
results reveal the importance of evaluating SSL models in target-speaker
scenarios, demonstrating that performance cannot be easily inferred from related
single-speaker tasks. Moreover, by using a unified SSL-based target speech
encoder, consisting of a speaker encoder and an extractor module, we also
investigate joint optimization across TS tasks to leverage mutual information and
demonstrate its effectiveness.

Published
2025
Pages
1–5
Proceedings
Proceedings of ICASSP 2025
Conference
ICASSP 2025, International Conference on Acoustics, Speech, and Signal Processing, Hyderabad, IN
ISBN
979-8-3503-6874-1
Publisher
IEEE Biometric Council
Place
Hyderabad
DOI
10.1109/ICASSP49660.2025.10887574
BibTeX
@inproceedings{BUT198051,
  author="PENG, J. and ASHIHARA, T. and DELCROIX, M. and OCHIAI, T. and PLCHOT, O. and ARAKI, S. and ČERNOCKÝ, J.",
  title="TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models",
  booktitle="Proceedings of ICASSP 2025",
  year="2025",
  pages="1--5",
  publisher="IEEE Biometric Council",
  address="Hyderabad",
  doi="10.1109/ICASSP49660.2025.10887574",
  isbn="979-8-3503-6874-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574"
}