Result Details

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

PENG, J.; ASHIHARA, T.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Hyderabad: IEEE Signal Processing Society, 2025. p. 1-5. ISBN: 979-8-3503-6874-1.

Type

conference paper

Language

English

Authors

Peng Junyi, DCGM (FIT)
ASHIHARA, T.
Delcroix Marc, FIT (FIT)
OCHIAI, T.
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
ARAKI, S.
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

Self-supervised learning (SSL) models have significantly advanced speech
processing tasks, and several benchmarks have been proposed to validate their
effectiveness. However, previous benchmarks have primarily focused on
single-speaker scenarios, with less exploration of target-speaker tasks in noisy,
multi-talker conditions, a more challenging yet practical case. In this paper, we
introduce the Target-Speaker Speech Processing Universal Performance Benchmark
(TS-SUPERB), which includes four widely recognized target-speaker processing
tasks that require identifying the target speaker and extracting information from
the speech mixture. In our benchmark, the speaker embedding extracted from
enrollment speech is used as a clue to condition downstream models. The benchmark
results reveal the importance of evaluating SSL models in target-speaker
scenarios, demonstrating that performance cannot be easily inferred from related
single-speaker tasks. Moreover, by using a unified SSL-based target speech
encoder, consisting of a speaker encoder and an extractor module, we also
investigate joint optimization across TS tasks to leverage mutual information and
demonstrate its effectiveness.

Keywords

Self-supervised learning, target-speaker speech processing, speech recognition,
speech enhancement, voice activity detection

URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574

Published

2025

Pages

1–5

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

ICASSP 2025, International Conference on Acoustics, Speech, and Signal Processing

ISBN

979-8-3503-6874-1

Publisher

IEEE Signal Processing Society

Place

Hyderabad

DOI

10.1109/ICASSP49660.2025.10887574

EID Scopus

2-s2.0-105003873681

BibTeX

@inproceedings{BUT198051,
  author="PENG, J. and ASHIHARA, T. and DELCROIX, M. and OCHIAI, T. and PLCHOT, O. and ARAKI, S. and ČERNOCKÝ, J.",
  title="TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2025",
  pages="1--5",
  publisher="IEEE Signal Processing Society",
  address="Hyderabad",
  doi="10.1109/ICASSP49660.2025.10887574",
  isbn="979-8-3503-6874-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574"
}

Files

pdf TS-SUPERB_A_Target_Speech_Processing_Benchmark_for_Speech_Self-Supervised_Learning_Models.pdf 790 kB

Projects

Linguistics, Artificial Intelligence and Language and Speech Technologies: from Research to Applications, EU, MEZISEKTOROVÁ SPOLUPRÁCE, EH23_020/0008518, start: 2025-01-01, end: 2028-12-31, running
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed

Departments

Department of Computer Graphics and Multimedia (DCGM)