Result Details

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

PENG, J.; ASHIHARA, T.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Hyderabad: IEEE Signal Processing Society, 2025. p. 1-5. ISBN: 979-8-3503-6874-1.

Type

Conference proceedings paper

Language

English

Authors

Peng Junyi, UPGM (FIT)
ASHIHARA, T.
Delcroix Marc, FIT (FIT)
OCHIAI, T.
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
ARAKI, S.
Černocký Jan, prof. Dr. Ing., UPGM (FIT)

Abstract

Self-supervised learning (SSL) models have significantly advanced speech
processing tasks, and several benchmarks have been proposed to validate their
effectiveness. However, previous benchmarks have primarily focused on
single-speaker scenarios, with less exploration of target-speaker tasks in noisy,
multi-talker conditions (a more challenging yet practical case). In this paper, we
introduce the Target-Speaker Speech Processing Universal Performance Benchmark
(TS-SUPERB), which includes four widely recognized target-speaker processing
tasks that require identifying the target speaker and extracting information from
the speech mixture. In our benchmark, the speaker embedding extracted from
enrollment speech is used as a clue to condition downstream models. The benchmark
results reveal the importance of evaluating SSL models in target-speaker
scenarios, demonstrating that performance cannot be easily inferred from related
single-speaker tasks. Moreover, by using a unified SSL-based target speech
encoder, consisting of a speaker encoder and an extractor module, we also
investigate joint optimization across target-speaker (TS) tasks to leverage
mutual information and demonstrate its effectiveness.
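The abstract states that a speaker embedding extracted from enrollment speech conditions the downstream models, and that a unified target speech encoder (speaker encoder plus extractor module) is optimized jointly across tasks, but it does not give the exact operations. The minimal pure-Python sketch below therefore rests on assumptions that are not taken from the paper: a SUPERB-style softmax-weighted sum over SSL hidden layers, a speaker encoder that simply mean-pools enrollment frames, element-wise multiplicative conditioning of each mixture frame, and a joint objective formed as a weighted sum of per-task losses. All function names and task keys are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Frames = List[Vector]  # T frames, each a D-dimensional feature vector


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over learnable layer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layers: List[Frames], logits: Vector) -> Frames:
    """Assumed SUPERB-style fusion: softmax-weighted sum of SSL hidden layers."""
    weights = softmax(logits)
    num_frames, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, layers)) for d in range(dim)]
        for t in range(num_frames)
    ]


def mean_pool(frames: Frames) -> Vector:
    """Hypothetical speaker encoder: average enrollment features over time."""
    num_frames = len(frames)
    return [sum(frame[d] for frame in frames) / num_frames for d in range(len(frames[0]))]


def condition(mixture: Frames, spk_emb: Vector) -> Frames:
    """Assumed conditioning: scale every mixture frame element-wise by the embedding."""
    return [[x * e for x, e in zip(frame, spk_emb)] for frame in mixture]


def joint_loss(task_losses: Dict[str, float], task_weights: Dict[str, float]) -> float:
    """Assumed joint objective: weighted sum of per-task losses on a shared encoder."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    # Toy data: 2 SSL layers, 2 frames, 2 feature dimensions.
    mixture_layers = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [1.0, 0.0]]]
    enroll_layers = [[[2.0, 2.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    layer_logits = [0.0, 0.0]  # equal layer weights

    mixture = weighted_layer_sum(mixture_layers, layer_logits)
    spk_emb = mean_pool(weighted_layer_sum(enroll_layers, layer_logits))
    print(condition(mixture, spk_emb))  # [[1.5, 0.5], [3.0, 1.0]]

    # Hypothetical task names and weights, for illustration only.
    print(joint_loss({"ts_asr": 2.0, "ts_vad": 1.0}, {"ts_asr": 0.5, "ts_vad": 1.0}))  # 2.0
```

Multiplicative conditioning is only one common choice; concatenation or a FiLM-style scale-and-shift could replace `condition` without changing the rest of the pipeline.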

Keywords

Self-supervised learning, target-speaker speech processing, speech recognition,
speech enhancement, voice activity detection

URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574

Year

2025

Pages

1–5

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

ICASSP 2025, International Conference on Acoustics, Speech, and Signal Processing

ISBN

979-8-3503-6874-1

Publisher

IEEE Signal Processing Society

Location

Hyderabad

DOI

10.1109/ICASSP49660.2025.10887574

Scopus EID

2-s2.0-105003873681

BibTeX

@inproceedings{BUT198051,
  author="PENG, J. and ASHIHARA, T. and DELCROIX, M. and OCHIAI, T. and PLCHOT, O. and ARAKI, S. and ČERNOCKÝ, J.",
  title="TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2025",
  pages="1--5",
  publisher="IEEE Signal Processing Society",
  address="Hyderabad",
  doi="10.1109/ICASSP49660.2025.10887574",
  isbn="979-8-3503-6874-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574"
}

Files

PDF: TS-SUPERB_A_Target_Speech_Processing_Benchmark_for_Speech_Self-Supervised_Learning_Models.pdf (790 kB)

Projects

Linguistics, Artificial Intelligence and Language and Speech Technologies: From Research to Applications, EU, INTERSECTORAL COOPERATION, EH23_020/0008518, start: 2025-01-01, end: 2028-12-31, in progress
Robust Processing of Recordings for Operations and Security, MV (Ministry of the Interior of the Czech Republic), PROGRAMME FOR STRATEGIC SUPPORT OF THE DEVELOPMENT OF SECURITY RESEARCH OF THE CZECH REPUBLIC 2019-2025 (IMPAKT 1), SUBPROGRAMME 1 JOINT RESEARCH PROJECTS (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed

Department

Department of Computer Graphics and Multimedia (UPGM)