Result details

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

PENG, J.; ASHIHARA, T.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Hyderabad: IEEE Signal Processing Society, 2025. p. 1-5. ISBN: 979-8-3503-6874-1.
Type
conference proceedings paper
Language
English
Authors
Peng Junyi, UPGM (FIT)
ASHIHARA, T.
Delcroix Marc, FIT (FIT)
OCHIAI, T.
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
ARAKI, S.
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Abstract

Self-supervised learning (SSL) models have significantly advanced speech
processing tasks, and several benchmarks have been proposed to validate their
effectiveness. However, previous benchmarks have primarily focused on
single-speaker scenarios, with less exploration of target-speaker tasks in noisy,
multi-talker conditions, a more challenging yet practical case. In this paper, we
introduce the Target-Speaker Speech Processing Universal Performance Benchmark
(TS-SUPERB), which includes four widely recognized target-speaker processing
tasks that require identifying the target speaker and extracting information from
the speech mixture. In our benchmark, the speaker embedding extracted from
enrollment speech is used as a clue to condition downstream models. The benchmark
result reveals the importance of evaluating SSL models in target speaker
scenarios, demonstrating that performance cannot be easily inferred from related
single-speaker tasks. Moreover, by using a unified SSL-based target speech
encoder, consisting of a speaker encoder and an extractor module, we also
investigate joint optimization across TS tasks to leverage mutual information and
demonstrate its effectiveness.

Keywords

Self-supervised learning, target-speaker speech processing, speech recognition,
speech enhancement, voice activity detection

URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574
Year
2025
Pages
1–5
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
ICASSP 2025, International Conference on Acoustics, Speech, and Signal Processing
ISBN
979-8-3503-6874-1
Publisher
IEEE Signal Processing Society
Place
Hyderabad
DOI
10.1109/ICASSP49660.2025.10887574
BibTeX
@inproceedings{BUT198051,
  author="PENG, J. and ASHIHARA, T. and DELCROIX, M. and OCHIAI, T. and PLCHOT, O. and ARAKI, S. and ČERNOCKÝ, J.",
  title="TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2025",
  pages="1--5",
  publisher="IEEE Signal Processing Society",
  address="Hyderabad",
  doi="10.1109/ICASSP49660.2025.10887574",
  isbn="979-8-3503-6874-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10887574"
}
Projects
Linguistics, Artificial Intelligence and Language and Speech Technologies: from Research to Applications, EU, Intersectoral Cooperation, EH23_020/0008518, start: 2025-01-01, end: 2028-12-31, ongoing
Robust Processing of Recordings for Operations and Security, MV (Ministry of the Interior of the Czech Republic), Strategic Support for the Development of Security Research in the Czech Republic 2019-2025 (IMPAKT 1), Subprogramme 1: Joint Research Projects (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed