Result detail

Multi-Channel Extension of Pre-trained Models for Speaker Verification

MOŠNER, L.; SERIZEL, R.; BURGET, L.; PLCHOT, O.; VINCENT, E.; PENG, J.; ČERNOCKÝ, J. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Kos: International Speech Communication Association, 2024. no. 9, p. 2135-2139. ISSN: 1990-9772.
Type
conference proceedings paper
Language
English
Authors
Mošner Ladislav, Ing., UPGM (FIT)
SERIZEL, R.
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
VINCENT, E.
Peng Junyi, UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Abstract

In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using the SSL models with microphone array data is to prepend it with a multi-channel speech enhancement. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors get propagated to the SSL model. We aim to alleviate the issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best-published results on the dataset.
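
The data flow described in the abstract (per-channel processing interleaved with cross-channel information exchange, followed by fusion into one channel) can be illustrated with a toy sketch. This is not the METRO architecture: the record does not specify the layers, so every function below is a hypothetical stand-in. The per-channel step is a plain scaling, the exchange step adds the across-channel mean to each channel, and the fusion step is a simple average.

```python
# Toy illustration only. All functions here are hypothetical stand-ins and
# do NOT reproduce the METRO model described in the paper.

def per_channel_layer(frames, scale):
    """Apply the same (shared) transform to one channel's feature frames."""
    return [[scale * x for x in frame] for frame in frames]


def channel_mean(channels):
    """Average feature frames across channels, frame by frame."""
    num_channels = len(channels)
    num_frames = len(channels[0])
    dim = len(channels[0][0])
    return [
        [sum(ch[t][d] for ch in channels) / num_channels for d in range(dim)]
        for t in range(num_frames)
    ]


def cross_channel_exchange(channels):
    """Share information by adding the across-channel mean to every channel."""
    mean = channel_mean(channels)
    return [
        [[x + m for x, m in zip(frame, mean_frame)]
         for frame, mean_frame in zip(ch, mean)]
        for ch in channels
    ]


def fuse_channels(channels):
    """Collapse all channels into a single stream (assumed here: averaging)."""
    return channel_mean(channels)


def multi_channel_forward(channels, layer_scales):
    """Interleave per-channel processing with exchange, then fuse into one."""
    for scale in layer_scales:
        channels = [per_channel_layer(ch, scale) for ch in channels]
        channels = cross_channel_exchange(channels)
    return fuse_channels(channels)


# Two channels, one frame each, two features per frame.
example = [[[1.0, 2.0]], [[3.0, 4.0]]]
print(multi_channel_forward(example, [1.0]))  # [[4.0, 6.0]]
```

The point of the sketch is only the ordering of operations: each channel is processed with shared parameters, channels exchange information at intermediate stages rather than only in a pre-processing step, and a single-channel representation is produced at the end.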

Keywords

multi-channel speaker verification, pre-trained models

URL
https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf
Year
2024
Pages
2135–2139
Journal
Proceedings of Interspeech, vol. 2024, no. 9, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Kos
DOI
10.21437/Interspeech.2024-1260
BibTeX
@inproceedings{BUT193682,
  author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
  title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2024",
  journal="Proceedings of Interspeech",
  volume="2024",
  number="9",
  pages="2135--2139",
  publisher="International Speech Communication Association",
  address="Kos",
  doi="10.21437/Interspeech.2024-1260",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}
Projects
Robust processing of recordings for operations and security, Ministry of the Interior (MV), Programme Strategic Support for the Development of Security Research in the Czech Republic 2019-2025 (IMPAKT 1), Sub-programme 1 Joint Research Projects (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed