Result details

Multi-Channel Extension of Pre-trained Models for Speaker Verification

MOŠNER, L.; SERIZEL, R.; BURGET, L.; PLCHOT, O.; VINCENT, E.; PENG, J.; ČERNOCKÝ, J. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Kos: International Speech Communication Association, 2024. no. 9, p. 2135-2139. ISSN: 1990-9772.

Type

conference proceedings paper

Language

English

Authors

Mošner Ladislav, Ing., UPGM (FIT)
SERIZEL, R.
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
VINCENT, E.
Peng Junyi, UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)

Abstract

In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using the SSL models with microphone array data is to prepend it with a multi-channel speech enhancement. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors get propagated to the SSL model. We aim to alleviate the issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best-published results on the dataset.

Keywords

multi-channel speaker verification, pre-trained models

URL

https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf

Year

2024

Pages

2135–2139

Journal

Proceedings of Interspeech, vol. 2024, no. 9, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Kos

DOI

10.21437/Interspeech.2024-1260

EID Scopus

2-s2.0-85214847936

BibTeX

@inproceedings{BUT193682,
  author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
  title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2024",
  journal="Proceedings of Interspeech",
  volume="2024",
  number="9",
  pages="2135--2139",
  publisher="International Speech Communication Association",
  address="Kos",
  doi="10.21437/Interspeech.2024-1260",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}

Files

pdf mosner_2024_interspeech.pdf 382 kB

Projects

Robust processing of recordings for operations and security, MV, Programme Strategic Support for the Development of Security Research in the Czech Republic 2019-2025 (IMPAKT 1), Sub-programme 1 Joint Research Projects (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)