Result Details

Multi-Channel Extension of Pre-trained Models for Speaker Verification

MOŠNER, L.; SERIZEL, R.; BURGET, L.; PLCHOT, O.; VINCENT, E.; PENG, J.; ČERNOCKÝ, J. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Kos: International Speech Communication Association, 2024. no. 9, p. 2135-2139. ISSN: 1990-9772.
Type
conference paper
Language
English
Authors
Mošner Ladislav, Ing., DCGM (FIT)
SERIZEL, R.
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
VINCENT, E.
Peng Junyi, DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Abstract

In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using SSL models with microphone array data is to prepend them with a multi-channel speech enhancement stage. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors are propagated to the SSL model. We aim to alleviate this issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing the channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best published results on the dataset.

Keywords

multi-channel speaker verification, pre-trained models

URL
https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf
Published
2024
Pages
2135–2139
Journal
Proceedings of Interspeech, vol. 2024, no. 9, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Kos
DOI
10.21437/Interspeech.2024-1260
BibTeX
@inproceedings{BUT193682,
  author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
  title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2024",
  journal="Proceedings of Interspeech",
  volume="2024",
  number="9",
  pages="2135--2139",
  publisher="International Speech Communication Association",
  address="Kos",
  doi="10.21437/Interspeech.2024-1260",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}
Projects
Robust processing of recordings for operations and security, MV, Strategic Support for the Development of Security Research in the Czech Republic 2019-2025 (IMPAKT 1), Sub-programme 1: Joint Research Projects (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed