Publication Details

Multi-Channel Extension of Pre-trained Models for Speaker Verification

MOŠNER, L.; SERIZEL, R.; BURGET, L.; PLCHOT, O.; VINCENT, E.; PENG, J.; ČERNOCKÝ, J. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Kos: International Speech Communication Association, 2024. p. 2135-2139. ISSN: 1990-9772.
Czech title
Vícekanálové rozšíření předtrénovaných modelů pro ověřování mluvčího
Type
conference paper
Language
English
Authors
URL
Keywords

multi-channel speaker verification, pre-trained models

Abstract

In this work, we focus on designing a multi-channel speech
processing system based on large pre-trained models. These
models are typically trained for single-channel scenarios via
self-supervised learning (SSL). A common approach to using
the SSL models with microphone array data is to prepend it
with a multi-channel speech enhancement. The downside is that
spatial information can be leveraged only by the pre-processing
stage, and enhancement errors get propagated to the SSL model.
We aim to alleviate the issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-
channel processing with cross-channel information exchange,
eventually fusing channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our
experiments on the MultiSV corpus show noteworthy improvements over the best-published results on the dataset.

Published
2024
Pages
2135–2139
Journal
Proceedings of Interspeech, vol. 2024, no. 9, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference, Kos, GR
Publisher
International Speech Communication Association
Place
Kos
DOI
EID Scopus
BibTeX
@inproceedings{BUT193682,
  author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
  title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2024",
  journal="Proceedings of Interspeech",
  volume="2024",
  number="9",
  pages="2135--2139",
  publisher="International Speech Communication Association",
  address="Kos",
  doi="10.21437/Interspeech.2024-1260",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}
Files
Back to top