Publication Details

Multi-Channel Extension of Pre-trained Models for Speaker Verification

MOŠNER, L.; SERIZEL, R.; BURGET, L.; PLCHOT, O.; VINCENT, E.; PENG, J.; ČERNOCKÝ, J. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Kos: International Speech Communication Association, 2024. p. 2135-2139. ISSN: 1990-9772.

Czech title

Vícekanálové rozšíření předtrénovaných modelů pro ověřování mluvčího

Type

conference paper

Language

English

Authors

Mošner Ladislav, Ing. (DCGM)
SERIZEL, R.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
VINCENT, E.
Peng Junyi (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)

URL

https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf

Keywords

multi-channel speaker verification, pre-trained models

Abstract

In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using the SSL models with microphone array data is to prepend it with a multi-channel speech enhancement. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors get propagated to the SSL model. We aim to alleviate the issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best-published results on the dataset.

Published

2024

Pages

2135–2139

Journal

Proceedings of Interspeech, vol. 2024, no. 9, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference, Kos, GR

Publisher

International Speech Communication Association

Place

Kos

DOI

10.21437/Interspeech.2024-1260

EID Scopus

2-s2.0-85214847936

BibTeX

@inproceedings{BUT193682,
  author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
  title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2024",
  journal="Proceedings of Interspeech",
  volume="2024",
  number="9",
  pages="2135--2139",
  publisher="International Speech Communication Association",
  address="Kos",
  doi="10.21437/Interspeech.2024-1260",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}

Files

pdf mosner_2024_interspeech.pdf 382 kB