Result detail
Multi-Channel Extension of Pre-trained Models for Speaker Verification
MOŠNER, L.
SERIZEL, R.
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
VINCENT, E.
Peng Junyi, UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
In this work, we focus on designing a multi-channel speech processing system based on large pre-trained models. These models are typically trained for single-channel scenarios via self-supervised learning (SSL). A common approach to using SSL models with microphone array data is to prepend them with a multi-channel speech enhancement stage. The downside is that spatial information can be leveraged only by the pre-processing stage, and enhancement errors are propagated to the SSL model. We aim to alleviate this issue by designing METRO, a Multi-channel ExTension of pRe-trained mOdels. It interleaves per-channel processing with cross-channel information exchange, eventually fusing the channels into one. While our approach is general, here we focus on multi-channel speaker verification. Our experiments on the MultiSV corpus show noteworthy improvements over the best published results on the dataset.
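The interleaving pattern described in the abstract (per-channel processing, then cross-channel exchange, repeated, then fusion into one channel) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the tanh stand-in for a pre-trained layer, the mean-blending exchange rule, and the averaging fusion are all hypothetical choices made here for illustration only.

```python
import math

# Toy data layout (an assumption): channels -> frames -> feature dims,
# stored as nested Python lists.

def per_channel_layer(x, scale=1.0):
    """Stand-in for one shared-weight pre-trained layer applied to a single
    channel (frames x dims). A real system would use an SSL model layer."""
    return [[math.tanh(scale * v) for v in frame] for frame in x]

def channel_mean(channels):
    """Element-wise mean over the channel axis, giving frames x dims."""
    n = len(channels)
    frames, dims = len(channels[0]), len(channels[0][0])
    return [[sum(ch[t][d] for ch in channels) / n for d in range(dims)]
            for t in range(frames)]

def cross_channel_exchange(channels, mix=0.5):
    """Hypothetical exchange rule: blend each channel with the channel mean,
    so every channel sees information from the others."""
    mean = channel_mean(channels)
    return [[[(1.0 - mix) * v + mix * m for v, m in zip(frame, mean_frame)]
             for frame, mean_frame in zip(ch, mean)]
            for ch in channels]

def interleaved_forward(channels, num_layers=2):
    """Alternate per-channel processing and cross-channel exchange, then
    fuse the channels into one sequence by averaging (assumed fusion)."""
    for _ in range(num_layers):
        channels = [per_channel_layer(ch) for ch in channels]
        channels = cross_channel_exchange(channels)
    return channel_mean(channels)

# Usage: three microphone channels, two frames, two feature dimensions.
example = [
    [[0.1, -0.2], [0.3, 0.4]],
    [[0.0, -0.1], [0.2, 0.5]],
    [[0.2, -0.3], [0.4, 0.3]],
]
fused = interleaved_forward(example)  # a single frames x dims sequence
```

The point of the sketch is the control flow: spatial (cross-channel) information is available at every depth rather than only in a pre-processing stage, and a single-channel representation is produced only at the end.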
multi-channel speaker verification, pre-trained models
@inproceedings{BUT193682,
author="MOŠNER, L. and SERIZEL, R. and BURGET, L. and PLCHOT, O. and VINCENT, E. and PENG, J. and ČERNOCKÝ, J.",
title="Multi-Channel Extension of Pre-trained Models for Speaker Verification",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2024",
journal="Proceedings of Interspeech",
volume="2024",
number="9",
pages="2135--2139",
publisher="International Speech Communication Association",
address="Kos",
doi="10.21437/Interspeech.2024-1260",
issn="1990-9772",
url="https://www.isca-archive.org/interspeech_2024/mosner24_interspeech.pdf"
}