Result Details
Improving Speaker Verification with Self-Pretrained Transformer Models
Peng Junyi
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
Stafylakis Themos
Mošner Ladislav, Ing., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Recently, fine-tuning large pre-trained Transformer models using
downstream datasets has received rising interest. Despite
their success, it is still challenging to disentangle the benefits
of large-scale datasets and Transformer structures from the limitations
of the pre-training. In this paper, we introduce a hierarchical
training approach, named self-pretraining, in which
Transformer models are pre-trained and fine-tuned on the same
dataset. Three pre-trained models (HuBERT, Conformer,
and WavLM) are evaluated on four speaker verification
datasets of varying sizes. Our experiments show that
these self-pretrained models achieve competitive performance
on downstream speaker verification tasks, such as VoxCeleb1
and CNCeleb1, with only one-third of the data compared to
LibriSpeech pretraining. Furthermore, when pre-trained only
on VoxCeleb2-dev, the Conformer model outperforms the
one pre-trained on 94k hours of data using the same fine-tuning
settings.
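
To make the self-pretraining recipe concrete, below is a minimal PyTorch sketch of the two-stage idea, not the authors' implementation: the encoder, the masked-reconstruction objective, the mean-pooled speaker head, and all hyperparameters (including n_speakers=5994, the VoxCeleb2-dev speaker count) are illustrative assumptions. The point it demonstrates is that the same corpus feeds both the self-supervised pre-training stage and the supervised fine-tuning stage.

import torch
import torch.nn as nn

class SpeechTransformer(nn.Module):
    """Toy Transformer encoder with one head per training stage."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=6, n_speakers=5994):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.recon_head = nn.Linear(d_model, feat_dim)  # stage-1 (pre-training) head
        self.spk_head = nn.Linear(d_model, n_speakers)  # stage-2 (fine-tuning) head

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        return self.encoder(self.proj(feats))  # (batch, time, d_model)

def pretrain_step(model, feats, mask_prob=0.08):
    # Stage 1: self-supervised masked reconstruction on the SAME dataset
    # that is later used for fine-tuning (the "self-pretraining" idea).
    mask = torch.rand(feats.shape[:2], device=feats.device) < mask_prob
    masked = feats.clone()
    masked[mask] = 0.0                         # zero out masked frames
    recon = model.recon_head(model(masked))
    return ((recon - feats) ** 2)[mask].mean() # MSE on masked frames only

def finetune_step(model, feats, speaker_ids):
    # Stage 2: supervised speaker classification on the same data; a simple
    # mean pool stands in for the paper's speaker verification back-end.
    emb = model(feats).mean(dim=1)
    return nn.functional.cross_entropy(model.spk_head(emb), speaker_ids)

Masked reconstruction here is a stand-in for the real objectives (HuBERT's masked unit prediction, WavLM's masked denoising); the essential design choice being sketched is only that both stages draw on one dataset, e.g. VoxCeleb2-dev.
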
speaker verification, pre-trained speech transformer model, pre-training
@inproceedings{BUT185575,
author="Junyi {Peng} and Oldřich {Plchot} and Themos {Stafylakis} and Ladislav {Mošner} and Lukáš {Burget} and Jan {Černocký}",
title="Improving Speaker Verification with Self-Pretrained Transformer Models",
booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
year="2023",
volume="2023",
number="08",
pages="5361--5365",
publisher="International Speech Communication Association",
address="Dublin",
doi="10.21437/Interspeech.2023-453",
issn="1990-9772",
url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf"
}
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - INTER-ACTION subprogramme, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, EXPRO Grant Projects of Excellence in Basic Research - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed