Result Details

Improving Speaker Verification with Self-Pretrained Transformer Models

PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. no. 08, pp. 5361–5365. ISSN: 1990-9772.
Type
conference paper
Language
English
Authors
Peng Junyi, DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
Stafylakis Themos
Mošner Ladislav, Ing., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Abstract

Fine-tuning large pre-trained Transformer models on downstream datasets has recently attracted rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from the limitations of the pre-training. In this paper, we introduce a hierarchical training approach, named self-pretraining, in which Transformer models are pretrained and fine-tuned on the same dataset. Three pre-trained models, including HuBERT, Conformer, and WavLM, are evaluated on four speaker verification datasets of varying sizes. Our experiments show that these self-pretrained models achieve competitive performance on downstream speaker verification tasks, such as VoxCeleb1 and CNCeleb1, with only one-third of the data used for Librispeech pretraining. Furthermore, when pre-trained only on VoxCeleb2-dev, the Conformer model outperforms the one pre-trained on 94k hours of data under the same fine-tuning settings.
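
To make the two-stage recipe described above concrete, the following is a minimal PyTorch sketch of self-pretraining: a masked-prediction pretraining stage followed by a speaker-classification fine-tuning stage, both run on the same dataset. The Encoder, pretrain_step, finetune_step, the 500-cluster pseudo-label head, mean pooling, and all sizes are illustrative assumptions, not the paper's actual HuBERT/WavLM/Conformer configuration.

# Minimal sketch of self-pretraining (all names and sizes are illustrative,
# not the paper's exact setup).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Tiny Transformer encoder over frame-level features
    (a stand-in for HuBERT/WavLM/Conformer)."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):  # x: (batch, frames, feat_dim)
        return self.encoder(self.proj(x))

def pretrain_step(encoder, head, feats, targets, mask_prob=0.08):
    """Stage 1, masked prediction: zero out random frames and predict
    their discrete pseudo-label targets (e.g., k-means cluster ids)."""
    mask = torch.rand(feats.shape[:2]) < mask_prob   # (batch, frames), bool
    masked = feats.clone()
    masked[mask] = 0.0                               # crude mask embedding
    logits = head(encoder(masked))                   # (batch, frames, n_clusters)
    return nn.functional.cross_entropy(logits[mask], targets[mask])

def finetune_step(encoder, spk_head, feats, spk_labels):
    """Stage 2, speaker-verification fine-tuning: pool frame embeddings
    and classify the training speaker."""
    emb = encoder(feats).mean(dim=1)                 # simple temporal mean pooling
    return nn.functional.cross_entropy(spk_head(emb), spk_labels)

# Both stages consume the *same* dataset (random tensors stand in for data):
enc = Encoder()
pt_head = nn.Linear(256, 500)    # 500 pseudo-label clusters (assumption)
sv_head = nn.Linear(256, 1000)   # 1000 training speakers (assumption)

feats = torch.randn(8, 200, 80)                  # log-mel features
clusters = torch.randint(0, 500, (8, 200))       # frame-level pseudo-labels
speakers = torch.randint(0, 1000, (8,))

loss_pt = pretrain_step(enc, pt_head, feats, clusters)   # stage 1: self-pretraining
loss_ft = finetune_step(enc, sv_head, feats, speakers)   # stage 2: fine-tuning

In the paper's terms, stage 1 replaces Librispeech or 94k-hour pretraining with pretraining on the downstream corpus itself (e.g., VoxCeleb2-dev), after which stage 2 fine-tunes the same encoder for verification.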

Keywords

speaker verification, pre-trained speech transformer model, pre-training

URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf
Published
2023
Pages
5361–5365
Journal
Proceedings of Interspeech, vol. 2023, no. 08, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Dublin
DOI
10.21437/Interspeech.2023-453
BibTeX
@inproceedings{BUT185575,
  author="Junyi {Peng} and Oldřich {Plchot} and Themos {Stafylakis} and Ladislav {Mošner} and Lukáš {Burget} and Jan {Černocký}",
  title="Improving Speaker Verification with Self-Pretrained Transformer Models",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="5361--5365",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-453",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf"
}
Projects
Exchanges for SPEech ReseArch aNd TechnOlogies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, running
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - INTER-ACTION Subprogramme, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, EXPRO Grant Projects of Excellence in Basic Research - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed