Result detail

Improving Speaker Verification with Self-Pretrained Transformer Models

PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. no. 08, p. 5361-5365. ISSN: 1990-9772.
Type
conference proceedings paper
Language
English
Authors
Peng Junyi, UPGM (FIT)
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
Stafylakis Themos
Mošner Ladislav, Ing., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Abstract

Recently, fine-tuning large pre-trained Transformer models using downstream datasets has received rising interest. Despite their success, it is still challenging to disentangle the benefits of large-scale datasets and Transformer structures from the limitations of the pre-training. In this paper, we introduce a hierarchical training approach, named self-pretraining, in which Transformer models are pretrained and fine-tuned on the same dataset. Three pre-trained models, including HuBERT, Conformer and WavLM, are evaluated on four different speaker verification datasets of varying sizes. Our experiments show that these self-pretrained models achieve competitive performance on downstream speaker verification tasks, such as VoxCeleb1 and CNCeleb1, with only one-third of the data compared to LibriSpeech pretraining. Furthermore, when pre-training only on VoxCeleb2-dev, the Conformer model outperforms the one pre-trained on 94k hours of data using the same fine-tuning settings.
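
As an illustration of the self-pretraining recipe described in the abstract, the sketch below shows the two stages run on the same dataset: a crude masked-prediction objective standing in for the actual HuBERT/WavLM/Conformer pre-training losses, followed by supervised fine-tuning with a speaker classification head. This is a minimal PyTorch sketch under those simplifying assumptions; the names ToyTransformerEncoder, pretrain_step and finetune_step are hypothetical placeholders, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTransformerEncoder(nn.Module):
    """Toy stand-in for a HuBERT / WavLM / Conformer encoder (hypothetical)."""
    def __init__(self, feat_dim=80, dim=256, layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        return self.encoder(self.proj(feats))  # (batch, time, dim)

# Stage 1: self-supervised pre-training on the downstream dataset itself
# (a crude masked-prediction objective, not the actual HuBERT/WavLM loss).
encoder = ToyTransformerEncoder()
recon_head = nn.Linear(256, 80)                # predicts the masked input frames
pretrain_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(recon_head.parameters()), lr=1e-4)

def pretrain_step(feats, mask_ratio=0.08):
    mask = torch.rand(feats.shape[:2]) < mask_ratio        # (batch, time) bool
    masked = feats.masked_fill(mask.unsqueeze(-1), 0.0)    # zero out masked frames
    pred = recon_head(encoder(masked))
    loss = F.mse_loss(pred[mask], feats[mask])             # loss only on masked frames
    pretrain_opt.zero_grad(); loss.backward(); pretrain_opt.step()
    return loss.item()

# Stage 2: supervised fine-tuning for speaker verification on the same data,
# here with mean pooling and a plain speaker classification head.
num_speakers = 5994                            # e.g. VoxCeleb2-dev
spk_head = nn.Linear(256, num_speakers)
finetune_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(spk_head.parameters()), lr=1e-5)

def finetune_step(feats, speaker_ids):
    emb = encoder(feats).mean(dim=1)           # (batch, dim) utterance embedding
    loss = F.cross_entropy(spk_head(emb), speaker_ids)
    finetune_opt.zero_grad(); loss.backward(); finetune_opt.step()
    return loss.item()

# Usage on random tensors standing in for 80-dim filterbank features:
feats = torch.randn(4, 200, 80)
print(pretrain_step(feats))
print(finetune_step(feats, torch.randint(0, num_speakers, (4,))))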

Keywords

speaker verification, pre-trained speech transformer model, pre-training

URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf
Year
2023
Pages
5361–5365
Journal
Proceedings of Interspeech, vol. 2023, no. 08, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Dublin
DOI
10.21437/Interspeech.2023-453
BibTeX
@inproceedings{BUT185575,
  author="Junyi {Peng} and Oldřich {Plchot} and Themos {Stafylakis} and Ladislav {Mošner} and Lukáš {Burget} and Jan {Černocký}",
  title="Improving Speaker Verification with Self-Pretrained Transformer Models",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="5361--5365",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-453",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/peng23_interspeech.pdf"
}
Projects
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - INTER-ACTION subprogramme, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural representations in multimodal and multilingual modelling, GAČR, EXPRO Grant Projects of Excellence in Basic Research - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Exchanges for speech research and technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, in progress