Faculty of Information Technology, BUT

Publication Details

End-to-end DNN based text-independent speaker recognition for long and short utterances

ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel, BURGET Lukáš and GLEMBEK Ondřej. End-to-end DNN based text-independent speaker recognition for long and short utterances. Computer Speech and Language, 2020, vol. 59, pp. 22-35. ISSN 0885-2308. Available from: https://www.sciencedirect.com/science/article/pii/S0885230818303632
Czech title
Rozpoznávání mluvčího nezávislé na textu založené na End-to-end DNN přístupu pro dlouhé a krátké promluvy
Type
journal article
Language
English
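Authors
ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel, BURGET Lukáš, GLEMBEK Ondřej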
URL
Keywords
Speaker verification, DNN, End-to-end, Text-independent, i-vector, PLDA
Abstract
Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we present an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.
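The idea of training while staying close to the initial system can be illustrated with a minimal sketch. This is not the authors' system or code. It makes several assumptions: a toy logistic-regression scorer stands in for the network, `theta_init` stands in for parameters that mimic a baseline, and the squared-distance penalty with weight `lam` is only one possible form of the regularization. All names and values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def regularized_loss_and_grad(theta, theta_init, trials, labels, lam):
    """Binary cross-entropy over toy trials plus an L2 penalty
    that pulls theta back toward theta_init (assumed penalty form)."""
    n = len(trials)
    loss = 0.0
    grad = [0.0] * len(theta)
    for x, y in zip(trials, labels):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= (y * math.log(p) + (1 - y) * math.log(1.0 - p)) / n
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi / n
    # Penalty term: lam * ||theta - theta_init||^2
    for j, (t, t0) in enumerate(zip(theta, theta_init)):
        loss += lam * (t - t0) ** 2
        grad[j] += 2.0 * lam * (t - t0)
    return loss, grad

def train(theta_init, trials, labels, lam, lr=0.1, steps=200):
    """Plain gradient descent, started from theta_init."""
    theta = list(theta_init)
    for _ in range(steps):
        _, grad = regularized_loss_and_grad(theta, theta_init, trials, labels, lam)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: two features per trial, label 1 = same speaker.
theta_init = [1.0, -0.5]
trials = [[2.0, 1.0], [0.5, 2.0], [1.5, 0.2], [0.1, 1.8]]
labels = [1, 0, 1, 0]

unregularized = train(theta_init, trials, labels, lam=0.0)
regularized = train(theta_init, trials, labels, lam=1.0)

print("drift without penalty:", distance(unregularized, theta_init))
print("drift with penalty:   ", distance(regularized, theta_init))
```

In this toy setting the penalized run ends closer to `theta_init` than the unpenalized run, which is the behavior the abstract describes in words: the trained system is kept from deviating too far from its starting point.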
Published
2019
Pages
22-35
Journal
Computer Speech and Language, 2020, vol. 59, ISSN 0885-2308
Publisher
Elsevier Science
DOI
10.1016/j.csl.2019.06.002
BibTeX
@ARTICLE{FITPUB12038,
   author = "Johan A. Rohdin and Anna Silnova and Diez S\'{a}nchez, Mireia and Old\v{r}ich Plchot and Pavel Mat\v{e}jka and Luk\'{a}\v{s} Burget and Ond\v{r}ej Glembek",
   title = "End-to-end DNN based text-independent speaker recognition for long and short utterances",
   pages = "22--35",
   journal = "Computer Speech and Language",
   volume = 59,
   year = 2019,
   ISSN = "0885-2308",
   doi = "10.1016/j.csl.2019.06.002",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12038"
}