Result Details

End-to-end DNN based text-independent speaker recognition for long and short utterances

ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O. End-to-end DNN based text-independent speaker recognition for long and short utterances. COMPUTER SPEECH AND LANGUAGE, 2020, vol. 59, p. 22-35. ISSN: 0885-2308.
Type
journal article
Language
English
Authors
Rohdin Johan Andréas, M.Sc., Ph.D., FIT (FIT), DCGM (FIT)
Silnova Anna, M.Sc., Ph.D., DCGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
Matějka Pavel, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Glembek Ondřej, Ing., Ph.D., DCGM (FIT)
Abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we present an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.
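
The idea of training further while staying close to the initial system can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the form of the regularizer, so the quadratic penalty on the distance from the initial parameters, the toy quadratic task loss, and all names (`regularized_loss`, `train`, `lam`, `theta_init`) are assumptions made purely for illustration.

```python
import numpy as np

def regularized_loss(theta, theta_init, task_loss, task_grad, lam):
    """Task loss plus a quadratic penalty pulling theta toward theta_init.

    The penalty form lam * ||theta - theta_init||^2 is an assumption,
    not taken from the paper.
    """
    diff = theta - theta_init
    loss = task_loss(theta) + lam * np.sum(diff ** 2)
    grad = task_grad(theta) + 2.0 * lam * diff
    return loss, grad

def train(theta_init, task_loss, task_grad, lam, lr=0.1, steps=200):
    """Plain gradient descent started from the initial (baseline-like) parameters."""
    theta = theta_init.copy()
    for _ in range(steps):
        _, grad = regularized_loss(theta, theta_init, task_loss, task_grad, lam)
        theta -= lr * grad
    return theta

# Toy stand-in for the end-to-end objective: a quadratic with its minimum at `target`.
target = np.array([1.0, -2.0])
task_loss = lambda th: np.sum((th - target) ** 2)
task_grad = lambda th: 2.0 * (th - target)

theta_init = np.zeros(2)  # stands in for the parameters of the initial system
theta_free = train(theta_init, task_loss, task_grad, lam=0.0)  # no regularization
theta_reg = train(theta_init, task_loss, task_grad, lam=1.0)   # pulled toward theta_init
```

In this toy setting, the unregularized run moves all the way to the task optimum, while the regularized run settles between the initial parameters and that optimum. A larger `lam` keeps the solution closer to the starting point.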

Keywords

Speaker verification, DNN, End-to-end, Text-independent, i-vector, PLDA

URL
https://www.sciencedirect.com/science/article/pii/S0885230818303632
Published
2020
Pages
22–35
Journal
COMPUTER SPEECH AND LANGUAGE, vol. 59, ISSN 0885-2308
DOI
10.1016/j.csl.2019.06.002
UT WoS
000490540900002
BibTeX
@article{BUT158088,
  author="Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Oldřich {Plchot} and Pavel {Matějka} and Lukáš {Burget} and Ondřej {Glembek}",
  title="End-to-end DNN based text-independent speaker recognition for long and short utterances",
  journal="COMPUTER SPEECH AND LANGUAGE",
  year="2020",
  volume="59",
  pages="22--35",
  doi="10.1016/j.csl.2019.06.002",
  issn="0885-2308",
  url="https://www.sciencedirect.com/science/article/pii/S0885230818303632"
}
Projects
Improving Robustness in Automatic Speaker Recognition, GACR, Junior grants, GJ17-23870Y, start: 2017-01-01, end: 2019-12-31, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Neural networks for signal processing and speech data mining, TAČR, ZÉTA programme for the support of applied research, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grant projects of excellence in basic research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
NTT - Speech enhancement front-end for robust automatic speech recognition with large amount of training data, NTT, start: 2019-01-01, end: 2019-12-31, completed
Sequence summarizing neural networks for speaker recognition, EU, Horizon 2020, 5SA15094, start: 2016-07-01, end: 2019-06-30, completed
Processing, visualization and analysis of multimedia and 3D data, BUT, BUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed