End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA

ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018. p. 4874-4878. ISBN: 978-1-5386-4658-8.

Type

conference paper

Language

English

Authors

Rohdin Johan Andréas, M.Sc., Ph.D., FIT (FIT), DCGM (FIT)
Silnova Anna, M.Sc., Ph.D., DCGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
Matějka Pavel, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)

Abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way we mitigate overfitting which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.

Keywords

Speaker verification, DNN, end-to-end

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2018/rohdin… PDF

Published

2018

Pages

4874–4878

Proceedings

Proceedings of ICASSP

Conference

IEEE International Conference on Acoustics, Speech and Signal Processing

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Calgary

DOI

10.1109/ICASSP.2018.8461958

UT WoS

000446384605009

EID Scopus

2-s2.0-85054212885

BibTeX

@inproceedings{BUT155046,
  author="Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Oldřich {Plchot} and Pavel {Matějka} and Lukáš {Burget}",
  title="End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA",
  booktitle="Proceedings of ICASSP",
  year="2018",
  pages="4874--4878",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8461958",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11724/"
}

Files

pdf rohdin_icassp2018_0004874.pdf 211 kB

Projects

Improving Robustness in Automatic Speaker Recognition, GACR, Juniorské granty, GJ17-23870Y, start: 2017-01-01, end: 2019-12-31, completed
IT4Innovations excellence in science, MŠMT, Národní program udržitelnosti II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Neural networks for signal processing and speech data mining, TAČR, Program na podporu aplikovaného výzkumu ZÉTA, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed
NTT - Speech enhancement front-end for robust automatic speech recognition with large amount of training data, NTT, start: 2017-10-01, end: 2018-09-30, completed
Robust SPEAKER DIariazation systems using Bayesian inferenCE and deep learning methods, EU, Horizon 2020, start: 2017-03-01, end: 2019-02-28, completed
Sequence summarizing neural networks for speaker recognition, EU, Horizon 2020, 5SA15094, start: 2016-07-01, end: 2019-06-30, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)