Faculty of Information Technology, BUT

Publication Details

DNN Based Embeddings for Language Recognition

LOZANO Díez Alicia, PLCHOT Oldřich, MATĚJKA Pavel and GONZALEZ-RODRIGUEZ Joaquin. DNN Based Embeddings for Language Recognition. In: Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018, pp. 5184-5188. ISBN 978-1-5386-4658-8.
Czech title
DNN Embeddings pro rozpoznávání jazyka
Type
conference paper
Language
English
Authors
Lozano Díez Alicia (UAM)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
Gonzalez-Rodriguez Joaquin (UAM)
Keywords
Embeddings, language recognition, LID, DNN
Abstract
In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance but, unlike an i-vector, is designed to contain mostly information relevant to the target task (LID). To obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages. In particular, we train a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to the embeddings. Finally, we add a softmax output layer and train the whole network with a multi-class cross-entropy objective to discriminate between languages. We report results on NIST LRE 2015 and compare the performance of embeddings and the corresponding i-vectors, both modeled by a Gaussian Linear Classifier (GLC). Using only embeddings resulted in performance comparable to i-vectors, and by performing score-level fusion we achieved a 7.3% relative improvement over the baseline.
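The pooling step described in the abstract, where frame-by-frame outputs are summarized into mean and standard deviation statistics, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stats_pooling`, the toy input values, and the choice of the population standard deviation (dividing by the number of frames) are assumptions made for illustration only. The sketch shows how utterances of different lengths map to vectors of the same length.

```python
import math


def stats_pooling(frames):
    """Summarize T frame vectors of dimension D into one vector of length 2*D.

    `frames` is a hypothetical stand-in for the frame-by-frame outputs of the
    recurrent layers. The result holds the per-dimension means followed by the
    per-dimension standard deviations.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Assumption: population standard deviation (divide by T, not T - 1).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


# Two toy "utterances" with different numbers of frames (values are made up).
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [2.0, 4.0]]

pooled_short = stats_pooling(short_utt)
pooled_long = stats_pooling(long_utt)

# Both pooled vectors have length 2 * D = 4, regardless of utterance length.
print(len(pooled_short), len(pooled_long))
```

In the system the abstract describes, a fixed-length vector of this kind would feed the two fully connected layers whose outputs serve as embeddings; those layers and the softmax classifier are omitted here.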
Published
2018
Pages
5184-5188
Proceedings
Proceedings of ICASSP 2018
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, CA
ISBN
978-1-5386-4658-8
Publisher
IEEE Signal Processing Society
Place
Calgary, CA
DOI
10.1109/ICASSP.2018.8462403
BibTeX
@INPROCEEDINGS{FITPUB11723,
   author = "Alicia D\'{i}ez Lozano and Old\v{r}ich Plchot and Pavel Mat\v{e}jka and Joaquin Gonzalez-Rodriguez",
   title = "DNN Based Embeddings for Language Recognition",
   pages = "5184--5188",
   booktitle = "Proceedings of ICASSP 2018",
   year = 2018,
   location = "Calgary, CA",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5386-4658-8",
   doi = "10.1109/ICASSP.2018.8462403",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11723"
}