Result details

DNN Based Embeddings for Language Recognition

LOZANO DÍEZ, A.; PLCHOT, O.; MATĚJKA, P.; GONZALEZ-RODRIGUEZ, J. DNN Based Embeddings for Language Recognition. In Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018. p. 5184-5188. ISBN: 978-1-5386-4658-8.
Type
conference proceedings paper
Language
English
Authors
Lozano Díez Alicia, Ph.D.
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
Matějka Pavel, Ing., Ph.D., UPGM (FIT)
Gonzalez-Rodriguez Joaquin, FIT (FIT)
Abstract

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to i-vector) that represents the whole utterance, but unlike i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with sequence summarization layer to classify languages. In particular, we trained a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to embeddings. Finally, we add a softmax output layer and train the whole network with multi-class cross-entropy objective to discriminate between languages. We report our results on NIST LRE 2015 and we compare the performance of embeddings and corresponding i-vectors both modeled by Gaussian Linear Classifier (GLC). Using only embeddings resulted in comparable performance to i-vectors and by performing score-level fusion we achieved 7.3% relative improvement over the baseline.
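
The sequence summarization step mentioned in the abstract (frame-by-frame outputs pooled into mean and standard deviation statistics) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stats_pooling`, the plain-list input format, and the use of the population standard deviation are assumptions made only for illustration, and the BLSTM layers that would produce the frame vectors are omitted.

```python
import math


def stats_pooling(frames):
    """Summarize T frame vectors of dimension D into one vector of length 2*D.

    `frames` is a non-empty list of equal-length lists of floats.
    The result holds the per-dimension means followed by the
    per-dimension standard deviations.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Assumption: population standard deviation (divide by T);
    # the abstract does not specify which estimator is used.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


# Two utterances of different length map to vectors of the same size.
short = stats_pooling([[1.0, 2.0], [3.0, 6.0]])
long = stats_pooling([[1.0, 2.0], [3.0, 6.0], [1.0, 2.0], [3.0, 6.0]])
```

Whatever the number of input frames, the pooled vector has length 2*D, which is what lets the subsequent fully connected layers produce a fixed-length, utterance-level embedding.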

Keywords

Embeddings, language recognition, LID, DNN

Year
2018
Pages
5184–5188
Proceedings
Proceedings of ICASSP 2018
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing
ISBN
978-1-5386-4658-8
Publisher
IEEE Signal Processing Society
Place
Calgary
DOI
10.1109/ICASSP.2018.8462403
UT WoS
000446384605071
BibTeX
@inproceedings{BUT155045,
  author="Alicia {Lozano Díez} and Oldřich {Plchot} and Pavel {Matějka} and Joaquin {Gonzalez-Rodriguez}",
  title="DNN Based Embeddings for Language Recognition",
  booktitle="Proceedings of ICASSP 2018",
  year="2018",
  pages="5184--5188",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8462403",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11723/"
}
Projects
Dolování infoRmAcí z řeči Pořízené vzdÁlenými miKrofony, MV, Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed