Publication detail

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model

BRUMMER, J.; SILNOVA, A.; BURGET, L.; STAFYLAKIS, T. Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model. In Proceedings of Odyssey 2018. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Les Sables d'Olonne: International Speech Communication Association, 2018. no. 6, p. 349-356. ISSN: 2312-2846.
Type
conference proceedings paper
Language
English
Authors
Brummer Johan Nikolaas Langenhoven, Dr.
Silnova Anna, M.Sc., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Stafylakis Themos
Abstract

Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks. We introduce meta-embeddings, which live in more general inner product spaces and which are designed to better propagate uncertainty through the embedding bottleneck. Traditional embeddings are trained to maximize between-class and minimize within-class distances. Meta-embeddings are trained to maximize relevant information throughput. As a proof of concept in speaker recognition, we derive an extractor from the familiar generative Gaussian PLDA model (GPLDA). We show that GPLDA likelihood ratio scores are given by Hilbert space inner products between Gaussian likelihood functions, which we term Gaussian meta-embeddings (GMEs). Meta-embedding extractors can be generatively or discriminatively trained. GMEs extracted by GPLDA have fixed precisions and do not propagate uncertainty. We show that a generalization to heavy-tailed PLDA gives GMEs with variable precisions, which do propagate uncertainty. Experiments on NIST SRE 2010 and 2016 show that the proposed method applied to i-vectors without length normalization is up to 20% more accurate than GPLDA applied to length-normalized i-vectors.
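
The inner-product scoring mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper's code. It assumes that each recording is summarized by a Gaussian likelihood function in natural-parameter form, f(z) = exp(aᵀz − ½ zᵀBz), that the hidden speaker variable z has a standard normal prior, and that the inner product of two such functions is the prior expectation of their product. The function names `log_expectation` and `log_likelihood_ratio` are hypothetical.

```python
import numpy as np


def log_expectation(a, B):
    """Return log E[f(z)] for z ~ N(0, I) and f(z) = exp(a.z - 0.5 * z.B.z).

    Closed form (assumed setup): 0.5 * a^T (I + B)^{-1} a - 0.5 * log|I + B|.
    """
    d = a.shape[0]
    M = np.eye(d) + B
    _, logdet = np.linalg.slogdet(M)
    return 0.5 * a @ np.linalg.solve(M, a) - 0.5 * logdet


def log_likelihood_ratio(a1, B1, a2, B2):
    """Log-LR that two recordings share a speaker versus come from different speakers.

    The product of two Gaussian likelihood functions has natural parameters
    (a1 + a2, B1 + B2), so the score is a normalized inner product:
    log E[f1 * f2] - log E[f1] - log E[f2].
    """
    return (log_expectation(a1 + a2, B1 + B2)
            - log_expectation(a1, B1)
            - log_expectation(a2, B2))


if __name__ == "__main__":
    # Two illustrative 2-D meta-embeddings; the second is less certain (smaller precision).
    a1, B1 = np.array([1.0, 0.5]), 2.0 * np.eye(2)
    a2, B2 = np.array([0.8, 0.4]), 0.5 * np.eye(2)
    print(log_likelihood_ratio(a1, B1, a2, B2))
```

In this sketch a fixed B for every recording corresponds to the fixed-precision case described in the abstract, while letting B vary per recording is what allows uncertainty to influence the score.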

Keywords

embeddings, machine learning, speaker recognition

Year
2018
Pages
349–356
Journal
Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland, vol. 2018, no. 6, ISSN 2312-2846
Proceedings
Proceedings of Odyssey 2018
Conference
Odyssey 2018
Publisher
International Speech Communication Association
Place
Les Sables d'Olonne
DOI
10.21437/Odyssey.2018-49
BibTeX
@inproceedings{BUT155077,
  author="Johan Nikolaas Langenhoven {Brummer} and Anna {Silnova} and Lukáš {Burget} and Themos {Stafylakis}",
  title="Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model",
  booktitle="Proceedings of Odyssey 2018",
  year="2018",
  journal="Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland",
  volume="2018",
  number="6",
  pages="349--356",
  publisher="International Speech Communication Association",
  address="Les Sables d'Olonne",
  doi="10.21437/Odyssey.2018-49",
  issn="2312-2846",
  url="https://www.fit.vut.cz/research/publication/11790/"
}
Projects
Dolování infoRmAcí z řeči Pořízené vzdÁlenými miKrofony (Information mining from speech acquired by distant microphones), MV, Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Neuronové sítě pro zpracování signálu a dolování informací v řeči - NOSIČI (Neural networks for signal processing and information mining in speech), TAČR, ZÉTA programme for the support of applied research, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed