Result detail

Analysis and Optimization of Bottleneck Features for Speaker Recognition

LOZANO DÍEZ, A.; SILNOVA, A.; MATĚJKA, P.; GLEMBEK, O.; PLCHOT, O.; PEŠÁN, J.; BURGET, L.; GONZALEZ-RODRIGUEZ, J. Analysis and Optimization of Bottleneck Features for Speaker Recognition. In Proceedings of Odyssey 2016. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Bilbao: International Speech Communication Association, 2016. no. 06, p. 352-357. ISSN: 2312-2846.

Type

Conference proceedings paper

Language

English

Authors

Lozano Díez Alicia, Ph.D.
Silnova Anna, M.Sc., Ph.D., UPGM (FIT)
Matějka Pavel, Ing., Ph.D., UPGM (FIT)
Glembek Ondřej, Ing., Ph.D., UPGM (FIT)
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
Pešán Jan, Ing., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Gonzalez-Rodriguez Joaquin, FIT (FIT)

Abstract

Recently, Deep Neural Network (DNN) based bottleneck features have proved to be very effective in i-vector based speaker recognition. However, bottleneck feature extraction is usually fully optimized for the speech recognition task rather than the speaker recognition task. In this paper, we explore whether DNNs that are suboptimal for speech recognition can provide better bottleneck features for speaker recognition. We experiment with different features, optimized for speech or speaker recognition, as input to the DNN. We also experiment with an under-trained DNN, where training was interrupted before full convergence of the speech recognition objective. Moreover, we analyze the effect of normalizing the features at the input and/or at the output of bottleneck feature extraction to see how it affects the performance of the final speaker recognition system. We evaluated the systems on the SRE10, condition 5, female task. Results show that the best configuration of the DNN in terms of phone accuracy does not necessarily imply better performance of the final speaker recognition system. Finally, we compare the performance of bottleneck features and standard MFCC features in an i-vector/PLDA speaker recognition system. The best bottleneck features yield up to 37% relative improvement in terms of EER.
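
To make the abstract's final metric concrete, the following is a minimal sketch of how an equal error rate (EER) and a relative improvement between two systems could be computed from verification scores. It is not the evaluation code used in the paper: the function names `equal_error_rate` and `relative_improvement` are hypothetical, and the EER is approximated by a simple threshold sweep over the observed scores rather than by interpolation on a DET curve.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    Assumption: higher scores mean "same speaker". The result is the
    average of the miss and false-alarm rates at the threshold where the
    two rates are closest to each other.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials scored below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction of a new system over a baseline, in percent."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

With purely illustrative numbers (not taken from the paper), `relative_improvement(10.0, 8.0)` gives a 20% relative improvement: the reduction is measured against the baseline EER, not in absolute percentage points.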

Keywords

bottleneck features, speaker recognition

URL

Year

2016

Pages

352–357

Journal

Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland, vol. 2016, no. 06, ISSN 2312-2846

Proceedings

Proceedings of Odyssey 2016

Conference

Odyssey 2016

Publisher

International Speech Communication Association

Place

Bilbao

DOI

10.21437/Odyssey.2016-51

EID Scopus

2-s2.0-85073255478

BibTeX

@inproceedings{BUT131002,
  author="Alicia {Lozano Díez} and Anna {Silnova} and Pavel {Matějka} and Ondřej {Glembek} and Oldřich {Plchot} and Jan {Pešán} and Lukáš {Burget} and Joaquin {Gonzalez-Rodriguez}",
  title="Analysis and Optimization of Bottleneck Features for Speaker Recognition",
  booktitle="Proceedings of Odyssey 2016",
  year="2016",
  journal="Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland",
  volume="2016",
  number="06",
  pages="352--357",
  publisher="International Speech Communication Association",
  address="Bilbao",
  doi="10.21437/Odyssey.2016-51",
  issn="2312-2846",
  url="http://www.odyssey2016.org/papers/pdfs_stamped/54.pdf"
}

Files

pdf zeinali_interspeech2016_IS161174.pdf 531 kB

Projects

Dolování infoRmAcí z řeči Pořízené vzdÁlenými miKrofony (Information mining in speech acquired by distant microphones), MV, Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Zpracování, rozpoznávání a zobrazování multimediálních a 3D dat (Processing, recognition and imaging of multimedia and 3D data), VUT, VUT internal projects, FIT-S-14-2506, start: 2014-01-01, end: 2016-12-31, completed

Research groups

Speech data mining research group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)