Faculty of Information Technology, BUT

Publication Details

Bayesian HMM based x-vector clustering for Speaker Diarization

DIEZ Sánchez Mireia, BURGET Lukáš, WANG Shuai, ROHDIN Johan A. and ČERNOCKÝ Jan. Bayesian HMM based x-vector clustering for Speaker Diarization. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 346-350. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2813.pdf
Czech title
Bayesovské shlukování x-vektorů založené na HMM pro diarizaci
Type
conference paper
Language
English
Authors
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Wang Shuai (DCGM FIT BUT)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Černocký Jan, doc. Dr. Ing. (DCGM FIT BUT)
Keywords
Speaker Diarization, Variational Bayes, HMM, x-vector, DIHARD
Abstract
This paper presents a simplified version of the previously proposed diarization algorithm based on Bayesian Hidden Markov Models, which uses Variational Bayesian inference for very fast and robust clustering of x-vectors (neural-network-based speaker embeddings). The presented results show that this clustering algorithm provides significant improvements in diarization performance compared to the previously used Agglomerative Hierarchical Clustering. The output of this system can further serve as an initialization for a second-stage VB diarization system, which takes frame-wise MFCC features as input, to obtain optimal results.
Published
2019
Pages
346-350
Journal
Proceedings of Interspeech, vol. 2019, no. 9, ISSN 1990-9772
Proceedings
Proceedings of Interspeech
Conference
INTERSPEECH 2019, Graz, AT
Publisher
International Speech Communication Association
Place
Graz, AT
DOI
10.21437/Interspeech.2019-2813
BibTeX
@INPROCEEDINGS{FITPUB12085,
   author = "Diez S\'{a}nchez, Mireia and Burget, Luk\'{a}\v{s} and Wang, Shuai and Rohdin, Johan A. and \v{C}ernock\'{y}, Jan",
   title = "Bayesian HMM based x-vector clustering for Speaker Diarization",
   pages = "346--350",
   booktitle = "Proceedings of Interspeech",
   journal = "Proceedings of Interspeech",
   volume = 2019,
   number = 9,
   year = 2019,
   location = "Graz, AT",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2019-2813",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12085"
}