Publication Details

Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors

DIEZ Sánchez Mireia, BURGET Lukáš, LANDINI Federico Nicolás and ČERNOCKÝ Jan. Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 28, no. 1, pp. 355-368. ISSN 2329-9290. Available from: https://ieeexplore.ieee.org/document/8910412
Type
journal article
Language
English
Authors
Mireia Diez Sánchez, Lukáš Burget, Federico Nicolás Landini, Jan Černocký
URL
Keywords

Hidden Markov Models, Bayes methods, Task analysis, Probabilistic logic, Training, Speech processing, Complexity theory

Abstract

In our previous work, we introduced our Bayesian Hidden Markov Model with eigenvoice priors, which has recently been recognized as the state-of-the-art model for speaker diarization. In this paper, we present a more complete analysis of the diarization system. The inference of the model is fully described, and derivations of all update formulas are provided for a complete understanding of the algorithm. An extensive analysis of the effect, sensitivity, and interactions of all model parameters is provided, which may serve as a guide for their optimal setting. The newly introduced speaker regularization coefficient allows us to control the number of speakers inferred in an utterance. A naive speaker model merging strategy is also presented, which helps drive the variational inference out of local optima. Experiments for the different diarization scenarios are presented on the CALLHOME and DIHARD datasets.

Published
2019
Pages
355-368
Journal
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 28, no. 1, ISSN 2329-9290
Publisher
IEEE Signal Processing Society
DOI
10.1109/TASLP.2019.2955293
UT WoS
000560612800028
EID Scopus
BibTeX
@ARTICLE{FITPUB12139,
   author = "Mireia Diez S\'{a}nchez and Luk\'{a}\v{s} Burget and Federico Nicol\'{a}s Landini and Jan \v{C}ernock\'{y}",
   title = "Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors",
   pages = "355--368",
   journal = "IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING",
   volume = 28,
   number = 1,
   year = 2019,
   ISSN = "2329-9290",
   doi = "10.1109/TASLP.2019.2955293",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12139"
}