Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors

DIEZ SÁNCHEZ, M.; BURGET, L.; LANDINI, F.; ČERNOCKÝ, J. Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. IEEE-ACM Transactions on Audio Speech and Language Processing, 2020, vol. 28, no. 1, p. 355-368. ISSN: 2329-9290.

Type

journal article

Language

English

Authors

Diez Sánchez Mireia, M.Sc., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Landini Federico Nicolás, Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

In our previous work, we introduced our Bayesian Hidden Markov Model with eigenvoice priors, which has been recently recognized as the state-of-the-art model for Speaker Diarization. In this paper we present a more complete analysis of the Diarization system. The inference of the model is fully described and derivations of all update formulas are provided for a complete understanding of the algorithm. An extensive analysis of the effect, sensitivity and interactions of all model parameters is provided, which might be used as a guide for their optimal setting. The newly introduced speaker regularization coefficient allows us to control the number of speakers inferred in an utterance. A naive speaker model merging strategy is also presented, which allows us to drive the variational inference out of local optima. Experiments for the different diarization scenarios are presented on CALLHOME and DIHARD datasets.

Keywords

Hidden Markov Models, Bayes methods, Task analysis, Probabilistic logic, Training, Speech processing, Complexity theory

Published

2020

Pages

355–368

Journal

IEEE-ACM Transactions on Audio Speech and Language Processing, vol. 28, no. 1, ISSN 2329-9290

DOI

10.1109/TASLP.2019.2955293

UT WoS

000560612800028

EID Scopus

2-s2.0-85075649332

BibTeX

@article{BUT161472,
  author="Mireia {Diez Sánchez} and Lukáš {Burget} and Federico Nicolás {Landini} and Jan {Černocký}",
  title="Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors",
  journal="IEEE-ACM Transactions on Audio Speech and Language Processing",
  year="2020",
  volume="28",
  number="1",
  pages="355--368",
  doi="10.1109/TASLP.2019.2955293",
  issn="2329-9290",
  url="https://ieeexplore.ieee.org/document/8910412"
}

Files

pdf MDiez_IEEE_TASLP_2020.pdf 375 kB

Projects

Information mining in speech acquired by distant microphones, MV, Bezpečnostní výzkum České republiky 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, Národní program udržitelnosti II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grantové projekty excelence v základním výzkumu EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods, EU, Horizon 2020, start: 2017-03-01, end: 2019-02-28, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)