Publication Details

Discriminative Training of VBx Diarization

KLEMENT Dominik, DIEZ Sánchez Mireia, LANDINI Federico Nicolás, BURGET Lukáš, SILNOVA Anna, DELCROIX Marc and TAWARA Naohiro. Discriminative Training of VBx Diarization. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024, pp. 11871-11875. ISBN 979-8-3503-4485-1. Available from: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446119
Czech title
Diskriminativní trénování VBx diarizace mluvčích (English: Discriminative Training of VBx Speaker Diarization)
Type
conference paper
Language
English
Authors
Klement Dominik, Bc. (FIT BUT)
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM FIT BUT)
Landini Federico Nicolás (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
Delcroix Marc (NTT)
Tawara Naohiro (NTT)
Keywords

speaker diarization, VBx, clustering, variational Bayes, discriminative training

Abstract

Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) model for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss. We also propose a new loss that correlates better with the diarization error rate than binary cross-entropy, the default choice for end-to-end diarization systems. Proof-of-concept results across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's ability to find hyperparameters automatically, achieving performance comparable to that of hyperparameters found by extensive grid search, which typically requires additional knowledge of how the hyperparameters behave. Moreover, we show that discriminative fine-tuning of PLDA can further improve the model's performance. We release the source code with this publication.
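As a toy illustration of why binary cross-entropy (BCE) can be a poor proxy for the diarization error rate (DER), the Python sketch below scores speaker-assignment posteriors (such as the per-frame speaker responsibilities that VBx estimates) with permutation-invariant BCE and with a frame-level error rate. It is not the loss proposed in the paper, which the abstract does not define. The function names, the toy posteriors, and the simplifying assumptions (oracle speech activity and no overlapped speech, so that DER reduces to the fraction of frames assigned to the wrong speaker) are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def pit_bce(q, ref, eps=1e-7):
    """Permutation-invariant binary cross-entropy (illustrative sketch).

    q:   (T, S) speaker-assignment posteriors, one row per frame
    ref: (T, S) binary reference speaker activities
    Returns the lowest mean BCE over all permutations of the columns of q.
    """
    q = np.clip(q, eps, 1.0 - eps)  # avoid log(0)
    best = np.inf
    for perm in itertools.permutations(range(q.shape[1])):
        qp = q[:, list(perm)]
        bce = -np.mean(ref * np.log(qp) + (1.0 - ref) * np.log(1.0 - qp))
        best = min(best, bce)
    return best


def frame_error_rate(q, ref):
    """Fraction of frames whose most likely speaker disagrees with the reference,
    minimised over speaker permutations. Assumes one active speaker per frame
    and oracle speech activity (a simplified stand-in for DER)."""
    ref_idx = ref.argmax(axis=1)
    best = 1.0
    for perm in itertools.permutations(range(q.shape[1])):
        hyp_idx = q[:, list(perm)].argmax(axis=1)
        best = min(best, np.mean(hyp_idx != ref_idx))
    return best


# Hypothetical 4-frame, 2-speaker example.
ref = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
# q_b: every frame correct, but with low confidence.
q_b = np.array([[0.55, 0.45], [0.55, 0.45], [0.45, 0.55], [0.45, 0.55]])
# q_c: highly confident, but the last frame goes to the wrong speaker.
q_c = np.array([[0.99, 0.01], [0.99, 0.01], [0.01, 0.99], [0.6, 0.4]])

for name, q in [("q_b", q_b), ("q_c", q_c)]:
    print(f"{name}: BCE={pit_bce(q, ref):.3f}, frame error={frame_error_rate(q, ref):.2f}")
# q_b: BCE=0.598, frame error=0.00
# q_c: BCE=0.237, frame error=0.25
```

In this toy case BCE prefers `q_c` even though it mislabels a frame that `q_b` gets right, because BCE rewards confidence on the frames that are already correct. A training loss that tracks DER more closely should not favour this kind of trade-off.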

Published
2024
Pages
11871-11875
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, KR
ISBN
979-8-3503-4485-1
Publisher
IEEE Signal Processing Society
Place
Seoul, KR
DOI
10.1109/ICASSP48485.2024.10446119
BibTeX
@INPROCEEDINGS{FITPUB13277,
   author = "Dominik Klement and Diez S\'{a}nchez, Mireia and Landini, Federico Nicol\'{a}s and Luk\'{a}\v{s} Burget and Anna Silnova and Marc Delcroix and Naohiro Tawara",
   title = "Discriminative Training of VBx Diarization",
   pages = "11871--11875",
   booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
   year = 2024,
   location = "Seoul, KR",
   publisher = "IEEE Signal Processing Society",
   ISBN = "979-8-3503-4485-1",
   doi = "10.1109/ICASSP48485.2024.10446119",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13277"
}