Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods

Czech title

Robustní diarizace mluvčích pomocí Bayesovské inference a hlubokého učení

Type

grant

Keywords

Machine learning, statistical data processing and applications using signal
processing, Numerical analysis, simulation, optimisation, modelling tools, data
mining, Ontologies, neural networks, genetic programming, fuzzy logic, Cognitive
science, human computer interaction, natural language processing, Complexity and
cryptography, electronic security, privacy, biometrics, Speaker Diarization,
Speaker Recognition, Variational Bayes Inference, Deep Neural Networks, Speech
Data Mining

Abstract

The proposed project deals with Speaker Diarization (SD), which is commonly
defined as the task of answering the question "who spoke when?" in a speech
recording. The first objective of the proposal is to optimize the Bayesian
approach to SD, which has been shown to be promising for this task. Because
Variational Bayes (VB) inference is very sensitive to initialization, we will
develop new, fast ways of obtaining a good starting point. We will also explore
alternative inference methods, such as collapsed VB or collapsed Gibbs sampling,
and investigate alternative priors similar to those introduced for Bayesian
speaker recognition models. The second part of the proposal is motivated by the
huge performance gains that, in recent years, have been brought to other
recognition tasks by Deep Neural Networks (DNNs). In the context of SD, DNNs have
been used in the computation of i-vectors, but their potential has not yet been
explored for other stages of SD. We will study ways of integrating DNNs into the different
stages of SD systems. The objectives of the proposal will be achieved by
theoretical work, implementation, and careful testing on real speech data. The
outcomes of the project are intended not only for scientific publications, but
are also eagerly awaited by the European speech data mining industry (for example
the Czech company Phonexia or the Spanish company Agnitio). The project is
proposed by an excellent female researcher, Dr. Mireia Diez, who completed her
thesis in the GTTS group of the University of the Basque Country, one of the most
important European labs dealing with speaker recognition and diarization. The
proposed host is the Speech@FIT group of Brno University of Technology, which has
a 20-year track record of top speech data mining research. The proposed research
training and the combination of skills of Dr. Diez and the host institution have
the potential to advance the state of the art in speaker diarization, provide the
applicant with improved career opportunities, and benefit European industry.

Team members

Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM) – research leader

Publication Results

2020

DIEZ SÁNCHEZ, M.; BURGET, L.; LANDINI, F.; ČERNOCKÝ, J. Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. IEEE-ACM Transactions on Audio Speech and Language Processing, 2020, vol. 28, no. 1, p. 355-368. ISSN: 2329-9290.
MATĚJKA, P.; PLCHOT, O.; GLEMBEK, O.; BURGET, L.; ROHDIN, J.; ZEINALI, H.; MOŠNER, L.; SILNOVA, A.; NOVOTNÝ, O.; DIEZ SÁNCHEZ, M.; ČERNOCKÝ, J. 13 years of speaker recognition research at BUT, with longitudinal analysis of NIST SRE. Computer Speech and Language, 2020, vol. 2020, no. 63, p. 1-15. ISSN: 0885-2308.

2019

DIEZ SÁNCHEZ, M.; BURGET, L.; WANG, S.; ROHDIN, J.; ČERNOCKÝ, J. Bayesian HMM based x-vector clustering for Speaker Diarization. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. no. 9, p. 346-350. ISSN: 1990-9772.
MATĚJKA, P.; PLCHOT, O.; ZEINALI, H.; MOŠNER, L.; SILNOVA, A.; BURGET, L.; NOVOTNÝ, O.; GLEMBEK, O. Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. In Proceedings of Interspeech. Proceedings of Interspeech. Graz: International Speech Communication Association, 2019. no. 9, p. 2448-2452. ISSN: 1990-9772.

2018

DIEZ SÁNCHEZ, M.; BURGET, L.; MATĚJKA, P. Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. In Proceedings of Odyssey 2018. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Les Sables d'Olonne: International Speech Communication Association, 2018. no. 6, p. 147-154. ISSN: 2312-2846.
DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; ROHDIN, J.; SILNOVA, A.; ŽMOLÍKOVÁ, K.; NOVOTNÝ, O.; VESELÝ, K.; GLEMBEK, O.; PLCHOT, O.; MOŠNER, L.; MATĚJKA, P. BUT system for DIHARD Speech Diarization Challenge 2018. In Proceedings of Interspeech 2018. Proceedings of Interspeech. Hyderabad: International Speech Communication Association, 2018. no. 9, p. 2798-2802. ISSN: 1990-9772.
PLCHOT, O.; MATĚJKA, P.; NOVOTNÝ, O.; CUMANI, S.; LOZANO DÍEZ, A.; SLAVÍČEK, J.; DIEZ SÁNCHEZ, M.; GRÉZL, F.; GLEMBEK, O.; KAMSALI VEERA, M.; SILNOVA, A.; BURGET, L.; ONDEL YANG, L.; KESIRAJU, S.; ROHDIN, J. Analysis of BUT-PT Submission for NIST LRE 2017. In Proceedings of Odyssey 2018 The Speaker and Language Recognition Workshop. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Les Sables d'Olonne: International Speech Communication Association, 2018. no. 6, p. 47-53. ISSN: 2312-2846.
ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018. p. 4874-4878. ISBN: 978-1-5386-4658-8.

2017

MATĚJKA, P.; NOVOTNÝ, O.; PLCHOT, O.; BURGET, L.; DIEZ SÁNCHEZ, M.; ČERNOCKÝ, J. Analysis of Score Normalization in Multilingual Speaker Recognition. In Proceedings of Interspeech 2017. Proceedings of Interspeech. Stockholm: International Speech Communication Association, 2017. no. 08, p. 1567-1571. ISSN: 1990-9772.
MATĚJKA, P.; PLCHOT, O.; NOVOTNÝ, O.; CUMANI, S.; LOZANO DÍEZ, A.; SLAVÍČEK, J.; DIEZ SÁNCHEZ, M.; GRÉZL, F.; GLEMBEK, O.; KAMSALI VEERA, M.; SILNOVA, A.; BURGET, L.; ONDEL YANG, L.; KESIRAJU, S.; ROHDIN, J. BUT-PT System Description for NIST LRE 2017. Proceedings of NIST Language Recognition Workshop 2017. Orlando, Florida: National Institute of Standards and Technology, 2017. p. 1-6.
PLCHOT, O.; MATĚJKA, P.; SILNOVA, A.; NOVOTNÝ, O.; DIEZ SÁNCHEZ, M.; ROHDIN, J.; GLEMBEK, O.; BRÜMMER, N.; SWART, A.; PRIETO, J.; GARCIA PERERA, L.; BUERA, L.; KENNY, P.; ALAM, J.; BHATTACHARYA, G. Analysis and Description of ABC Submission to NIST SRE 2016. In Proceedings of Interspeech 2017. Proceedings of Interspeech. Stockholm: International Speech Communication Association, 2017. no. 08, p. 1348-1352. ISSN: 1990-9772.
VESELÝ, K.; BASKAR, M.; DIEZ SÁNCHEZ, M.; BENEŠ, K. MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data. In Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017. p. 368-373. ISBN: 978-1-5090-4788-8.

Applied Results

2020

Bayesian HMM based x-vector clustering - VBx, software, 2020
Authors: DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.