Robust SPEAKER DIarization systems using Bayesian inferenCE and deep learning methods
Project Period: 1. 3. 2017 - 28. 2. 2019
Project Type: grant
Agency: European Commission EU
Program: Horizon 2020
Machine learning, statistical data processing and applications using signal processing, Numerical analysis, simulation, optimisation, modelling tools, data mining, Ontologies, neural networks, genetic programming, fuzzy logic, Cognitive science, human computer interaction, natural language processing, Complexity and cryptography, electronic security, privacy, biometrics, Speaker Diarization, Speaker Recognition, Variational Bayes Inference, Deep Neural Networks, Speech Data Mining
The proposed project deals with Speaker Diarization (SD), which is commonly defined as the task of answering the question "who spoke when?" in a speech recording. The first objective of the proposal is to optimize the Bayesian approach to SD, which has been shown to be promising for this task. Because Variational Bayes (VB) inference is very sensitive to initialization, we will develop new, fast ways of obtaining a good starting point. We will also explore alternative inference methods, such as collapsed VB or collapsed Gibbs sampling, and investigate alternative priors similar to those introduced for Bayesian speaker recognition models. The second part of the proposal is motivated by the large performance gains that Deep Neural Networks (DNNs) have brought to other recognition tasks in recent years. In the context of SD, DNNs have been used in the computation of i-vectors, but their potential has not yet been explored for the other stages of SD. We will study ways of integrating DNNs into the different stages of SD systems. The objectives of the proposal will be achieved through theoretical work, implementation, and careful testing on real speech data. The outcomes of the project are intended not only for scientific publications; they are also eagerly awaited by the European speech data mining industry (for example, Phonexia in the Czech Republic or Agnitio in Spain). The project is proposed by an excellent female researcher, Dr. Mireia Diez, who completed her thesis in the GTTS group of the University of the Basque Country, one of the most important European labs dealing with speaker recognition and diarization. The proposed host is the Speech@FIT group of Brno University of Technology, which has a 20-year track record of top speech data mining research. The proposed research training and the combination of skills of Dr. Diez and the host institution have the potential to advance the state of the art in speaker diarization, provide the applicant with improved career opportunities, and benefit European industry.