Improving Robustness of Speaker Recognition using Discriminative Techniques

Czech title

Zvyšování robustnosti systémů pro rozpoznávání mluvčích pomocí diskriminativních technik

Language

English

Abstract

This work applies discriminative techniques in speaker verification systems to improve their robustness against factors that degrade performance. These factors include noise, reverberation, and the transmission channel.

The thesis consists of two main parts. The first part gives a theoretical introduction to current state-of-the-art speaker verification systems. The steps of the recognition system are described, from the extraction of acoustic features, through the extraction of vector representations of recordings, to the final computation of the recognition score. Particular emphasis is placed on techniques for extracting a vector representation of a recording, where we describe two different paradigms: i-vectors and x-vectors.

The second part of the work focuses on discriminative techniques for increasing robustness. Their description is organized to follow the passage of a recording through the verification system. First, attention is paid to signal pre-processing using a neural network for noise reduction and speech enhancement. This pre-processing is a universal technique, independent of the verification system. The work then turns to the use of a discriminative approach in the extraction of features and in the extraction of vector representations of recordings.

Furthermore, this work sheds light on the transition from generative systems to discriminative systems. To give fuller context, the work also describes techniques that historically preceded this transition. All presented techniques are experimentally verified and their advantages evaluated. We propose several techniques that proved successful with both the generative approach, in the form of i-vectors, and the discriminative approach, in the form of x-vectors, and that yielded considerable improvement. For completeness in the area of robustness, the work also includes other techniques, such as score normalization and multi-condition training. Finally, the work deals with the robustness of discriminative systems with respect to the data used in their training.

Keywords

Speaker verification, generative training, discriminative training, speech enhancement, i-vector, x-vector, robustness, noise, reverberation, neural networks.

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Computer Science and Engineering, Field of Study Computer Science and Engineering

Status

defended

Date

3 December 2021

Citation

NOVOTNÝ, Ondřej. Improving Robustness of Speaker Recognition using Discriminative Techniques. Brno, 2021. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2021-12-03. Supervised by Černocký Jan. Available from: https://www.fit.vut.cz/study/phd-thesis/1033/

BibTeX

@phdthesis{FITPT1033,
    author = "Ond\v{r}ej Novotn\'{y}",
    type = "Ph.D. thesis",
    title = "Improving Robustness of Speaker Recognition using Discriminative Techniques",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/phd-thesis/1033/"
}