Thesis Details

Unsupervised Evaluation of Speaker Recognition System

Bachelor's Thesis
Student: Odehnal Ondřej
Academic Year: 2021/2022
Supervisor: Matějka Pavel, Ing., Ph.D.
Czech title
Evaluace systému na rozpoznávání mluvčího na neznámých datech (English: Evaluation of a Speaker Recognition System on Unknown Data)
Language
English
Abstract

The context of this thesis is a state-of-the-art speaker identification (SID) system based on a deep neural network with x-vector embeddings. The thesis proposes and experimentally assesses several techniques for evaluating the SID system on unlabelled datasets. For this purpose, a discriminative embedding is extracted for every recording in the dataset. These embeddings are used to cluster the recordings, and each resulting cluster defines a pseudo-label. The SID system is then evaluated by its equal error rate (EER), computed with these pseudo-labels in place of the true speaker labels. To obtain the clusters, we proposed using several unsupervised learning algorithms: K-means, Gaussian mixture models (GMM), and agglomerative hierarchical clustering (AHC). After thorough testing, K-means combined with the silhouette value gave the best results. This method estimated the EER at 5.72 % against a reference EER of 5.15 % on SITW dev-core-core. Similar results were observed on SITW eval-core-core, where the estimated EER is 5.86 % and the reference EER is 5.08 %. The difference between the estimated and reference EER is therefore 0.57 % on dev-core-core and 0.78 % on eval-core-core. A further series of experiments was conducted on NIST SRE16 and VoxCeleb1 to verify the robustness of the proposed method. Overall, the proposed evaluation process estimated the EER with an error of around 1 % on all test databases, an excellent result for an unsupervised learning technique.
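The evaluation idea in the abstract (cluster the embeddings, treat clusters as pseudo-labels, compute the EER with them, and compare it with the EER from true labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the embeddings are synthetic Gaussian vectors standing in for x-vectors, every pair of recordings is treated as a trial and scored by cosine similarity (in place of a real trial list and SID back-end), and K-means uses a simple deterministic farthest-first seeding. All helper names (`kmeans`, `silhouette`, `pseudo_labels`, `all_pair_trials`, `eer`) are hypothetical and only NumPy is assumed.

```python
import numpy as np


def kmeans(x, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding (a simplification)."""
    # Seeding: start from the first embedding, then repeatedly add the point
    # farthest from all centers chosen so far.
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every embedding to its nearest center (squared Euclidean distance).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def silhouette(x, labels):
    """Mean silhouette value; -1.0 when fewer than two clusters are non-empty."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    values = []
    for i in range(len(x)):
        own = labels == labels[i]
        if own.sum() == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        a = dist[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster (self excluded)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        values.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(values))


def pseudo_labels(x, k_range):
    """Run K-means for each candidate k and keep the labelling with the best silhouette."""
    candidates = [kmeans(x, k) for k in k_range]
    return max(candidates, key=lambda lab: silhouette(x, lab))


def all_pair_trials(x):
    """Hypothetical trial list: every pair of recordings, scored by cosine similarity."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    i, j = np.triu_indices(len(x), k=1)
    return i, j, (xn[i] * xn[j]).sum(axis=1)


def eer(scores, is_target):
    """Equal error rate: the operating point where miss rate equals false-alarm rate."""
    order = np.argsort(scores)
    tar = is_target[order].astype(float)
    # Index t means the threshold rejects the t lowest-scoring trials.
    miss = np.concatenate([[0.0], np.cumsum(tar)]) / tar.sum()
    fa = 1.0 - np.concatenate([[0.0], np.cumsum(1.0 - tar)]) / (1.0 - tar).sum()
    t = np.argmin(np.abs(miss - fa))
    return 0.5 * (miss[t] + fa[t])


# Synthetic stand-in for x-vectors: 5 "speakers", 20 recordings each, 16 dimensions.
rng = np.random.default_rng(1)
true = np.repeat(np.arange(5), 20)
emb = 3.0 * rng.normal(size=(5, 16))[true] + rng.normal(size=(100, 16))

i, j, scores = all_pair_trials(emb)
ref = eer(scores, true[i] == true[j])   # reference EER (true labels)
pl = pseudo_labels(emb, range(2, 9))
est = eer(scores, pl[i] == pl[j])       # estimated EER (pseudo-labels)
print(f"reference EER {ref:.2%}, estimated EER {est:.2%}")
```

Here the number of clusters is picked by maximising the mean silhouette value over a range of candidate k, and the gap between `est` and `ref` plays the role of the estimated-versus-reference difference reported in the abstract. On real data, the scores would come from the SID system on an actual trial list, and the clustering would run on embeddings from a separate discriminative extractor.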

Keywords

speaker recognition, speaker verification, unsupervised learning, clustering, evaluation, GMM, AHC, EER, elbow method, K-means

Status
defended, grade A
Date
15 June 2022
Committee
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Citation
ODEHNAL, Ondřej. Unsupervised Evaluation of Speaker Recognition System. Brno, 2022. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2022-06-15. Supervised by Matějka Pavel. Available from: https://www.fit.vut.cz/study/thesis/24991/
BibTeX
@bachelorsthesis{FITBT24991,
    author = "Ond\v{r}ej Odehnal",
    type = "Bachelor's thesis",
    title = "Unsupervised Evaluation of Speaker Recognition System",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2022,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/24991/"
}