Thesis Details

Personal Voice Activity Detection

Bachelor's Thesis
Student: Sedláček Šimon
Academic Year: 2020/2021
Supervisor: Švec Ján, Ing.
Czech title
Personal Voice Activity Detection
Language
English
Abstract

This work implements, tests, and evaluates a speaker-conditioned voice activity detection (VAD) method called Personal VAD. The method builds on an LSTM-based approach to VAD. Its purpose is to reliably detect the speech of a target speaker while retaining the typical characteristics of a VAD system, mainly small model size, low latency, and low computational requirements. The system is trained to distinguish between three classes: non-speech, target speaker speech, and non-target speaker speech. To represent the target speaker, the method uses a speaker embedding as part of the input feature vector. Some of the more heavyweight Personal VAD variants also use speaker verification scores assigned to each frame based on the target embedding, which results in a more robust system. In addition to the single scoring method presented in the original article, two other scoring approaches are introduced; both outperform the baseline method and improve performance even in acoustically challenging conditions.
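
The input construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The function names (`cosine_similarity`, `build_inputs`, `decide`), the tiny feature dimensions, and the use of cosine similarity as the per-frame speaker verification score are all assumptions made for illustration; the abstract does not specify the scoring methods. The LSTM classifier itself is omitted, so `decide` only shows how a three-class output would be mapped to a label.

```python
import math

# The three classes named in the abstract.
CLASSES = ("non-speech", "target speaker speech", "non-target speaker speech")


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define the score as 0 for an all-zero vector
    return dot / (norm_a * norm_b)


def build_inputs(frames, target_embedding, frame_embeddings=None):
    """Append the target speaker embedding to every acoustic frame.

    If per-frame speaker embeddings are supplied (hypothetical input), a
    verification score against the target is appended as one extra feature,
    mirroring the "heavyweight" variants mentioned in the abstract.
    """
    inputs = []
    for i, frame in enumerate(frames):
        vec = list(frame) + list(target_embedding)
        if frame_embeddings is not None:
            vec.append(cosine_similarity(frame_embeddings[i], target_embedding))
        inputs.append(vec)
    return inputs


def decide(class_scores):
    """Map one frame's three class scores to the highest-scoring class label."""
    best = max(range(len(CLASSES)), key=lambda k: class_scores[k])
    return CLASSES[best]


# Toy example: two 2-dimensional acoustic frames, a 3-dimensional embedding.
frames = [[0.1, 0.2], [0.3, 0.4]]
target = [1.0, 0.0, 0.0]
frame_embs = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
inputs = build_inputs(frames, target, frame_embs)
```

In this sketch each input vector has length 2 + 3 + 1: the acoustic features, the target embedding, and one score. A real system would feed such vectors to a recurrent network frame by frame and use far larger feature and embedding dimensions.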

Keywords

voice activity detection, speech detection, recurrent neural networks, long short-term memory, LSTM, speaker recognition, speaker embeddings, d-vector

Degree Programme
Information Technology
Status
defended, grade A
Date
16 June 2021
Committee
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Citation
SEDLÁČEK, Šimon. Personal Voice Activity Detection. Brno, 2021. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-16. Supervised by Švec Ján. Available from: https://www.fit.vut.cz/study/thesis/23426/
BibTeX
@bachelorsthesis{FITBT23426,
    author = "\v{S}imon Sedl\'{a}\v{c}ek",
    type = "Bachelor's thesis",
    title = "Personal Voice Activity Detection",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/23426/"
}