Personal Voice Activity Detection

Language

English

Abstract

This work implements, tests, and evaluates a speaker-conditioned Voice Activity Detection (VAD) method called Personal VAD. The method builds on an LSTM-based approach to VAD and aims to reliably detect the speech of a target speaker while retaining the typical characteristics of a VAD system: small model size, low latency, and low computational requirements. The system is trained to distinguish between three classes: non-speech, target speaker speech, and non-target speaker speech. To represent the target speaker, the method uses a speaker embedding as part of the input feature vector. Some of the more heavyweight Personal VAD variants also use speaker verification scores assigned to each frame based on the target embedding, which results in a more robust system. In addition to the scoring method presented in the original article, two other scoring approaches are introduced; both outperform the baseline method and improve performance even in acoustically challenging conditions.
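
The input construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `cosine_scores` and `build_input`, the feature and embedding dimensions, the class ordering, and the use of cosine similarity as the frame-level speaker verification score are all assumptions made for illustration only.

```python
import numpy as np

# Class ordering is an illustrative assumption, not taken from the thesis.
CLASSES = ("non-speech", "target speaker speech", "non-target speaker speech")


def cosine_scores(frame_embeddings, target_embedding):
    """Cosine similarity between each frame-level embedding and the target.

    Assumption: cosine similarity stands in for the speaker verification
    score; the thesis may compute its scores differently.
    """
    frame_embeddings = np.asarray(frame_embeddings, dtype=np.float64)
    target_embedding = np.asarray(target_embedding, dtype=np.float64)
    frame_norms = np.linalg.norm(frame_embeddings, axis=1)
    target_norm = np.linalg.norm(target_embedding)
    # The small epsilon guards against division by zero for all-zero vectors.
    return frame_embeddings @ target_embedding / (frame_norms * target_norm + 1e-12)


def build_input(acoustic_frames, target_embedding, scores=None):
    """Append the target embedding (and optionally a score) to every frame.

    acoustic_frames:  array of shape (num_frames, feature_dim)
    target_embedding: array of shape (embedding_dim,)
    scores:           optional array of shape (num_frames,)
    """
    acoustic_frames = np.asarray(acoustic_frames, dtype=np.float64)
    target_embedding = np.asarray(target_embedding, dtype=np.float64)
    num_frames = acoustic_frames.shape[0]
    # Repeat the same target embedding for every frame of the utterance.
    tiled = np.tile(target_embedding, (num_frames, 1))
    parts = [acoustic_frames, tiled]
    if scores is not None:
        parts.append(np.asarray(scores, dtype=np.float64).reshape(num_frames, 1))
    return np.concatenate(parts, axis=1)
```

Under these assumptions, each row returned by `build_input` is what a frame-level classifier such as an LSTM would consume to predict one of the three classes. Passing `scores` corresponds to the heavier variants that also condition on a per-frame speaker verification score.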

Keywords

voice activity detection, speech detection, recurrent neural networks, long short-term memory, LSTM, speaker recognition, speaker embeddings, d-vector

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology

Status

defended, grade A

Date

16 June 2021

Reviewer

Landini Federico Nicolás

Committee

Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member

Citation

SEDLÁČEK, Šimon. Personal Voice Activity Detection. Brno, 2021. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-16. Supervised by Švec Ján. Available from: https://www.fit.vut.cz/study/thesis/23426/

BibTeX

@bachelorsthesis{FITBT23426,
    author = "\v{S}imon Sedl\'{a}\v{c}ek",
    type = "Bachelor's thesis",
    title = "Personal Voice Activity Detection",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/23426/"
}