Query-by-Example Spoken Term Detection

Czech title

Vyhledávání výrazů v řeči pomocí mluvených příkladů

Language

English

Abstract

This thesis investigates query-by-example (QbE) spoken term detection (STD). Queries are entered in their spoken form and searched for in a pool of recorded spoken utterances, yielding a list of detections with their scores and timing. We describe, analyze and compare three different approaches to QbE STD in various language-dependent and language-independent setups with diverse audio conditions, using either a single example or five examples per query.

For our experiments we used Czech, Hungarian, English and Levantine data, and for each language we trained a 3-state phone posterior estimator. This gave us 16 possible combinations of evaluation language and posterior-estimator language, of which 4 were language-dependent and 12 were language-independent. All QbE systems were evaluated on the same data and the same features using two metrics: the non-pooled Figure-of-Merit and our proposed utterance-normalized non-pooled Figure-of-Merit. These provided relevant data for comparing the QbE approaches and for gaining better insight into their behavior.
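The count of setups in the paragraph above can be checked with a short enumeration. This is only an illustrative sketch: the variable names are hypothetical, and a setup is assumed to be language-dependent exactly when the evaluation language equals the language of the posterior estimator.

```python
from itertools import product

# The four languages named in the abstract.
languages = ["Czech", "Hungarian", "English", "Levantine"]

# Every (evaluation language, posterior-estimator language) pair.
combos = list(product(languages, repeat=2))

# Assumption: a setup is language-dependent when both languages match.
dependent = [(ev, est) for ev, est in combos if ev == est]
independent = [(ev, est) for ev, est in combos if ev != est]

print(len(combos), len(dependent), len(independent))
```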

The QbE approaches presented in this work are sequential statistical modeling (GMM/HMM), template matching of features (DTW) and matching of phone lattices (WFST). To compare the performance of the QbE approaches with common query-by-text STD systems, for language-dependent setups we also evaluated an acoustic keyword spotting system (AKWS) and a system searching for phone strings in lattices (WFSTlat). The core of this thesis is the development, analysis and improvement of the WFST QbE STD system, which, after the improvements, achieved performance similar to that of the DTW system in language-dependent setups.
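To give a rough idea of what template matching of features with DTW involves, the following is a minimal sketch of textbook dynamic time warping between a query and a candidate segment, each given as a sequence of phone posterior vectors. It is not the system described in the thesis: the function names, the local distance (negative log of the inner product of two posterior vectors), the flooring constant and the length normalization are all assumptions made for illustration.

```python
import math

def frame_distance(p, q):
    """Assumed local distance: negative log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, 1e-10))  # floor avoids log(0)

def dtw_cost(query, segment):
    """Length-normalized cost of the best alignment of query to segment (classic DTW)."""
    n, m = len(query), len(segment)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning the first i query frames
    # with the first j segment frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j],      # query frame advances alone
                                acc[i][j - 1],      # segment frame advances alone
                                acc[i - 1][j - 1])  # both advance
    return acc[n][m] / (n + m)

# Toy posteriors over two phone classes (hypothetical data).
query = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # same content, slower
reversed_seq = [[0.0, 1.0], [1.0, 0.0]]           # different content

print(dtw_cost(query, stretched), dtw_cost(query, reversed_seq))
```

In this toy case the time-stretched segment aligns to the query at zero cost while the reversed one does not, which is the property that lets a spoken example be matched despite differences in speaking rate. A detection system would additionally need to search over all start and end points within an utterance, which this sketch leaves out.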

Keywords

Query-by-Example, Spoken Term Detection, Finite State Transducers, System comparison, Language dependency, Low-resource languages

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology, Field of Study Information Technology

Status

defended

Date

17 December 2014

Citation

FAPŠO, Michal. Query-by-Example Spoken Term Detection. Brno, 2014. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2014-12-17. Supervised by Černocký Jan. Available from: https://www.fit.vut.cz/study/phd-thesis/282/

BibTeX

@phdthesis{FITPT282,
    author = "Michal Fap\v{s}o",
    type = "Ph.D. thesis",
    title = "Query-by-Example Spoken Term Detection",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2014,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/phd-thesis/282/"
}