Result Details

Comparison of methods for language-dependent and language-independent query-by-example spoken term detection

TEJEDOR, J.; FAPŠO, M.; SZŐKE, I.; ČERNOCKÝ, J.; GRÉZL, F. Comparison of methods for language-dependent and language-independent query-by-example spoken term detection. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2012, vol. 2012, no. 30, p. 1-34. ISSN: 1046-8188.
Type
journal article
Language
English
Authors
Tejedor Javier
Fapšo Michal, Ing., Ph.D., DCGM (FIT)
Szőke Igor, Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Grézl František, Ing., Ph.D., DCGM (FIT)
Abstract

This article investigates query-by-example (QbE) spoken term detection (STD), in which the query is not entered as text, but selected in speech data or spoken. Two feature extractors based on neural networks (NN) are introduced: the first producing phone-state posteriors and the second making use of a compressive NN layer. They are combined with three different QbE detectors: while the Gaussian mixture model/hidden Markov model (GMM/HMM) and dynamic time warping (DTW) both work on continuous feature vectors, the third one, based on weighted finite-state transducers (WFST), processes phone lattices.

Keywords

Experimentation, Query-by-example, DTW-based query-by-example, GMM/HMM-based query-by-example, WFST-based query-by-example, bottleneck features, keyword spotting

Annotation

This article investigates query-by-example (QbE) spoken term detection (STD), in which the query is not entered as text, but selected in speech data or spoken. Two feature extractors based on neural networks (NN) are introduced: the first producing phone-state posteriors and the second making use of a compressive NN layer. They are combined with three different QbE detectors: while the Gaussian mixture model/hidden Markov model (GMM/HMM) and dynamic time warping (DTW) both work on continuous feature vectors, the third one, based on weighted finite-state transducers (WFST), processes phone lattices. QbE STD is compared to two standard STD systems with text queries: acoustic keyword spotting and WFST-based search of phone strings in phone lattices. The results are reported on four languages (Czech, English, Hungarian, and Levantine Arabic) using standard metrics: equal error rate (EER) and two versions of the popular figure-of-merit (FOM). Language-dependent and language-independent cases are investigated, the latter being particularly interesting for scenarios lacking standard resources to train speech recognition systems. While the DTW and GMM/HMM approaches produce the best results for a language-dependent setup, depending on the target language, the GMM/HMM approach performs best in a language-independent setup. As far as WFSTs are concerned, they are promising as they allow for indexing and fast search.

Published
2012
Pages
1–34
Journal
ACM TRANSACTIONS ON INFORMATION SYSTEMS, vol. 2012, no. 30, ISSN 1046-8188
Book
ACM Transactions on Information Systems (TOIS)
Publisher
Association for Computing Machinery
Place
New York
DOI
10.1145/2328967.2328971
BibTeX
@article{BUT97057,
  author="Javier {Tejedor} and Michal {Fapšo} and Igor {Szőke} and Jan {Černocký} and František {Grézl}",
  title="Comparison of methods for language-dependent and language-independent query-by-example spoken term detection",
  journal="ACM TRANSACTIONS ON INFORMATION SYSTEMS",
  year="2012",
  volume="2012",
  number="30",
  pages="1--34",
  doi="10.1145/2328967.2328971",
  issn="1046-8188",
  url="http://dl.acm.org/citation.cfm?id=2328971&CFID=187707319&CFTOKEN=67886685"
}
Projects
IT4Innovations Centre of Excellence, MŠMT, Operational Programme Research and Development for Innovations, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed
Language-independent spoken term detection, GACR, Postdoctoral grants, GPP202/12/P567, start: 2012-01-01, end: 2014-12-31, completed
Multilingual recognition and search in speech for electronic dictionaries, MPO, TIP, FR-TI1/034, start: 2009-09-01, end: 2013-08-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funds from the Czech state budget (e.g. VZ, VC), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running