Result Details

Sub-word modeling of out of vocabulary words in spoken term detection

SZŐKE, I.; BURGET, L.; ČERNOCKÝ, J.; FAPŠO, M. Sub-word modeling of out of vocabulary words in spoken term detection. Proc. 2008 IEEE Workshop on Spoken Language Technology. Goa: IEEE Signal Processing Society, 2008. p. 1-4. ISBN: 978-1-4244-3472-5.
Type
conference paper
Language
English
Authors
Szőke Igor, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Fapšo Michal, Ing., Ph.D., DCGM (FIT)
Abstract

This work addresses sub-word modeling of out-of-vocabulary words in spoken term detection.

Keywords

phone, multigram, spoken term detection, subword, keyword spotting, syllable, lattice

URL
Annotation

This paper compares sub-word-based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones, and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.

Published
2008
Pages
1–4
Proceedings
Proc. 2008 IEEE Workshop on Spoken Language Technology
Conference
2008 IEEE Workshop on Spoken Language Technology
ISBN
978-1-4244-3472-5
Publisher
IEEE Signal Processing Society
Place
Goa
BibTeX
@inproceedings{BUT33448,
  author="Igor {Szőke} and Lukáš {Burget} and Jan {Černocký} and Michal {Fapšo}",
  title="Sub-word modeling of out of vocabulary words in spoken term detection",
  booktitle="Proc. 2008 IEEE Workshop on Spoken Language Technology",
  year="2008",
  pages="1--4",
  publisher="IEEE Signal Processing Society",
  address="Goa",
  isbn="978-1-4244-3472-5",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2008/szoke_slt2008.pdf"
}
Projects
Security-Oriented Research in Information Technology, MŠMT, Institutional funds from the Czech state budget (e.g. VZ, VC), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed