Publication Details

Hybrid Word-Subword Speech Recognition - a Powerful Tool to Search in Speech

ČERNOCKÝ, J.; SZŐKE, I.; HANNEMANN, M.; KOMBRINK, S.; FAPŠO, M. Hybrid Word-Subword Speech Recognition - a Powerful Tool to Search in Speech. Proceedings of 21st International Conference Radioelektronika 2011. Brno: Department of Radioelectronics FEEC BUT, 2011. p. 25-25. ISBN: 978-1-61284-322-3.

Czech title

Hybridní slovní a podslovní rozpoznávání řeči - výkonný nástroj pro vyhledávání v řeči

Type

abstract

Language

English

Authors

Černocký Jan, prof. Dr. Ing. (DCGM)
Szőke Igor, Ing., Ph.D. (DCGM)
Hannemann Mirko, Ph.D.
Kombrink Stefan, Dipl.-Linguist.
Fapšo Michal, Ing., Ph.D.

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2011/cernocky_radioelektronika2011_150_invited.pdf

Annotation

Mainstream systems for searching information in speech are based on a Large Vocabulary Continuous Speech Recognizer (LVCSR) with a fixed vocabulary. Keywords or key phrases are subsequently searched for in its output. These systems have severe problems with out-of-vocabulary (OOV) words, which are common when one changes the domain (for example, from standard to medical), the speaker (normal versus highly educated), or even the date (new words appearing in TV news). This talk will present our work on designing hybrid word-subword recognition systems that have a combined recognition network. Under normal circumstances, they output standard word strings, but they are allowed to switch to a subword description for unknown inputs. Such systems are useful not only for detecting OOVs, but also for the subsequent steps leading to their exploitation. Under the EC-sponsored DIRAC project, we have investigated the analysis of detected OOVs, their conversion to standard word forms, and the finding of links to in-vocabulary words and other OOVs. The results will be demonstrated on real speech data from popular TED lectures.

Published

2011

Pages

25–25

Book

Proceedings of 21st International Conference Radioelektronika 2011

Conference

Radioelektronika 2011, 21st International Conference, Brno, CZ

ISBN

978-1-61284-322-3

Publisher

Department of Radioelectronics FEEC BUT

Place

Brno

BibTeX

@misc{BUT192773,
  author="Jan {Černocký} and Igor {Szőke} and Mirko {Hannemann} and Stefan {Kombrink} and Michal {Fapšo}",
  title="Hybrid Word-Subword Speech Recognition - a Powerful Tool to Search in Speech",
  booktitle="Proceedings of 21st International Conference Radioelektronika 2011",
  year="2011",
  pages="25--25",
  publisher="Department of Radioelectronics FEEC BUT",
  address="Brno",
  isbn="978-1-61284-322-3",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/cernocky_radioelektronika2011_150_invited.pdf",
  note="abstract"
}