Result Details

Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words

HANNEMANN, M.; KOMBRINK, S.; KARAFIÁT, M.; BURGET, L. Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words. Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010). Proceedings of Interspeech. Makuhari, Chiba: International Speech Communication Association, 2010. no. 9, p. 897-900. ISBN: 978-1-61782-123-3. ISSN: 1990-9772.
Type
conference paper
Language
English
Authors
Hannemann Mirko, Ph.D., DCGM (FIT)
Kombrink Stefan, Dipl.-Linguist., DCGM (FIT)
Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Abstract

This paper develops a similarity measure to detect repeatedly occurring out-of-vocabulary (OOV) words, because they carry important information.

Keywords

out-of-vocabulary, OOV, hybrid word/sub-word recognizer, similarity measure, alignment error model

URL
Annotation

We develop a similarity measure to detect repeatedly occurring Out-of-Vocabulary words (OOV), since these carry important information. Sub-word sequences in the recognition output from a hybrid word/sub-word recognizer are taken as detected OOVs and are aligned to each other with the help of an alignment error model. This model is able to deal with partial OOV detections and tries to reveal more complex word relations such as compound words. We apply the model to a selection of conversational phone calls to retrieve other examples of the same OOV, and to obtain a higher-level description of it such as being a derivation of a known word.
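
The idea of aligning two detected sub-word sequences and scoring their similarity can be illustrated with a minimal sketch. It uses a generic weighted edit distance, which is a stand-in and not the alignment error model of the paper; the cost values, the `CONFUSABLE` table, and the function names are hypothetical and chosen only for illustration. The sketch also does not handle partial OOV detections or compound words.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical costs: illustrative values, not taken from the paper.
INS_COST = 1.0
DEL_COST = 1.0
SUB_COST = 1.0

# Assumed table of phone pairs that are cheap to confuse (illustrative only).
CONFUSABLE: Dict[Tuple[str, str], float] = {
    ("s", "z"): 0.5,
    ("t", "d"): 0.5,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of aligning phone a with phone b (symmetric lookup)."""
    if a == b:
        return 0.0
    return CONFUSABLE.get((a, b), CONFUSABLE.get((b, a), SUB_COST))


def alignment_cost(x: Sequence[str], y: Sequence[str]) -> float:
    """Minimum total cost of aligning x to y via dynamic programming."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + DEL_COST
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + INS_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + DEL_COST,                          # delete x[i-1]
                d[i][j - 1] + INS_COST,                          # insert y[j-1]
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # match/substitute
            )
    return d[n][m]


def similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Map the alignment cost to a score in [0, 1]; 1 means identical."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0
    return 1.0 - alignment_cost(x, y) / longest
```

Under these assumptions, two detections of the same word that differ only in confusable phones score higher than unrelated sequences, so ranking candidate pairs by `similarity` gives a rough way to retrieve repeated occurrences.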

Published
2010
Pages
897–900
Journal
Proceedings of Interspeech, vol. 2010, no. 9, ISSN 1990-9772
Proceedings
Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)
Conference
Interspeech Conference
ISBN
978-1-61782-123-3
Publisher
International Speech Communication Association
Place
Makuhari, Chiba
BibTeX
@inproceedings{BUT34859,
  author="Mirko {Hannemann} and Stefan {Kombrink} and Martin {Karafiát} and Lukáš {Burget}",
  title="Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words",
  booktitle="Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)",
  year="2010",
  journal="Proceedings of Interspeech",
  volume="2010",
  number="9",
  pages="897--900",
  publisher="International Speech Communication Association",
  address="Makuhari, Chiba",
  isbn="978-1-61782-123-3",
  issn="1990-9772",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/hanneman_interspeech2010_IS100358.pdf"
}
Projects
DIRAC - Detection and Identification of Rare Audio-visual Cues, MŠMT, Sixth Framework Programme of the European Community for research, technological development and demonstration activities, 027787, start: 2006-01-01, end: 2010-12-31, completed
Recognition and presentation of multimedia data, BUT, BUT internal projects, FIT-S-10-2, 2010, start: 2010-04-01, end: 2010-12-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funding from the Czech state budget (e.g. research plans, research centres), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed