
Morphological random forests for language modeling of inflectional languages

OPARIN, I.; GLEMBEK, O.; BURGET, L.; ČERNOCKÝ, J. Morphological random forests for language modeling of inflectional languages. Proc. 2008 IEEE Workshop on Spoken Language Technology. Goa: IEEE Signal Processing Society, 2008. p. 1-4. ISBN: 978-1-4244-3472-5.
Type
conference paper
Language
English
Authors
Oparin Ilya
Glembek Ondřej, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Abstract

The paper deals with morphological random forests for language modeling of inflectional languages.

Keywords

speech recognition, language modeling

URL
http://www.fit.vutbr.cz/research/groups/speech/publi/2008/Oparin_SLT2008.pdf
Annotation

In this paper, we are concerned with the use of decision trees (DTs) and random forests (RFs) for language modeling in Czech LVCSR (large-vocabulary continuous speech recognition). We show that the RF approach can be successfully applied to language modeling of an inflectional language. The performance of word-based and morphological DTs and RFs was evaluated on a lecture recognition task. We show that while DTs perform worse than conventional trigram language models (LMs), RFs of both kinds outperform the latter. With morphological RFs, reductions in WER (up to 3.4% relative) and perplexity (10%) over the trigram model can be achieved. Further improvement is obtained by interpolating the DT and RF LMs with the trigram LM (up to 15.6% relative reduction in perplexity and 4.8% relative reduction in WER). We also investigate the distribution of the morphological feature types chosen for splitting the data at different levels of the DTs.
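The annotation combines three ideas: decision-tree LMs that split the history with (morphological) questions, random forests built from such trees, and linear interpolation with a trigram LM, judged by perplexity and relative reduction. The following is a minimal sketch of how these pieces could fit together, assuming the common random-forest LM formulation in which the forest probability is the uniform average of the individual trees' probabilities. The `Node` structure, the `(word, tag)` history encoding, the questions `prev_is_noun` and `prev_is_plural`, the toy leaf distributions and the weight `lam` are all hypothetical and not taken from the paper; tree growing and leaf smoothing are omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical history encoding: preceding (word, morphological tag) pairs.
History = Tuple[Tuple[str, str], ...]


@dataclass
class Node:
    """Binary decision-tree node: internal nodes ask a yes/no question about
    the history; leaves hold a next-word distribution (assumed pre-smoothed)."""
    question: Optional[Callable[[History], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    dist: Optional[Dict[str, float]] = None

    def prob(self, history: History, word: str) -> float:
        node = self
        while node.dist is None:  # descend until a leaf is reached
            node = node.yes if node.question(history) else node.no
        return node.dist.get(word, 0.0)


def forest_prob(trees: Sequence[Node], history: History, word: str) -> float:
    """Random-forest LM probability: uniform average over the trees."""
    return sum(t.prob(history, word) for t in trees) / len(trees)


def interpolate(p_forest: float, p_trigram: float, lam: float) -> float:
    """Linear interpolation with a trigram LM; lam would be tuned on held-out data."""
    return lam * p_forest + (1.0 - lam) * p_trigram


def perplexity(probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum(log p_i)) over N predicted words."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_reduction(baseline: float, improved: float) -> float:
    """E.g. a perplexity drop from 100 to 90 is a 0.1 (10%) relative reduction."""
    return (baseline - improved) / baseline


# Hypothetical morphological questions about the previous word's tag.
def prev_is_noun(h: History) -> bool:
    return h[-1][1].startswith("N")


def prev_is_plural(h: History) -> bool:
    return h[-1][1].endswith("pl")


# Toy forest of two single-question trees predicting "je" (is) / "jsou" (are).
tree1 = Node(question=prev_is_noun,
             yes=Node(dist={"je": 0.5, "jsou": 0.5}),
             no=Node(dist={"je": 0.25, "jsou": 0.75}))
tree2 = Node(question=prev_is_plural,
             yes=Node(dist={"je": 0.125, "jsou": 0.875}),
             no=Node(dist={"je": 0.875, "jsou": 0.125}))

if __name__ == "__main__":
    h = (("studenti", "N-pl"),)  # "studenti" (students), a plural noun
    p_rf = forest_prob([tree1, tree2], h, "jsou")
    print(p_rf, interpolate(p_rf, 0.5, lam=0.5))  # prints: 0.6875 0.59375
```

Averaging over trees that were grown with different random choices is what lets a forest generalize better than any single tree, which is consistent with the annotation's finding that single DTs underperform the trigram while RFs outperform it.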

Published
2008
Pages
1–4
Proceedings
Proc. 2008 IEEE Workshop on Spoken Language Technology
Conference
2008 IEEE Workshop on Spoken Language Technology
ISBN
978-1-4244-3472-5
Publisher
IEEE Signal Processing Society
Place
Goa
BibTeX
@inproceedings{BUT30729,
  author="Ilya {Oparin} and Ondřej {Glembek} and Lukáš {Burget} and Jan {Černocký}",
  title="Morphological random forests for language modeling of inflectional languages",
  booktitle="Proc. 2008 IEEE Workshop on Spoken Language Technology",
  year="2008",
  pages="1--4",
  publisher="IEEE Signal Processing Society",
  address="Goa",
  isbn="978-1-4244-3472-5",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2008/Oparin_SLT2008.pdf"
}
Projects
Interactive Keyword Detector, GACR, Postdoctoral Grants (Postdoktorandské granty), GP102/06/P383, start: 2006-01-01, end: 2008-12-31, completed
Overcoming the language barrier complicating investigation into financing terrorism and serious financial crimes, MV, Security Research Programme (Program bezpečnostního výzkumu), VD20072010B16, start: 2007-08-01, end: 2010-12-31, completed
Research and development of corpus and speech technologies in new generation of electronic dictionaries, MPO, TANDEM, FT-TA3/006, start: 2006-06-01, end: 2009-12-31, completed