Result Details

Neural network based language models for highly inflective languages

MIKOLOV, T.; KOPECKÝ, J.; BURGET, L.; GLEMBEK, O.; ČERNOCKÝ, J. Neural network based language models for highly inflective languages. Proc. ICASSP 2009. Taipei: IEEE Signal Processing Society, 2009. p. 1-4. ISBN: 978-1-4244-2354-5.
Type
conference paper
Language
English
Authors
Mikolov Tomáš, Ing., Ph.D., DCGM (FIT)
Kopecký Jiří, Ing.
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Glembek Ondřej, Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Abstract

The paper describes neural network based language models for highly inflective languages.

Keywords

language modeling, neural networks, inflective languages

Annotation

Speech recognition of inflectional and morphologically rich languages such as Czech remains a challenging task, because simple n-gram techniques are unable to capture important regularities in the data. Several solutions have been proposed, namely class-based models, factored models, decision trees and neural networks. This paper describes improvements obtained in the recognition of spoken Czech lectures using language models based on neural networks. Relative reductions in word error rate are more than 15% over the baseline obtained with an adapted 4-gram backoff language model using modified Kneser-Ney smoothing.

Published
2009
Pages
1–4
Proceedings
Proc. ICASSP 2009
Conference
International Conference on Acoustics, Speech, and Signal Processing
ISBN
978-1-4244-2354-5
Publisher
IEEE Signal Processing Society
Place
Taipei
BibTeX
@inproceedings{BUT33797,
  author="Tomáš {Mikolov} and Jiří {Kopecký} and Lukáš {Burget} and Ondřej {Glembek} and Jan {Černocký}",
  title="Neural network based language models for highly inflective languages",
  booktitle="Proc. ICASSP 2009",
  year="2009",
  pages="1--4",
  publisher="IEEE Signal Processing Society",
  address="Taipei",
  isbn="978-1-4244-2354-5",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2009/mikolov_ic2009_nnlm_4.pdf"
}
Projects
Overcoming the language barrier complicating investigation into financing terrorism and serious financial crimes, MV, Program bezpečnostního výzkumu, VD20072010B16, start: 2007-08-01, end: 2010-12-31, completed
Research and development of corpus and speech technologies in new generation of electronic dictionaries, MPO, TANDEM, FT-TA3/006, start: 2006-06-01, end: 2009-12-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funds from the state budget of the Czech Republic (e.g. VZ, VC), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running