Empirical Evaluation and Combination of Advanced Language Modeling Techniques
Mikolov Tomáš
Deoras Anoop
Kombrink Stefan, Dipl.-Linguist., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
This paper presents an empirical evaluation and combination of advanced language modeling techniques. Our work is the first attempt to combine this many advanced language modeling techniques.
language modeling, neural networks, model combination, speech recognition
We present results obtained with several advanced language modeling techniques, including class-based models, cache models, maximum entropy models, structured language models, random forest language models, and several types of neural network based language models. We then combine all of these models using linear interpolation. We conclude that on both small and moderately sized tasks, the combination of models yields new state-of-the-art results that are significantly better than those of any individual model. The resulting perplexity reductions are over 50% relative to a Good-Turing smoothed trigram baseline and over 40% relative to a modified Kneser-Ney smoothed 5-gram.
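The combination step named in the abstract (linear interpolation) and the metric behind the reported reductions (perplexity) can be sketched as below. This is a minimal illustration, not the authors' toolkit. The function names (`interpolate`, `perplexity`, `em_weights`, `relative_reduction`) are hypothetical, and the probabilities are toy values, not data from the paper. Estimating the interpolation weights with EM on held-out data is a standard technique assumed here for illustration. It is not necessarily the procedure used in the paper.

```python
import math
from typing import List, Sequence


def interpolate(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Linear interpolation: P(w|h) = sum_m lambda_m * P_m(w|h)."""
    return sum(lam * p for lam, p in zip(weights, probs))


def perplexity(word_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/N) * sum_i ln P(w_i|h_i)); all probabilities must be > 0."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


def em_weights(per_model_probs: List[List[float]], iterations: int = 50) -> List[float]:
    """Estimate interpolation weights with EM (an assumed method, see lead-in).

    per_model_probs[m][i] is the probability model m assigns to held-out word i.
    """
    num_models = len(per_model_probs)
    num_words = len(per_model_probs[0])
    weights = [1.0 / num_models] * num_models  # start from uniform weights
    for _ in range(iterations):
        expected = [0.0] * num_models
        for i in range(num_words):
            mix = sum(weights[m] * per_model_probs[m][i] for m in range(num_models))
            for m in range(num_models):
                # E-step: posterior share of model m in explaining word i
                expected[m] += weights[m] * per_model_probs[m][i] / mix
        # M-step: new weight = average posterior share over held-out words
        weights = [e / num_words for e in expected]
    return weights


def relative_reduction(baseline_ppl: float, new_ppl: float) -> float:
    """Relative perplexity reduction against a baseline, in percent."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Toy per-word probabilities from two hypothetical models on four words
# (illustrative values only, not data from the paper). In practice the weights
# are tuned on held-out data and perplexity is reported on separate test data;
# this toy reuses one list for brevity.
model_a = [0.10, 0.20, 0.05, 0.30]
model_b = [0.30, 0.05, 0.20, 0.10]
lams = em_weights([model_a, model_b])
mixed = [interpolate((a, b), lams) for a, b in zip(model_a, model_b)]
print(lams, perplexity(model_a), perplexity(model_b), perplexity(mixed))
print(relative_reduction(perplexity(model_a), perplexity(mixed)))
```

In this toy case the two models are strong on different words, so the interpolated model has lower perplexity than either one alone. More generally, because the logarithm is concave, the interpolated log-likelihood is never below the weighted average of the individual models' log-likelihoods. This is one reason why mixing complementary models tends to reduce perplexity.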
@inproceedings{BUT76440,
author="Tomáš {Mikolov} and Anoop {Deoras} and Stefan {Kombrink} and Lukáš {Burget} and Jan {Černocký}",
title="Empirical Evaluation and Combination of Advanced Language Modeling Techniques",
booktitle="Proceedings of Interspeech 2011",
year="2011",
journal="Proceedings of Interspeech",
volume="2011",
number="8",
pages="605--608",
publisher="International Speech Communication Association",
address="Florence",
isbn="978-1-61839-270-1",
issn="1990-9772",
url="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/mikolov_interspeech2011_666.pdf"
}
Security-Oriented Research in Information Technology, MŠMT, Institutional funds of the Czech state budget (e.g. research plans, research centres), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed
Technologies of speech processing for efficient human-machine communication, TAČR, ALFA Programme of Applied Research and Experimental Development, TA01011328, start: 2011-01-01, end: 2014-12-31, completed
Theory and applications of phoneme posterior estimation in speech processing, GACR, Doctoral grants, GP102/09/P635, start: 2009-01-01, end: 2011-12-31, completed