Result Details
Language models for automatic speech recognition of Czech lectures
MIKOLOV, T. Language models for automatic speech recognition of Czech lectures. Proc. STUDENT EEICT 2008. Brno: Faculty of Electrical Engineering and Communication BUT, 2008. p. 1-5. ISBN: 978-80-214-3617-6.
Type
conference paper
Language
English
Authors
Mikolov Tomáš, Ing., Ph.D., DCGM (FIT)
Abstract
The paper is on language models for automatic speech recognition of Czech lectures.
Keywords
language modeling
URL
http://www.fit.vutbr.cz/research/groups/speech/publi/2008/mikolov_eeict2008.pdf
Annotation
This paper describes improvements in automatic speech recognition (ASR) of Czech lectures obtained by enhancing the language models. Our baseline is a statistical trigram language model with Good-Turing smoothing, trained on half a billion words from newspapers, books, and other sources. Adding more training data improves accuracy by over 10% absolute, while advanced language modeling techniques, mainly neural networks, yield another 3%. Perplexity improvements and the reduction of the out-of-vocabulary (OOV) rate are also discussed.
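The quantities named in the annotation (Good-Turing smoothing, perplexity, OOV rate) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names are hypothetical, and falling back to the raw count when no n-gram has count c + 1 is a simplifying assumption made here.

```python
import math
from collections import Counter


def trigrams(tokens):
    """Return all consecutive word triples from a token list."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


def good_turing_adjusted_counts(ngram_counts):
    """Map each observed count c to c* = (c + 1) * N_{c+1} / N_c.

    N_c is the number of distinct n-grams seen exactly c times.
    Assumption for this sketch: when N_{c+1} is zero, keep the raw count.
    """
    freq_of_freq = Counter(ngram_counts.values())  # N_c for every observed c
    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        adjusted[c] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted


def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are missing from the vocabulary."""
    misses = sum(1 for token in test_tokens if token not in vocabulary)
    return misses / len(test_tokens)


def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    train = "the lecture is on the lecture is about the lecture hall".split()
    counts = Counter(trigrams(train))
    print(good_turing_adjusted_counts(counts))
    print(oov_rate(set(train), "the seminar is on speech".split()))
    print(perplexity([math.log(0.25)] * 4))
```

In these terms, a lower OOV rate means fewer lecture words that the recognizer cannot output at all, and a lower perplexity means the model assigns higher probability to the test transcripts.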
Published
2008
Pages
1–5
Proceedings
Proc. STUDENT EEICT 2008
Conference
Student EEICT 2008
ISBN
978-80-214-3617-6
Publisher
Faculty of Electrical Engineering and Communication BUT
Place
Brno
BibTeX
@inproceedings{BUT32393,
author="Tomáš {Mikolov}",
title="Language models for automatic speech recognition of Czech lectures",
booktitle="Proc. STUDENT EEICT 2008",
year="2008",
pages="1--5",
publisher="Faculty of Electrical Engineering and Communication BUT",
address="Brno",
isbn="978-80-214-3617-6",
url="http://www.fit.vutbr.cz/research/groups/speech/publi/2008/mikolov_eeict2008.pdf"
}
Projects
Speech Recognition under Real-World Conditions, GACR, Standard Projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed
Research groups
Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)
Departments