Result Details

Language models for automatic speech recognition of Czech lectures

MIKOLOV, T. Language models for automatic speech recognition of Czech lectures. Proc. STUDENT EEICT 2008. Brno: Faculty of Electrical Engineering and Communication BUT, 2008. p. 1-5. ISBN: 978-80-214-3617-6.
Type
conference paper
Language
English
Authors
Mikolov Tomáš, Ing., Ph.D., DCGM (FIT)
Abstract

The paper is on language models for automatic speech recognition of Czech lectures.

Keywords

language modeling

Annotation

This paper describes improvements in automatic speech recognition (ASR) of Czech lectures obtained by enhancing the language models. Our baseline is a statistical trigram language model with Good-Turing smoothing, trained on half a billion words from newspapers, books, and similar sources. Adding more training data improves accuracy by over 10% absolute, and advanced language modeling techniques, mainly neural networks, yield another 3%. Improvements in perplexity and the reduction of the out-of-vocabulary (OOV) rate are also discussed.
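The annotation relies on three standard notions: Good-Turing smoothing, perplexity, and the OOV rate. The following minimal Python sketch illustrates them; it is not code from the paper. Assumptions: it uses a unigram model instead of the paper's trigram for brevity, the reliability threshold `max_r=5` and the single shared `<unk>` probability for unseen words are illustrative choices, and the names `good_turing_adjust`, `GoodTuringUnigram` and `evaluate` are hypothetical.

```python
import math
from collections import Counter


def good_turing_adjust(counts, max_r=5):
    """Good-Turing adjusted counts r* = (r + 1) * N_{r+1} / N_r.

    N_r is the number of distinct items seen exactly r times. Counts of
    max_r or more are treated as reliable and left unchanged (as in Katz
    back-off); a count is also left unchanged when N_{r+1} is zero.
    """
    n_r = Counter(counts.values())
    adjusted = {}
    for item, r in counts.items():
        if r < max_r and n_r[r + 1] > 0:
            adjusted[item] = (r + 1) * n_r[r + 1] / n_r[r]
        else:
            adjusted[item] = float(r)
    return adjusted, n_r


class GoodTuringUnigram:
    """Toy unigram model; all unseen words share one <unk> probability.

    Assumes the training data contains at least one singleton (N_1 > 0),
    otherwise unseen words would get zero probability.
    """

    def __init__(self, tokens, max_r=5):
        counts = Counter(tokens)
        total = sum(counts.values())
        adjusted, n_r = good_turing_adjust(counts, max_r)
        # Good-Turing reserves N_1 / N of the probability mass for unseen events.
        self.p_unk = n_r[1] / total
        seen_mass = 1.0 - self.p_unk
        norm = sum(adjusted.values())
        # Rescale adjusted counts so that seen words share exactly 1 - N_1/N.
        self.probs = {w: seen_mass * c / norm for w, c in adjusted.items()}

    def prob(self, word):
        return self.probs.get(word, self.p_unk)


def evaluate(model, test_tokens):
    """Return (perplexity, OOV rate) of the model on test_tokens."""
    log_sum = sum(math.log(model.prob(w)) for w in test_tokens)
    perplexity = math.exp(-log_sum / len(test_tokens))
    oov = sum(1 for w in test_tokens if w not in model.probs)
    return perplexity, oov / len(test_tokens)


if __name__ == "__main__":
    model = GoodTuringUnigram("a a a b b c d".split())
    ppl, oov_rate = evaluate(model, "a b e".split())
    print(f"perplexity={ppl:.3f} oov_rate={oov_rate:.3f}")
```

In the toy run, the test word "e" never occurs in training, so the OOV rate is 1/3 and "e" is scored with the reserved unseen mass N_1/N. A larger training corpus lowers both the OOV rate and the perplexity, which is the effect the annotation attributes to adding more training data.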

Published
2008
Pages
1–5
Proceedings
Proc. STUDENT EEICT 2008
Conference
Student EEICT 2008
ISBN
978-80-214-3617-6
Publisher
Faculty of Electrical Engineering and Communication BUT
Place
Brno
BibTeX
@inproceedings{BUT32393,
  author="Tomáš {Mikolov}",
  title="Language models for automatic speech recognition of Czech lectures",
  booktitle="Proc. STUDENT EEICT 2008",
  year="2008",
  pages="1--5",
  publisher="Faculty of Electrical Engineering and Communication BUT",
  address="Brno",
  isbn="978-80-214-3617-6",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2008/mikolov_eeict2008.pdf"
}
Projects
Speech Recognition under Real-World Conditions, GACR, Standard Projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed