Language models for automatic speech recognition of Czech lectures

MIKOLOV, T. Language models for automatic speech recognition of Czech lectures. Proc. STUDENT EEICT 2008. Brno: Faculty of Electrical Engineering and Communication BUT, 2008. p. 1-5. ISBN: 978-80-214-3617-6.

Type

conference paper

Language

English

Authors

Mikolov Tomáš, Ing., Ph.D., DCGM (FIT)

Abstract

The paper deals with language models for automatic speech recognition of Czech lectures.

Keywords

language modeling

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2008/mikolov…

Annotation

This paper describes improvements in Automatic Speech Recognition (ASR) of Czech lectures obtained by enhancing the language models. Our baseline is a statistical trigram language model with Good-Turing smoothing, trained on half a billion words from newspapers, books, and similar sources. The overall improvement from adding more training data is over 10% absolute in accuracy, and advanced language modeling techniques (mainly neural networks) yield a further 3%. Perplexity improvements and the reduction of the out-of-vocabulary (OOV) rate are also discussed.
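The annotation mentions Good-Turing smoothing, perplexity, and the OOV rate. As a rough illustration only, the Python sketch below computes the textbook versions of these quantities. It is not the paper's implementation: the function names are hypothetical, the Good-Turing estimate is the simple unsmoothed form r* = (r + 1) N_{r+1} / N_r, and a practical trigram model would additionally smooth the count-of-counts table and combine the discounted counts with a back-off scheme.

```python
import math
from collections import Counter


def good_turing_adjusted_counts(ngram_counts):
    """Simple Good-Turing adjusted counts: r* = (r + 1) * N_{r+1} / N_r.

    ngram_counts maps each observed n-gram (e.g. a trigram tuple) to its raw
    count r; N_r is the number of distinct n-grams seen exactly r times.
    Where N_{r+1} is zero (common for large r), this sketch keeps the raw count.
    """
    count_of_counts = Counter(ngram_counts.values())
    adjusted = {}
    for ngram, r in ngram_counts.items():
        n_r = count_of_counts[r]            # always > 0 because r was observed
        n_r1 = count_of_counts.get(r + 1, 0)
        adjusted[ngram] = (r + 1) * n_r1 / n_r if n_r1 > 0 else float(r)
    return adjusted


def unseen_mass(ngram_counts):
    """Good-Turing estimate of the total probability of unseen n-grams: N_1 / N."""
    total = sum(ngram_counts.values())
    singletons = sum(1 for r in ngram_counts.values() if r == 1)
    return singletons / total


def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are not in the recognizer vocabulary."""
    misses = sum(1 for word in test_tokens if word not in vocabulary)
    return misses / len(test_tokens)


def perplexity(word_probs):
    """Perplexity = exp(-(1/N) * sum_i log p(w_i | history_i)).

    word_probs holds the model's probability for each test word given its
    history (for a trigram model, the two preceding words).
    """
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))
```

For example, with trigram counts {("a", "b", "c"): 1, ("b", "c", "d"): 1, ("c", "d", "e"): 2}, there are two singletons out of four tokens, so unseen_mass returns 0.5, and each singleton's adjusted count is 2 * N_2 / N_1 = 1.0. Lower perplexity and a lower OOV rate on held-out lecture text are the kinds of improvements the annotation refers to.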

Published

2008

Pages

1–5

Proceedings

Proc. STUDENT EEICT 2008

Conference

Student EEICT 2008

ISBN

978-80-214-3617-6

Publisher

Faculty of Electrical Engineering and Communication BUT

Place

Brno

BibTeX

@inproceedings{BUT32393,
  author="Tomáš {Mikolov}",
  title="Language models for automatic speech recognition of Czech lectures",
  booktitle="Proc. STUDENT EEICT 2008",
  year="2008",
  pages="1--5",
  publisher="Faculty of Electrical Engineering and Communication BUT",
  address="Brno",
  isbn="978-80-214-3617-6",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2008/mikolov_eeict2008.pdf"
}

Projects

Speech Recognition under Real-World Conditions, GACR, Standard Projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)