Thesis Details

Modelování jazyka v rozpoznávání češtiny

Master's Thesis

Student

Mikolov Tomáš

Academic Year

2006/2007

Supervisor

Smrž Pavel, doc. RNDr., Ph.D.

English title

Language Modeling for Speech Recognition in Czech

Language

Czech

Abstract

This work addresses language modeling in automatic speech recognition. The first part describes widely used advanced statistical language modeling techniques: class-based language models, factored language models, and neural network based language models. The next part describes the implementation of a neural network based language model. Results on the "Pražský mluvený korpus" and "Brněnský mluvený korpus" corpora (1,170,000 words) are reported, with a perplexity reduction of around 20%. Results of rescoring N-best lists for spontaneous speech are also reported, with an absolute accuracy improvement of more than 1%. The conclusion mentions possible uses of the work and possible future extensions. Finally, the main weaknesses of current statistical language modeling techniques are described.

Keywords

language modeling, Czech language, n-gram statistics, neural networks, speech recognition, artificial intelligence

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology, Field of Study Computer Graphics and Multimedia

Files

Thesis text 443 kB

Status

defended, grade A

Date

21 June 2007

Reviewer

Černocký Jan, prof. Dr. Ing.

Committee

Zemčík Pavel, prof. Dr. Ing. (DCGM FIT BUT), chair
Fučík Otto, doc. Dr. Ing. (DCSY FIT BUT), member
Křena Bohuslav, Ing., Ph.D. (DITS FIT BUT), member
Racek Stanislav, doc. Ing., CSc. (WBU in Pilsen), member
Smrž Pavel, doc. RNDr., Ph.D. (DCGM FIT BUT), member
Vojnar Tomáš, prof. Ing., Ph.D. (DITS FIT BUT), member

Citation

MIKOLOV, Tomáš. Modelování jazyka v rozpoznávání češtiny. Brno, 2007. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2007-06-21. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/thesis/3645/

BibTeX

@mastersthesis{FITMT3645,
    author = "Tom\'{a}\v{s} Mikolov",
    type = "Master's thesis",
    title = "Modelov\'{a}n\'{i} jazyka v rozpozn\'{a}v\'{a}n\'{i} \v{c}e\v{s}tiny",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2007,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/3645/"
}
