Thesis Details

Multi-modální přepis textu

Master's Thesis
Student: Kabáč Michal
Academic Year: 2021/2022
Supervisor: Kišš Martin, Ing.
English title
Multi-Modal Text Recognition

The aim of this thesis is to describe and create a method for correcting text recognizer outputs using speech recognition. The thesis presents an overview of current neural-network methods for text and speech recognition, together with several existing methods for combining the outputs of two modalities. Several approaches to correcting recognizer outputs, based either on algorithms or on neural networks, are designed and implemented. The best-performing approach proved to be an algorithm that searches the recognizer outputs using Levenshtein alignment. It scans the outputs when the uncertainty of a character from the text recognizer is less than a pre-selected limit. As part of the work, an annotation server for text transcripts was created and used to collect recordings for evaluating the experiments.
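The alignment-based correction described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes character-level outputs from both recognizers, a per-character confidence score from the text recognizer, and one reading of the threshold condition: a character becomes a correction candidate when the recognizer's confidence in it falls below the limit. The names `align`, `correct`, `ocr_conf`, and `limit` are hypothetical.

```python
def align(a, b):
    """Levenshtein alignment of strings a and b.

    Returns a list of (i, j) index pairs; None marks a gap on that side.
    """
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # character only in a
                          d[i][j - 1] + 1)         # character only in b
    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def correct(ocr_text, ocr_conf, asr_text, limit=0.5):
    """Replace low-confidence OCR characters with the aligned ASR character.

    Assumption: ocr_conf[i] is the text recognizer's confidence in ocr_text[i].
    Characters that exist only in the ASR output are ignored in this sketch.
    """
    out = []
    for i, j in align(ocr_text, asr_text):
        if i is None:
            continue  # ASR-only character, no OCR position to correct
        if j is not None and ocr_conf[i] < limit:
            out.append(asr_text[j])
        else:
            out.append(ocr_text[i])
    return "".join(out)
```

For example, `correct("hcllo", [0.9, 0.2, 0.9, 0.9, 0.9], "hello")` replaces only the low-confidence second character, while high-confidence characters are left untouched even where the two outputs disagree.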


Keywords
automatic speech recognition, automatic text recognition, multimodal transcription, neural network, annotation server, text recognition, connection between handwritten text and speech, correction of recognizer outputs, multimodal system

Degree Programme
defended, grade C
17 June 2022
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Hradiš Michal, Ing., Ph.D. (DCGM FIT BUT), member
Janoušek Vladimír, doc. Ing., Ph.D. (DITS FIT BUT), member
Kanich Ondřej, Ing., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
Zbořil František, doc. Ing., Ph.D. (DITS FIT BUT), member
KABÁČ, Michal. Multi-modální přepis textu. Brno, 2022. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2022-06-17. Supervised by Kišš Martin. Available from:
    author = "Michal Kab\'{a}\v{c}",
    type = "Master's thesis",
    title = "Multi-mod\'{a}ln\'{i} p\v{r}epis textu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2022,
    location = "Brno, CZ",
    language = "czech",
    url = ""