Multi-modální přepis textu

Master's Thesis
Student: Kabáč Michal
Academic Year: 2021/2022
Supervisor: Kišš Martin, Ing.
English title
Multi-Modal Text Recognition

This thesis describes and implements a method for correcting the output of a text recognizer using speech recognition. It gives an overview of current neural-network methods for text and speech recognition and presents several existing methods for combining the outputs of two modalities. Several approaches to correcting recognizer output, based either on algorithms or on neural networks, are designed and implemented. The best-performing approach was an algorithm that searches the recognizer outputs using Levenshtein alignment. It searches the outputs only where the text recognizer's confidence in a character is below a preselected threshold. As part of the work, an annotation server for text transcripts was created and used to collect recordings for evaluating the experiments.
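
The idea of correction guided by Levenshtein alignment can be illustrated with a minimal sketch. This is not the implementation from the thesis. It assumes that the text recognizer provides one confidence value per output character and that the speech recognizer provides a plain character string. The function names, the `threshold` parameter, and the rule of taking the aligned speech character are hypothetical choices made only for illustration.

```python
def levenshtein_alignment(a, b):
    """Align strings a and b; return (i, j) index pairs, None marks a gap."""
    n, m = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # character only in a
                           dp[i][j - 1] + 1)         # character only in b
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def correct_text(ocr_text, ocr_conf, asr_text, threshold=0.5):
    """Replace low-confidence OCR characters with their aligned ASR characters.

    Assumption: ocr_conf[i] is the recognizer's confidence for ocr_text[i].
    Characters that appear only in the ASR output are ignored in this sketch.
    """
    out = []
    for i, j in levenshtein_alignment(ocr_text, asr_text):
        if i is None:
            continue  # ASR-only character, not inserted here
        if j is not None and ocr_conf[i] < threshold:
            out.append(asr_text[j])   # uncertain character: trust speech
        else:
            out.append(ocr_text[i])   # confident character: keep as is
    return "".join(out)
```

For example, with the text output `he1lo world`, a low confidence on the third character, and the speech output `hello world`, `correct_text` replaces only the uncertain `1` with `l` and leaves all confident characters untouched.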


Keywords
automatic speech recognition, automatic text recognition, multimodal transcription, neural network, annotation server, text recognition, connection between handwritten text and speech, correction of recognizer outputs, multimodal system

Degree Programme
17 June 2022
