Thesis Details
Multi-modal Text Transcription (Multi-modální přepis textu)
The aim of this thesis is to describe and create a method for correcting text recognizer outputs using speech recognition. The thesis presents an overview of current neural-network methods for text and speech recognition, as well as several existing methods for combining the outputs of two modalities. Several approaches to correcting recognizer outputs, based either on algorithms or on neural networks, are designed and implemented. The best approach proved to be an algorithm that searches the recognizer outputs using Levenshtein alignment. It searches the outputs when the uncertainty of a character from the text recognizer is below a pre-selected limit. As part of the work, an annotation server for text transcripts was created and used to collect recordings for evaluating the experiments.
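The alignment-based correction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `align` and `correct`, the per-character confidence list `ocr_conf`, and the default `threshold` are all hypothetical. The sketch assumes that a character from the text recognizer is replaced by the aligned character from the speech recognizer only when its confidence is below the threshold; the thesis may define the criterion differently.

```python
from typing import List, Tuple


def align(a: str, b: str) -> List[Tuple[str, int, int]]:
    """Levenshtein alignment of a and b.

    Returns a list of (op, i, j) with op in {"match", "sub", "del", "ins"},
    where i indexes a and j indexes b.
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion from a
                dist[i][j - 1] + 1,         # insertion from b
            )

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops: List[Tuple[str, int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append(("match" if cost == 0 else "sub", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, j))
            i -= 1
        else:
            ops.append(("ins", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


def correct(ocr_text: str, ocr_conf: List[float], asr_text: str,
            threshold: float = 0.5) -> str:
    """Replace low-confidence OCR characters with aligned ASR characters.

    Assumption: only substitutions are corrected; characters present in
    just one of the two outputs leave the OCR text unchanged.
    """
    out = []
    for op, i, j in align(ocr_text, asr_text):
        if op == "match":
            out.append(ocr_text[i])
        elif op == "sub":
            # Trust the speech recognizer only where the OCR is unsure.
            out.append(asr_text[j] if ocr_conf[i] < threshold else ocr_text[i])
        elif op == "del":
            out.append(ocr_text[i])  # OCR character with no ASR counterpart
        # "ins": ASR-only character, ignored in this sketch
    return "".join(out)
```

For example, `correct("hallo world", conf, "hello world")` returns the speech recognizer's spelling only if the confidence of the second character in `conf` is below the threshold; otherwise the text recognizer output is kept unchanged.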
automatic speech recognition, automatic text recognition, multimodal transcription, neural network, annotation server, text recognition, connection between handwritten text and speech, correction of recognizer outputs, multimodal system
Hradiš Michal, Ing., Ph.D. (DCGM FIT BUT), member
Janoušek Vladimír, doc. Ing., Ph.D. (DITS FIT BUT), member
Kanich Ondřej, Ing., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
Zbořil František, doc. Ing., Ph.D. (DITS FIT BUT), member
@mastersthesis{FITMT24870, author = "Michal Kab\'{a}\v{c}", type = "Master's thesis", title = "Multi-mod\'{a}ln\'{i} p\v{r}epis textu", school = "Brno University of Technology, Faculty of Information Technology", year = 2022, location = "Brno, CZ", language = "czech", url = "https://www.fit.vut.cz/study/thesis/24870/" }