Multi-modální přepis textu

English title

Multi-Modal Text Recognition

Language

Czech

Abstract

The aim of this thesis is to describe and create a method for correcting text recognizer outputs using speech recognition. The thesis presents an overview of current neural-network methods for text and speech recognition, along with a few existing methods for combining the outputs of two modalities. Several approaches to correcting recognizer outputs, based either on algorithms or on neural networks, are designed and implemented. The best-performing approach proved to be an algorithm that searches the recognizer outputs using Levenshtein alignment. It scans the outputs if the uncertainty of a character from the text recognizer is less than a pre-selected limit. As part of the work, an annotation server for text transcripts was created and used to collect recordings for evaluating the experiments.
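
The alignment-based correction mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not spell out the exact rule, so the sketch assumes that the text recognizer supplies one confidence score per character and that a character is replaced by its aligned counterpart from the speech transcript only when that confidence falls below a threshold. The names `levenshtein_alignment`, `correct_text`, `ocr_conf`, and `threshold` are hypothetical.

```python
def levenshtein_alignment(a, b):
    """Align strings a and b; return (i, j) index pairs, None marking a gap."""
    n, m = len(a), len(b)
    # dist[i][j] is the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # character only in a
                dist[i][j - 1] + 1,         # character only in b
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def correct_text(ocr_text, ocr_conf, asr_text, threshold=0.5):
    """Replace low-confidence OCR characters with aligned ASR characters.

    Assumption: ocr_conf[i] is the recognizer's confidence for ocr_text[i].
    Characters present only in the ASR transcript are ignored in this sketch.
    """
    out = []
    for i, j in levenshtein_alignment(ocr_text, asr_text):
        if i is None:
            continue  # ASR-only character: no OCR position to correct
        if j is not None and ocr_conf[i] < threshold:
            out.append(asr_text[j])
        else:
            out.append(ocr_text[i])
    return "".join(out)


# Example: the third character was misread and has low confidence.
ocr = "he1lo world"
conf = [0.9, 0.9, 0.2] + [0.9] * 8
print(correct_text(ocr, conf, "hello world"))
```

Keeping the text recognizer's output as the backbone and consulting the speech transcript only at low-confidence positions is a design choice of this sketch; a fuller system would also have to handle insertions and deletions between the two transcripts.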

Keywords

automatic speech recognition, automatic text recognition, multimodal transcription, neural network, annotation server, text recognition, connection between handwritten text and speech, correction of recognizer outputs, multimodal system

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology and Artificial Intelligence, Specialization Machine Learning

Status

defended, grade C

Date

17 June 2022

Reviewer

Herout Adam, prof. Ing., Ph.D.

Committee

Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Hradiš Michal, Ing., Ph.D. (DCGM FIT BUT), member
Janoušek Vladimír, doc. Ing., Ph.D. (DITS FIT BUT), member
Kanich Ondřej, Ing., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
Zbořil František, doc. Ing., Ph.D. (DITS FIT BUT), member

Citation

KABÁČ, Michal. Multi-modální přepis textu. Brno, 2022. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2022-06-17. Supervised by Kišš Martin. Available from: https://www.fit.vut.cz/study/thesis/24870/

BibTeX

@mastersthesis{FITMT24870,
    author = "Michal Kab\'{a}\v{c}",
    type = "Master's thesis",
    title = "Multi-mod\'{a}ln\'{i} p\v{r}epis textu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2022,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/24870/"
}