Thesis Details

Aktivní učení pro rozpoznávání textu

Master's Thesis
Student: Kohút Jan
Academic Year: 2018/2019
Supervisor: Hradiš Michal, Ing., Ph.D.
English title
Active Learning for OCR
Language
Czech
Abstract

This Master's thesis designs active learning methods and evaluates them on datasets of historical documents. The experiments use IMPACT, a large and diverse dataset of more than one million lines. Neural networks are used to check the readability of lines and the correctness of their annotations. First, I compare convolutional architectures with recurrent architectures that use a bidirectional LSTM layer. Next, I study different ways of training neural networks with active learning methods. The main use of active learning is to adapt the networks to documents that are not represented in the original training dataset, so active learning serves to select suitable adaptation data. Convolutional neural networks achieve 98.6% accuracy and recurrent neural networks achieve 99.5% accuracy. Active learning reduces the error by 26% compared to random selection of adaptation data.

Keywords

Active learning, text recognition, neural networks, convolutional neural networks, recurrent neural networks, dataset IMPACT

Department
Degree Programme
Information Technology, Field of Study Intelligent Systems
Status
defended, grade A
Date
19 June 2019
Reviewer
Committee
Zbořil František V., doc. Ing., CSc. (DITS FIT BUT), chair
Beran Vítězslav, doc. Ing., Ph.D. (DCGM FIT BUT), member
Horák Aleš, doc. RNDr., Ph.D. (FI MUNI), member
Hrubý Martin, Ing., Ph.D. (DITS FIT BUT), member
Janoušek Vladimír, doc. Ing., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
Citation
KOHÚT, Jan. Aktivní učení pro rozpoznávání textu. Brno, 2019. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-06-19. Supervised by Hradiš Michal. Available from: https://www.fit.vut.cz/study/thesis/22021/
BibTeX
@mastersthesis{FITMT22021,
    author = "Jan Koh\'{u}t",
    type = "Master's thesis",
    title = "Aktivn\'{i} u\v{c}en\'{i} pro rozpozn\'{a}v\'{a}n\'{i} textu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/22021/"
}