Thesis Details

Active Learning pro zpracování archivních pramenů

Master's Thesis
Student: Hříbek David
Academic Year: 2020/2021
Supervisor: Rozman Jaroslav, Ing., Ph.D.
English title
Active Learning for Processing of Archive Sources
Language
Czech
Abstract

This thesis describes the creation of a system for uploading and annotating scans of historical documents and for subsequent active learning of optical character recognition (OCR) models on the available annotations (marked lines and their transcripts). It describes the character recognition process, classifies the techniques used, and presents an existing system for character recognition, with emphasis on machine learning methods. It then explains active learning methods and proposes a method for active learning of available OCR models from annotated scans. The remainder covers the system design, implementation, available datasets, evaluation of a self-created OCR model, and testing of the entire system.

Keywords

Machine learning, supervised learning, active learning, OCR, optical character recognition, active learning in handwritten text recognition, annotation of historical document scans.

Department
Degree Programme
Information Technology and Artificial Intelligence, Specialization Machine Learning
Status
defended, grade A
Date
23 June 2021
Reviewer
Committee
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Bařina David, Ing., Ph.D. (DCGM FIT BUT), member
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT), member
Čadík Martin, doc. Ing., Ph.D. (DCGM FIT BUT), member
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
Citation
HŘÍBEK, David. Active Learning pro zpracování archivních pramenů. Brno, 2021. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-23. Supervised by Rozman Jaroslav. Available from: https://www.fit.vut.cz/study/thesis/23784/
BibTeX
@mastersthesis{FITMT23784,
    author = "David H\v{r}\'{i}bek",
    type = "Master's thesis",
    title = "Active Learning pro zpracov\'{a}n\'{i} archivn\'{i}ch pramen\r{u}",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/23784/"
}