Thesis Details
Active Learning pro zpracování archivních pramenů (Active Learning for Processing Archival Sources)
This thesis describes the creation of a system for uploading and annotating scans of historical documents and for subsequent active learning of optical character recognition (OCR) models on the available annotations (marked text lines and their transcriptions). It describes the character recognition process, classifies its techniques, and presents an existing character recognition system, with emphasis on machine learning methods. It then explains active learning methods and proposes a method for active learning of available OCR models from annotated scans. The rest of the thesis covers the system design, implementation, available datasets, evaluation of a custom-built OCR model, and testing of the entire system.
Machine learning, supervised learning, active learning, OCR, optical character recognition, active learning in handwritten text recognition, annotation of historical document scans.
Bařina David, Ing., Ph.D. (DCGM FIT BUT), member
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT), member
Čadík Martin, doc. Ing., Ph.D. (DCGM FIT BUT), member
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
@mastersthesis{FITMT23784,
  author = "David H\v{r}\'{i}bek",
  type = "Master's thesis",
  title = "Active Learning pro zpracov\'{a}n\'{i} archivn\'{i}ch pramen\r{u}",
  school = "Brno University of Technology, Faculty of Information Technology",
  year = 2021,
  location = "Brno, CZ",
  language = "czech",
  url = "https://www.fit.vut.cz/study/thesis/23784/"
}