Thesis Details

Analýza rozložení textu v historických dokumentech

Master's Thesis
Student: Palacková Bianca
Academic Year: 2020/2021
Supervisor: Kodym Oldřich, Ing., Ph.D.

English title

Text Layout Analysis in Historical Documents

Language

Czech

Abstract

The goal of this thesis is to design and implement an algorithm for text layout analysis in historical documents. A neural network was used to solve this problem, specifically the Faster-RCNN architecture. A dataset of 6,135 images of historical newspapers was used for training and testing. For the purposes of the thesis, four neural network models were trained: a model for detecting words, a model for detecting headings, a model for detecting text regions, and a model for detecting words based on their position in a line. The outputs of these models were processed to determine the text layout of the input image. A modified F-score metric was used for evaluation. Based on this metric, the algorithm reached an accuracy of almost 80%.

Keywords

document layout analysis, neural networks, Faster-RCNN, Python, image processing

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology and Artificial Intelligence, Specialization Computer Vision

Status

defended, grade B

Date

24 June 2021

Reviewer

Hradiš Michal, Ing., Ph.D.

Committee

Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Bařina David, Ing., Ph.D. (DCGM FIT BUT), member
Beran Vítězslav, doc. Ing., Ph.D. (DCGM FIT BUT), member
Herout Adam, prof. Ing., Ph.D. (DCGM FIT BUT), member
Lengál Ondřej, Ing., Ph.D. (DITS FIT BUT), member
Zemčík Pavel, prof. Dr. Ing. (DCGM FIT BUT), member

Citation

PALACKOVÁ, Bianca. Analýza rozložení textu v historických dokumentech. Brno, 2021. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-24. Supervised by Kodym Oldřich. Available from: https://www.fit.vut.cz/study/thesis/23653/

BibTeX

@mastersthesis{FITMT23653,
    author = "Bianca Palackov\'{a}",
    type = "Master's thesis",
    title = "Anal\'{y}za rozlo\v{z}en\'{i} textu v historick\'{y}ch dokumentech",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/23653/"
}
