Thesis Details

Rozpoznávání historických textů pomocí hlubokých neuronových sítí

Bachelor's Thesis
Student: Vešelíny Peter
Academic Year: 2018/2019
Supervisor: Kišš Martin, Ing.
English title
Convolutional Networks for Historic Text Recognition
Language
Czech
Abstract

This thesis deals with text line recognition in historical documents. Historical texts dating from the 17th to the 19th century are written in the Fraktur typeface. The character recognition problem is solved using a sequence-to-sequence neural network architecture, which is based on the encoder-decoder model and includes an attention mechanism. As part of the thesis, a dataset was created from texts in the German archive Deutsches Textarchiv. This archive contains 3,897 different German books with available transcripts and corresponding page images. The dataset was used to train and experiment with the proposed neural network. The experiments investigated several convolutional models, hyperparameters, and the effect of positional embedding. The final tool recognizes characters with an accuracy of 99.63%. The contributions of this work are the dataset and the neural network, which can be used to recognize historical documents.

Keywords

text recognition, historical text, neural network, OCR, convolutional neural network, CNN, recurrent neural network, RNN, seq2seq, encoder, decoder, attention

Department
Degree Programme
Information Technology
Status
defended, grade A
Date
13 June 2019
Reviewer
Committee
Herout Adam, prof. Ing., Ph.D. (DCGM FIT BUT), chair
Drábek Vladimír, doc. Ing., CSc. (DCSY FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Španěl Michal, Ing., Ph.D. (DCGM FIT BUT), member
Citation
VEŠELÍNY, Peter. Rozpoznávání historických textů pomocí hlubokých neuronových sítí. Brno, 2019. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-06-13. Supervised by Kišš Martin. Available from: https://www.fit.vut.cz/study/thesis/21411/
BibTeX
@bachelorsthesis{FITBT21411,
    author = "Peter Ve\v{s}el\'{i}ny",
    type = "Bachelor's thesis",
    title = "Rozpozn\'{a}v\'{a}n\'{i} historick\'{y}ch text\r{u} pomoc\'{i} hlubok\'{y}ch neuronov\'{y}ch s\'{i}t\'{i}",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/21411/"
}