Result Details

Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets

KIŠŠ, M.; HRADIŠ, M. Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets. In Document Analysis and Recognition – ICDAR 2025 Workshops. Cham: Springer Nature Switzerland, 2025. p. 53-70. ISBN: 978-3-032-09367-7.
Type
conference paper
Language
English
Authors
Kišš Martin, Ing., DCGM (FIT)
Hradiš Michal, Ing., Ph.D., UAMT (FEEC), DCGM (FIT)
Abstract

Self-supervised learning has emerged as a powerful approach for leveraging large-scale unlabeled data to improve model performance in various domains. In this paper, we explore masked self-supervised pre-training for text recognition transformers. Specifically, we propose two modifications to the pre-training phase: progressively increasing the masking probability, and modifying the loss function to incorporate both masked and non-masked patches. We conduct extensive experiments using a dataset of 50M unlabeled text lines for pre-training and four differently sized annotated datasets for fine-tuning. Furthermore, we compare our pre-trained models against those trained with transfer learning, demonstrating the effectiveness of the self-supervised pre-training. In particular, pre-training consistently improves the character error rate of the models, in some cases by up to 30 % relative. It is also on par with transfer learning, but without relying on extra annotated text lines.
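
The two modifications described in the abstract can be illustrated with a minimal, hypothetical sketch. The function names, the linear masking schedule, the reconstruction objective, and the weighting of the non-masked term below are assumptions for illustration, not the paper's exact formulation.

# Hypothetical sketch of (1) a progressive masking schedule and (2) a loss
# over both masked and non-masked patches; details are assumed, not taken
# from the paper.
import torch
import torch.nn.functional as F

def masking_probability(step: int, total_steps: int,
                        p_start: float = 0.15, p_end: float = 0.75) -> float:
    # Linearly increase the patch-masking probability over pre-training.
    t = min(step / max(total_steps, 1), 1.0)
    return p_start + t * (p_end - p_start)

def masked_pretraining_loss(predictions: torch.Tensor,
                            targets: torch.Tensor,
                            mask: torch.Tensor,
                            unmasked_weight: float = 0.1) -> torch.Tensor:
    # predictions, targets: (batch, patches, dim); mask: (batch, patches) in {0, 1}.
    # Reconstruction loss over masked patches plus a down-weighted term over
    # non-masked patches, so both contribute to the objective.
    per_patch = F.mse_loss(predictions, targets, reduction="none").mean(dim=-1)
    masked_loss = (per_patch * mask).sum() / mask.sum().clamp(min=1)
    unmasked = 1.0 - mask
    unmasked_loss = (per_patch * unmasked).sum() / unmasked.sum().clamp(min=1)
    return masked_loss + unmasked_weight * unmasked_loss

Under such a schedule, early pre-training sees lightly masked inputs and the task gradually hardens as the masking probability grows.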

Keywords

Self-supervised pre-training; Transformers; OCR; HTR

URL
https://link.springer.com/chapter/10.1007/978-3-032-09368-4_4
Published
2025
Pages
53–70
Proceedings
Document Analysis and Recognition – ICDAR 2025 Workshops
Conference
International Conference on Document Analysis and Recognition
ISBN
978-3-032-09367-7
Publisher
Springer Nature Switzerland
Place
Cham
DOI
10.1007/978-3-032-09368-4_4
EID Scopus
BibTeX
@inproceedings{BUT197661,
  author="Martin {Kišš} and Michal {Hradiš} and  {}",
  title="Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets",
  booktitle="Document Analysis and Recognition – ICDAR 2025 Workshops",
  year="2025",
  pages="53--70",
  publisher="Springer Nature Switzerland",
  address="Cham",
  doi="10.1007/978-3-032-09368-4\{_}4",
  isbn="978-3-032-09367-7",
  url="https://link.springer.com/chapter/10.1007/978-3-032-09368-4_4"
}
Projects
semANT - Semantic Document Exploration, MK, NAKI III – programme supporting applied research in the field of national and cultural identity for the years 2023 to 2030, DH23P03OVV060, start: 2023-03-01, end: 2027-12-31, running
Departments