Result details

Brno Mobile OCR Dataset

KIŠŠ, M.; HRADIŠ, M.; KODYM, O. Brno Mobile OCR Dataset. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. Sydney: Institute of Electrical and Electronics Engineers, 2020. p. 1352-1357. ISBN: 978-1-7281-3015-6.

Type

paper in conference proceedings

Language

English

Authors

Kišš Martin, Ing., UPGM (FIT)
Hradiš Michal, Ing., Ph.D., UAMT (FEKT), UPGM (FIT)
Kodym Oldřich, Ing., Ph.D., UPGM (FIT)

Abstract

We introduce the Brno Mobile OCR Dataset (B-MOD) for document Optical Character Recognition from low-quality images captured by handheld mobile devices. While OCR of high-quality scanned documents is a mature field where many commercial tools are available, and large datasets of text in the wild exist, no existing datasets can be used to develop and test document OCR methods robust to non-uniform lighting, image blur, strong noise, built-in denoising, sharpening, compression and other artifacts present in many photographs from mobile devices.

This dataset contains 2 113 unique pages from random scientific papers, which were photographed by multiple people using 23 different mobile devices. The resulting 19 728 photographs of various visual quality are accompanied by precise positions and text annotations of 500k text lines. We further provide an evaluation methodology, including an evaluation server and a test set with non-public annotations.

We provide a state-of-the-art text recognition baseline built on convolutional and recurrent neural networks trained with Connectionist Temporal Classification loss. This baseline achieves 2 %, 23 % and 73 % word error rates on easy, medium and hard parts of the dataset, respectively, confirming that the dataset is challenging.

The presented dataset will enable future development and evaluation of document analysis for low-quality images. It is primarily intended for line-level text recognition, and can be further used for line localization, layout analysis, image restoration and text binarization.

Keywords

OCR, CTC, mobile, dataset

URL

https://pero.fit.vutbr.cz/publications

Year

2020

Pages

1352–1357

Proceedings

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR

Conference

International Conference on Document Analysis and Recognition

ISBN

978-1-7281-3015-6

Publisher

Institute of Electrical and Electronics Engineers

Place

Sydney

DOI

10.1109/ICDAR.2019.00218

EID Scopus

2-s2.0-85079905359

BibTeX

@inproceedings{BUT162131,
  author="Martin {Kišš} and Michal {Hradiš} and Oldřich {Kodym}",
  title="Brno Mobile OCR Dataset",
  booktitle="Proceedings of the International Conference on Document Analysis and Recognition, ICDAR",
  year="2020",
  pages="1352--1357",
  publisher="Institute of Electrical and Electronics Engineers",
  address="Sydney",
  doi="10.1109/ICDAR.2019.00218",
  isbn="978-1-7281-3015-6",
  url="https://pero.fit.vutbr.cz/publications"
}

Files

pdf Brno Mobile OCR Dataset.pdf 4 MB

Projects

Advanced extraction and recognition of the content of printed and handwritten digitized documents to improve their accessibility and usability, MK, Programme to support applied research and experimental development of national and cultural identity for the years 2016 to 2022 (NAKI II), DG18P02OVV055, start: 2018-03-01, end: 2022-12-31, completed
Processing, visualization and analysis of multimedia and 3D data, VUT, VUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed

Research groups

Computer Graphics Research Group (VZ GRAPH)

Department

Department of Computer Graphics and Multimedia (UPGM)