Publication Details
Orbis Pictus: Zpřístupnění netextových dat z digitálních knihoven [Orbis Pictus: Making Non-Textual Data from Digital Libraries Accessible]
Lehečka Dalibor
Jebavý Filip, Mgr.
Kersch Filip, Mgr.
Pavčík Filip, Mgr., Ph.D.
Hrzinová Jana, Mgr.
Fremrová Květa
Kišš Martin, Ing. (DCGM)
Lhoták Martin, Ing.
Dvořáková Martina
Bežová Michaela, Mgr. et Bc.
Hradiš Michal, Ing., Ph.D. (DCGM)
Žabička Petr, Ing.
Jiroušek Václav
Keywords: digital libraries; machine learning; image recognition; image retrieval; creative industries
Purpose - The project "Book Revival for Cultural and Creative Sectors" aims to make the non-textual content of Czech digital libraries easily accessible, since such content is currently harder to access and search than textual data. This article gives an overview of the project's planned outputs, with an emphasis on the key results achieved in its first two years.

Method - Accessing non-textual objects in digitized documents can be divided into three tasks: detection, description and retrieval. The AnnoPage tool will identify, localize and categorize these objects, extract their descriptions and store them in a standardized format. In later phases of the project, AnnoPage will be complemented by PeopleGator, which will identify people in photographs and drawings, link documents depicting the same person and build a database of identified people. At the end of the project, a software solution integrating all the developed tools will be delivered.

Results - In the first two years of the project, a methodology for processing image documents was developed. It describes how to detect non-text objects, classify them into 25 categories and store this information using international standards, and thus lays the foundation for the AnnoPage tool. Objects are detected by a detector trained on a custom dataset, and each detected object is described by a vector representation and a textual description.
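The abstract says that detected objects, their category and their description are stored "using international standards" but does not name those standards. As a minimal sketch, assuming the W3C Web Annotation Data Model with a Media Fragments `xywh=` selector as the target format (an assumption, not something the article confirms), one detection could be serialized as follows. The `DetectedObject` class, the `to_web_annotation` function, the example URIs and the category name `"map"` are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One non-text object found on a digitized page (hypothetical structure)."""
    page_uri: str     # IRI of the page image; example values are hypothetical
    category: str     # one of the 25 categories; the label names here are assumed
    x: int            # bounding box in pixels, origin at the top-left corner
    y: int
    width: int
    height: int
    description: str  # free-text description of the object


def to_web_annotation(obj: DetectedObject, annotation_id: str) -> dict:
    """Serialize a detection as a W3C Web Annotation (assumed target format)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "describing",
        "body": [
            # The category label, marked with the "classifying" purpose.
            {"type": "TextualBody", "purpose": "classifying", "value": obj.category},
            # The textual description of the object.
            {"type": "TextualBody", "purpose": "describing", "value": obj.description},
        ],
        "target": {
            "source": obj.page_uri,
            # A Media Fragments xywh= selector restricts the target to the bounding box.
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={obj.x},{obj.y},{obj.width},{obj.height}",
            },
        },
    }


if __name__ == "__main__":
    detection = DetectedObject(
        page_uri="https://example.org/pages/42.jpg",  # hypothetical URI
        category="map",                               # hypothetical category name
        x=120, y=340, width=800, height=600,
        description="Hand-drawn map of a river valley",
    )
    record = to_web_annotation(detection, "https://example.org/annotations/1")
    print(json.dumps(record, indent=2, ensure_ascii=False))
```

Keeping the category and the description as separate bodies with distinct `purpose` values lets a consumer filter by category without parsing free text.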

Originality/value - The project's outputs will be integrated into the Czech Digital Library, making the developed tools available to the wide range of libraries aggregated by the platform. Owing to its extensive collection of non-textual data, Orbis Pictus is a unique project in the field of digital humanities. The results will be applicable not only to identifying objects and metadata, but also in research and in the cultural and creative industries, where the detected objects can serve as inspiration for marketing, education, gamification or artificial intelligence applications.
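The retrieval task named in the Method section can operate directly on the vector representations of detected objects mentioned in the Results. Below is a minimal sketch of brute-force nearest-neighbour search by cosine similarity. The similarity measure, the toy three-dimensional vectors and the object identifiers are assumptions for illustration, since the abstract does not say how the vectors are compared.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Vector, index: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Score every indexed object against the query and return the top k, best first."""
    scored = [(obj_id, cosine_similarity(query, vec)) for obj_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors with hypothetical object IDs;
    # real embeddings would have many more dimensions.
    index = {
        "page12_obj1": [0.9, 0.1, 0.0],
        "page40_obj3": [0.0, 1.0, 0.0],
        "page77_obj2": [0.8, 0.2, 0.1],
    }
    query = [1.0, 0.0, 0.0]
    print([obj_id for obj_id, _ in search(query, index, k=2)])
    # prints ['page12_obj1', 'page77_obj2']
```

Brute-force scoring grows linearly with collection size, so at library scale an approximate nearest-neighbour index would normally replace the loop. If text queries were encoded into the same vector space as the images (also an assumption), the same function would support text-to-image search.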
@article{BUT197967,
author="Dalibor {Lehečka} and Filip {Jebavý} and Filip {Kersch} and Filip {Pavčík} and Jana {Hrzinová} and Květa {Fremrová} and Martin {Kišš} and Martin {Lhoták} and Martina {Dvořáková} and Michaela {Bežová} and Michal {Hradiš} and Petr {Žabička} and Václav {Jiroušek}",
title="Orbis Pictus: Zpřístupnění netextových dat z digitálních knihoven",
journal="ITlib",
year="2024",
volume="2024",
number="2",
pages="22--31",
doi="10.52036/1335793X.2024.2.22-31",
issn="1336-0779",
url="https://doi.org/10.52036/1335793X.2024.2.22-31"
}