Publication Details
Orbis Pictus: Zpřístupnění netextových dat z digitálních knihoven (Orbis Pictus: Making Non-Textual Data from Digital Libraries Accessible)
Lehečka Dalibor
Jebavý Filip, Mgr.
Kersch Filip, Mgr.
Pavčík Filip, Mgr., Ph.D.
Hrzinová Jana, Mgr.
Fremrová Květa
Kišš Martin, Ing. (DCGM)
Lhoták Martin, Ing.
Dvořáková Martina
Bežová Michaela, Mgr. et Bc.
Hradiš Michal, Ing., Ph.D. (DCGM)
Žabička Petr, Ing.
Jiroušek Václav
digital libraries; machine learning; image recognition; image retrieval; creative industries
Purpose - The project "Book Revival for Cultural and Creative Sectors" aims to make the non-textual content of Czech digital libraries easily available, since this content is currently much harder to access and search than textual data. This article provides an overview of the planned outputs of the project, with an emphasis on the key results achieved in the first two years.
Method - Making non-textual objects in digitized documents accessible can be divided into three tasks: detection, description and retrieval. The identification, localization and categorization of objects will be handled by AnnoPage, a tool that will also extract object descriptions and store them in a standardized format. In the next phases of the project, AnnoPage will be followed by PeopleGator, which identifies people in photographs or drawings, links documents depicting the same person and builds a database of identified people. At the project's conclusion, a software solution integrating all the developed tools will be provided.
Results - In the first two years of the project, a methodology for processing image documents was developed. This methodology describes how to detect non-textual objects, classify them into 25 categories and store this information using international standards, thus laying the foundation for the AnnoPage tool. A detector trained on a custom dataset is used to detect the objects. Detected objects are described using vector representations and textual descriptions.
Originality/value - The outputs of the project will be integrated into the Czech Digital Library, which will enable a wide range of libraries aggregated by the platform to use the developed tools. Orbis Pictus is a unique project in the field of digital humanities due to its extensive collection of non-textual data. The results will find applications not only in object and metadata identification, but also in research and the cultural and creative industries, where the detected objects can serve as inspiration for marketing, education, gamification or artificial intelligence.
@article{BUT197967,
author="Dalibor {Lehečka} and Filip {Jebavý} and Filip {Kersch} and Filip {Pavčík} and Jana {Hrzinová} and Květa {Fremrová} and Martin {Kišš} and Martin {Lhoták} and Martina {Dvořáková} and Michaela {Bežová} and Michal {Hradiš} and Petr {Žabička} and Václav {Jiroušek}",
title="Orbis Pictus: Zpřístupnění netextových dat z digitálních knihoven",
journal="ITlib",
year="2024",
volume="2024",
number="2",
pages="22--31",
doi="10.52036/1335793X.2024.2.22-31",
issn="1336-0779",
url="https://doi.org/10.52036/1335793X.2024.2.22-31"
}