Thesis Details

Detekce vizuálních vzorů ve webových stránkách

Master's Thesis
Student: Kotraš Martin
Academic Year: 2021/2022
Supervisor: Burget Radek, doc. Ing., Ph.D.
English title
Visual Pattern Detection in Web Pages

This thesis addresses information extraction from web pages by searching for visual patterns, that is, spatial relations between areas of a page and shared visual styles among those areas, and extends the approach with new techniques to improve the results. It relies on a user-specified ontological data model that describes which data items are to be extracted from a given web page and what the individual items look like, mainly in terms of their text. As part of the work, a Java console application, VizGet, was created; it uses the FitLayout framework to obtain a visual model of the web page. Testing on 7 different domains, including a list of the best movies, e-shop products, and weather forecasts, showed that the application achieves an F-score above 85 % in about 75 % of subtests and above 60 % in more than 90 % of subtests, with 45 % of subtests reaching an F-score of 100 %. VizGet can thus be deployed in practice for non-critical applications, and it remains open to further extension and improvement.


information extraction, extractor, visual patterns, web pages, VizGet, FitLayout

Degree Programme
defended, grade A
21 June 2022
Kolář Dušan, doc. Dr. Ing. (DIFS FIT BUT), chair
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Hruška Tomáš, prof. Ing., CSc. (DIFS FIT BUT), member
Hynek Jiří, Ing., Ph.D. (DIFS FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Vojnar Tomáš, prof. Ing., Ph.D. (DITS FIT BUT), member
KOTRAŠ, Martin. Detekce vizuálních vzorů ve webových stránkách. Brno, 2022. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2022-06-21. Supervised by Burget Radek. Available from:
