Thesis Details
Detekce vizuálních vzorů ve webových stránkách (Visual Pattern Detection in Web Pages)
The work addresses information extraction from web pages by searching for visual patterns, that is, spatial relations between areas on the page and shared visual styles of these areas, extended with new techniques to improve the results. It uses a user-specified ontological data model that describes which data items are to be extracted from the given web page and what the individual items look like, mainly from a textual point of view. As part of the work, a Java console application, VizGet, was created; it uses the FitLayout framework to obtain a visual model of the web page. Testing the application on 7 different domains, including lists of the best movies, e-shop products, and weather forecasts, showed that about 75 % of subtests achieve an F-score above 85 % and more than 90 % of subtests achieve an F-score above 60 %, while 45 % of subtests reach an F-score of 100 %. VizGet can thus be deployed in practice for non-critical applications, and it remains open to further extension and improvement.
information extraction, extractor, visual patterns, web pages, VizGet, FitLayout
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Hruška Tomáš, prof. Ing., CSc. (DIFS FIT BUT), member
Hynek Jiří, Ing., Ph.D. (DIFS FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Vojnar Tomáš, prof. Ing., Ph.D. (DITS FIT BUT), member
@mastersthesis{FITMT24460,
  author = "Martin Kotra\v{s}",
  type = "Master's thesis",
  title = "Detekce vizu\'{a}ln\'{i}ch vzor\r{u} ve webov\'{y}ch str\'{a}nk\'{a}ch",
  school = "Brno University of Technology, Faculty of Information Technology",
  year = 2022,
  location = "Brno, CZ",
  language = "czech",
  url = "https://www.fit.vut.cz/study/thesis/24460/"
}