Thesis Details

Detekce vizuálních vzorů ve webových stránkách

Master's Thesis
Student: Kotraš Martin
Academic Year: 2021/2022
Supervisor: Burget Radek, doc. Ing., Ph.D.
English title
Visual Pattern Detection in Web Pages
Language
Czech
Abstract

This thesis addresses information extraction from websites by searching for visual patterns, that is, spatial relations between areas on a page together with shared visual styles of those areas, and extends the approach with new techniques to improve results. It uses a user-specified ontological data model that describes which data items are to be extracted from a given web page and what the individual items look like, mainly in terms of their text. As part of the work, a Java console application, VizGet, was created; it uses the FitLayout framework to obtain a visual model of the web page. Testing on 7 different domains, including lists of the best movies, e-shop products, and weather forecasts, showed that about 75 % of subtests reach an F-score above 85 % and more than 90 % of subtests reach an F-score above 60 %, while 45 % of subtests achieve an F-score of 100 %. VizGet can thus be deployed in practice for non-critical applications, and it remains open to further extension and improvement.

Keywords

information extraction, extractor, visual patterns, web pages, VizGet, FitLayout

Degree Programme
Information Technology and Artificial Intelligence, Specialization Information Systems and Databases
Status
defended, grade A
Date
21 June 2022
Committee
Kolář Dušan, doc. Dr. Ing. (DIFS FIT BUT), chair
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Hruška Tomáš, prof. Ing., CSc. (DIFS FIT BUT), member
Hynek Jiří, Ing., Ph.D. (DIFS FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Vojnar Tomáš, prof. Ing., Ph.D. (DITS FIT BUT), member
Citation
KOTRAŠ, Martin. Detekce vizuálních vzorů ve webových stránkách. Brno, 2022. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2022-06-21. Supervised by Burget Radek. Available from: https://www.fit.vut.cz/study/thesis/24460/
BibTeX
@mastersthesis{FITMT24460,
    author = "Martin Kotra\v{s}",
    type = "Master's thesis",
    title = "Detekce vizu\'{a}ln\'{i}ch vzor\r{u} ve webov\'{y}ch str\'{a}nk\'{a}ch",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2022,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/24460/"
}