Thesis Details

Segmentace stránky ve webovém prohlížeči

Master's Thesis
Student: Zubrik Tomáš
Academic Year: 2020/2021
Supervisor: Burget Radek, doc. Ing., Ph.D.
Language
Slovak
Abstract

This thesis deals with web page segmentation in a web browser. It presents an implementation of the Box Clustering Segmentation (BCS) method in JavaScript using an automated browser. The implementation consists of two main steps: extracting boxes (leaf DOM nodes) from the browser context, and clustering them according to the similarity model defined by BCS. The main result is a functional implementation of the BCS method that can be used for web page segmentation. Its functionality and accuracy are evaluated by comparison with a reference implementation written in Java.
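
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis code and does not reproduce the BCS similarity model: the box shape (`id`, `x`, `y`, `width`, `height`), the helper names `gap` and `clusterBoxes`, and the single distance threshold `maxGap` are all assumptions made for illustration. The boxes are hard-coded here; in a browser they would be read from leaf DOM nodes.

```javascript
// Hypothetical box shape: { id, x, y, width, height } in CSS pixels.
// A simplified stand-in for extracted leaf DOM nodes, NOT the BCS model.

// Euclidean gap between two rectangles (0 if they touch or overlap).
function gap(a, b) {
  const dx = Math.max(0, Math.max(a.x, b.x) - Math.min(a.x + a.width, b.x + b.width));
  const dy = Math.max(0, Math.max(a.y, b.y) - Math.min(a.y + a.height, b.y + b.height));
  return Math.hypot(dx, dy);
}

// Group boxes whose gap is at most maxGap, using a small union-find.
// This replaces the BCS similarity model with plain spatial distance.
function clusterBoxes(boxes, maxGap) {
  const parent = boxes.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };

  for (let i = 0; i < boxes.length; i++) {
    for (let j = i + 1; j < boxes.length; j++) {
      if (gap(boxes[i], boxes[j]) <= maxGap) {
        parent[find(i)] = find(j); // merge the two groups
      }
    }
  }

  // Collect box ids per root, in input order.
  const clusters = new Map();
  boxes.forEach((box, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(box.id);
  });
  return [...clusters.values()];
}

// Made-up example data: two header boxes close together, one distant footer.
const boxes = [
  { id: 'logo', x: 0, y: 0, width: 100, height: 40 },
  { id: 'nav', x: 110, y: 0, width: 300, height: 40 },
  { id: 'footer', x: 0, y: 900, width: 400, height: 30 },
];
const clusters = clusterBoxes(boxes, 20);
console.log(clusters);
```

With the sample data, the two header boxes are 10 px apart and end up in one group, while the footer stays on its own. A similarity model such as the one defined by BCS would take the place of the single distance check.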

Keywords

web page segmentation, Box Clustering Segmentation algorithm, BCS, clustering, similarity model, browser automation, Playwright

Department
Degree Programme
Information Technology and Artificial Intelligence, Specialization Information Systems and Databases
Files
Status
defended, grade A
Date
23 June 2021
Reviewer
Committee
Kolář Dušan, doc. Dr. Ing. (DIFS FIT BUT), chair
Burget Radek, doc. Ing., Ph.D. (DIFS FIT BUT), member
Rogalewicz Adam, doc. Mgr., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Zbořil František, doc. Ing., Ph.D. (DITS FIT BUT), member
Citation
ZUBRIK, Tomáš. Segmentace stránky ve webovém prohlížeči. Brno, 2021. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-23. Supervised by Burget Radek. Available from: https://www.fit.vut.cz/study/thesis/23534/
BibTeX
@mastersthesis{FITMT23534,
    author = "Tom\'{a}\v{s} Zubrik",
    type = "Master's thesis",
    title = "Segmentace str\'{a}nky ve webov\'{e}m prohl\'{i}\v{z}e\v{c}i",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "slovak",
    url = "https://www.fit.vut.cz/study/thesis/23534/"
}