Thesis Details
Web Page Segmentation in a Web Browser
This thesis deals with web page segmentation in a web browser. An implementation of the Box Clustering Segmentation (BCS) method was created in JavaScript using an automated browser. The implementation consists of two main steps: extraction of boxes (leaf DOM nodes) from the browser context, and their subsequent clustering based on the similarity model defined by BCS. The main result of this thesis is a functional implementation of the BCS method that can be used for web page segmentation. The functionality and accuracy of the implementation are evaluated by comparison with a reference implementation written in Java.
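The two steps named in the abstract can be illustrated with a minimal TypeScript sketch. It is not the thesis implementation and not the BCS similarity model. The `Box` and `DomLikeNode` types, the `extractLeafBoxes` and `clusterBoxes` helpers, the dissimilarity measure (mean of the normalized gap between two boxes and their normalized RGB color difference) and the threshold are all hypothetical. The clustering step is plain single-linkage grouping with union-find, swapped in for illustration. A simplified in-memory tree stands in for the DOM that the thesis reads from an automated browser.

```typescript
// Hypothetical sketch of a two-step segmentation pipeline.
// NOT the BCS similarity model: the dissimilarity below is an illustrative assumption.

/** Rectangle of a rendered leaf node plus an assumed dominant RGB color (0-255). */
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
  color: [number, number, number];
}

/** Hypothetical simplified DOM node; `box` is absent for nodes that are not rendered. */
interface DomLikeNode {
  box?: Box;
  children: DomLikeNode[];
}

/** Step 1: collect boxes of leaf nodes in document order (depth-first, pre-order). */
function extractLeafBoxes(node: DomLikeNode, out: Box[] = []): Box[] {
  if (node.children.length === 0) {
    if (node.box !== undefined) out.push(node.box); // skip unrendered leaves
  } else {
    for (const child of node.children) extractLeafBoxes(child, out);
  }
  return out;
}

/** Shortest gap between two rectangles (0 when they touch or overlap). */
function gap(a: Box, b: Box): number {
  const dx = Math.max(0, Math.max(a.x, b.x) - Math.min(a.x + a.width, b.x + b.width));
  const dy = Math.max(0, Math.max(a.y, b.y) - Math.min(a.y + a.height, b.y + b.height));
  return Math.sqrt(dx * dx + dy * dy);
}

/** Euclidean RGB distance normalized to [0, 1]. */
function colorDiff(a: Box, b: Box): number {
  const dr = a.color[0] - b.color[0];
  const dg = a.color[1] - b.color[1];
  const db = a.color[2] - b.color[2];
  return Math.sqrt(dr * dr + dg * dg + db * db) / (255 * Math.sqrt(3));
}

/** Hypothetical dissimilarity: mean of the normalized gap and the color difference. */
function dissimilarity(a: Box, b: Box, pageDiagonal: number): number {
  return (gap(a, b) / pageDiagonal + colorDiff(a, b)) / 2;
}

/** Step 2: single-linkage grouping via union-find; returns clusters of box indices. */
function clusterBoxes(boxes: Box[], threshold: number, pageDiagonal: number): number[][] {
  const parent = boxes.map((_, i) => i);
  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps trees shallow
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < boxes.length; i++) {
    for (let j = i + 1; j < boxes.length; j++) {
      if (dissimilarity(boxes[i], boxes[j], pageDiagonal) < threshold) {
        parent[find(i)] = find(j); // merge the two clusters
      }
    }
  }
  // Collect indices per root; groups are ordered by their first member.
  const slot = boxes.map(() => -1);
  const groups: number[][] = [];
  for (let i = 0; i < boxes.length; i++) {
    const root = find(i);
    if (slot[root] === -1) {
      slot[root] = groups.length;
      groups.push([]);
    }
    groups[slot[root]].push(i);
  }
  return groups;
}
```

In a browser-based implementation the leaf rectangles would instead be read from the live page, for example with `element.getBoundingClientRect()` evaluated in the page context. Raising the threshold merges more distant or differently colored boxes into the same segment.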
web page segmentation, Box Clustering Segmentation algorithm, BCS, clustering, similarity model, browser automation, Playwright
Burget Radek, doc. Ing., Ph.D. (DIFS FIT BUT), member
Rogalewicz Adam, doc. Mgr., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Zbořil František, doc. Ing., Ph.D. (DITS FIT BUT), member
@mastersthesis{FITMT23534, author = "Tom\'{a}\v{s} Zubrik", type = "Master's thesis", title = "Segmentace str\'{a}nky ve webov\'{e}m prohl\'{i}\v{z}e\v{c}i", school = "Brno University of Technology, Faculty of Information Technology", year = 2021, location = "Brno, CZ", language = "slovak", url = "https://www.fit.vut.cz/study/thesis/23534/" }