Thesis Details
Implementace algoritmu pro vizuální segmentaci www stránek (Implementation of an Algorithm for Visual Segmentation of WWW Pages)
Segmentation of WWW pages, that is, the division of a page into different semantic blocks, is one of the disciplines of information extraction. This master's thesis deals with the Vision-based Page Segmentation (VIPS) method, which divides a page based on the visual properties of its elements. The method is placed in the context of other prominent segmentation procedures. The key steps of the method are shown and described with examples. The VIPS method has to cooperate with a WWW page rendering engine in order to obtain the Document Object Model of a page. The thesis presents and describes the four most important rendering engines for the Java programming language. The output of this work is an implementation of the VIPS algorithm in Java using the CSSBox core. The original implementation of the algorithm from Microsoft's labs is also presented. The development stages of the library implementing the VIPS method, and my approach to its solution, are described. Finally, the outcome of the work is demonstrated by segmenting several pages.
Vision-based Page Segmentation, Java, Linux, WWW, Segmentation, CSSBox, Document Object Model
Hanáček Petr, doc. Dr. Ing. (DITS FIT BUT), member
Kreslíková Jitka, doc. RNDr., CSc. (DIFS FIT BUT), member
Křivka Zbyněk, Ing., Ph.D. (DIFS FIT BUT), member
Návrat Pavol, prof. Ing., Ph.D. (FIIT STU), member
Zbořil František, doc. Ing., Ph.D. (DITS FIT BUT), member
@mastersthesis{FITMT14163,
author = "Tom\'{a}\v{s} Popela",
type = "Master's thesis",
title = "Implementace algoritmu pro vizu\'{a}ln\'{i} segmentaci www str\'{a}nek",
school = "Brno University of Technology, Faculty of Information Technology",
year = 2012,
location = "Brno, CZ",
language = "czech",
url = "https://www.fit.vut.cz/study/thesis/14163/"
}