Result Details

Accelerating the process of web page segmentation via template clustering

ZELENÝ, J.; BURGET, R. Accelerating the process of web page segmentation via template clustering. International Journal of Intelligent Information and Database System, 2016, vol. 2016, no. 2, p. 134-153. ISSN: 1751-5858.
Type
journal article
Language
English
Authors
Zelený Jan, Ing., Ph.D.
Burget Radek, doc. Ing., Ph.D., DIFS (FIT)
Abstract

Segmenting a web page is often one of the first steps in mining data from that page. There is a large body of research on segmentation based on the visual perception of a web page. In this paper we propose a method for improving the efficiency of virtually all vision-based segmentation algorithms. Our method, called Cluster-based Page Segmentation, builds on the widespread use of web templates: it clusters web pages and performs the segmentation once per cluster instead of on each page in that cluster. To demonstrate the efficiency of our algorithm, we present experimental results gathered using three different vision-based segmentation algorithms.
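The core idea of the abstract is to group pages that share a template and run the expensive segmentation once per group. A minimal Python sketch of that idea follows. It is not the authors' Cluster-based Page Segmentation algorithm. The tag-path page signature, the Jaccard similarity threshold of 0.8, the greedy leader clustering, and the names `cluster_by_template` and `segment_clusters` are all hypothetical. The `segment` argument stands in for any vision-based segmenter such as VIPS. Reusing the leader page's result unchanged for the whole cluster is a deliberate simplification.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple, TypeVar

# Hypothetical template signature: the set of root-to-node tag paths of a page.
# Pages generated from the same web template share most of these paths.
Signature = FrozenSet[str]
T = TypeVar("T")


def jaccard(a: Signature, b: Signature) -> float:
    """Jaccard similarity of two signatures (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_template(pages: Dict[str, Signature],
                        threshold: float = 0.8) -> List[List[str]]:
    """Greedy leader clustering: each page joins the first cluster whose
    leader (first member) is at least `threshold` similar, else starts a new one."""
    clusters: List[List[str]] = []
    leaders: List[Signature] = []
    for url, sig in pages.items():
        for i, leader in enumerate(leaders):
            if jaccard(sig, leader) >= threshold:
                clusters[i].append(url)
                break
        else:  # no sufficiently similar cluster was found
            clusters.append([url])
            leaders.append(sig)
    return clusters


def segment_clusters(pages: Dict[str, Signature],
                     segment: Callable[[str], T],
                     threshold: float = 0.8) -> Tuple[Dict[str, T], int]:
    """Call the expensive segmenter once per cluster (on its leader page) and
    reuse that result for every member; returns (results, number_of_calls)."""
    results: Dict[str, T] = {}
    calls = 0
    for cluster in cluster_by_template(pages, threshold):
        layout = segment(cluster[0])  # one segmentation per cluster
        calls += 1
        for url in cluster:
            results[url] = layout
    return results, calls


# Usage: two pages share a template, the third one does not.
pages = {
    "a.html": frozenset({"html/body/div#nav", "html/body/div#main", "html/body/div#foot"}),
    "b.html": frozenset({"html/body/div#nav", "html/body/div#main", "html/body/div#foot"}),
    "c.html": frozenset({"html/body/table", "html/body/p"}),
}
results, calls = segment_clusters(pages, lambda url: f"blocks-of-{url}")
print(calls)  # 2
```

With three pages the sketch needs only two segmentation calls. The saving grows with the number of pages per template, which is the effect the paper measures. A real implementation would also have to map the cluster-level segmentation back onto each page's own content rather than copying it verbatim.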

Keywords

VIPS, page segmentation, vision-based page segmentation, web page segmentation, web page preprocessing, segmentation performance, clustering, template, template detection

Published
2016
Pages
134–153
Journal
International Journal of Intelligent Information and Database System, vol. 2016, no. 2, ISSN 1751-5858
DOI
10.1504/IJIIDS.2016.075424
BibTeX
@article{BUT130902,
  author="Jan {Zelený} and Radek {Burget}",
  title="Accelerating the process of web page segmentation via template clustering",
  journal="International Journal of Intelligent Information and Database System",
  year="2016",
  volume="2016",
  number="2",
  pages="134--153",
  doi="10.1504/IJIIDS.2016.075424",
  issn="1751-5858",
  url="https://www.fit.vut.cz/research/publication/10530/"
}
Projects
Advanced recognition and presentation of multimedia data, BUT, BUT internal projects, FIT-S-11-2, start: 2011-01-01, end: 2013-12-31, completed
IT4Innovations Centre of Excellence, Ministry of Education, Youth and Sports (MŠMT), Operational Programme Research and Development for Innovation, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed