Result detail

Cluster-based Page Segmentation - a fast and precise method for web page pre-processing

ZELENÝ, J.; BURGET, R. Cluster-based Page Segmentation - a fast and precise method for web page pre-processing. In The Third International Conference on Web Intelligence, Mining and Semantics. Madrid: Association for Computing Machinery, 2013. p. 1-12. ISBN: 978-1-4503-1850-1.
Type
conference proceedings paper
Language
English
Authors
Zelený Jan, Ing., Ph.D., UIFS (FIT)
Burget Radek, doc. Ing., Ph.D., UIFS (FIT)
Abstract

Segmenting a web page may be one of the initial steps of information retrieval or content classification performed on that page. Although this area has been researched extensively, existing approaches usually focus on either performance or the quality of the results. Vision-based segmentation is one of the quality-focused methods, which are considerably slow. This paper proposes an approach for boosting the performance of vision-based algorithms. Our approach is based on concepts of the modern web and on a very common scenario in which an entire web site is processed at once. In this scenario, a substantial performance boost can be gained by isomorphically mapping previous results gathered from pages within the site onto other pages of the same site. We report the results of experiments performed on VIPS, the most common algorithm for page segmentation.

Keywords

VIPS, vision-based page segmentation, clustering, template, template detection

Year
2013
Pages
1–12
Proceedings
The Third International Conference on Web Intelligence, Mining and Semantics
Conference
International Conference on Web Intelligence, Mining and Semantics
ISBN
978-1-4503-1850-1
Publisher
Association for Computing Machinery
Place
Madrid
DOI
10.1145/2479787.2479792
BibTeX
@inproceedings{BUT106483,
  author="Jan {Zelený} and Radek {Burget}",
  title="Cluster-based Page Segmentation - a fast and precise method for web page pre-processing",
  booktitle="The Third International Conference on Web Intelligence, Mining and Semantics",
  year="2013",
  pages="1--12",
  publisher="Association for Computing Machinery",
  address="Madrid",
  doi="10.1145/2479787.2479792",
  isbn="978-1-4503-1850-1",
  url="https://www.fit.vut.cz/research/publication/10252/"
}
Projects
IT4Innovations Centre of Excellence, MŠMT, Operational Programme Research and Development for Innovations, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed
Advanced recognition and presentation of multimedia data, VUT, VUT internal projects, FIT-S-11-2, start: 2011-01-01, end: 2013-12-31, completed
Security-oriented research in information technology, MŠMT, Institutional funds from the state budget of the Czech Republic (e.g. research plans, research centres), MSM0021630528, start: 2007-01-01, end: 2013-12-31, in progress