Publication Details

Isomorphic mapping of DOM trees for Cluster-Based Page Segmentation

ZELENÝ Jan and BURGET Radek. Isomorphic mapping of DOM trees for Cluster-Based Page Segmentation. In: Proceedings of the Twelfth International Conference on Informatics INFORMATICS'2013. Spišská Nová Ves: The University of Technology Košice, 2013, pp. 256-261. ISBN 978-80-8143-127-2.
Czech title
Izomorfní mapování DOM stromů pro segmentaci webových stránek
Type
conference paper
Language
English
Authors
Zelený Jan, Ing., Ph.D. (DIFS FIT BUT)
Burget Radek, Ing., Ph.D. (DIFS FIT BUT)
Keywords
vision-based page segmentation, cache, template detection, cluster-based page segmentation, DOM, tree mapping
Abstract
In our previous work, we designed a method for fast and precise Web page segmentation. In this paper, we propose a complementary algorithm and data structures that extend the original design. The extension focuses on isomorphic mapping between two DOM trees. Our main objective is to improve the robustness of our original solution. We design and implement a solution that is more robust while retaining the efficiency of the original, simpler one. To demonstrate the qualities of the new design, we also present an experimental evaluation of the new implementation.
Published
2013
Pages
256-261
Proceedings
Proceedings of the Twelfth International Conference on Informatics INFORMATICS'2013
Conference
Informatics 2013 - 12th International Scientific Conference on Informatics, Spišská Nová Ves, SK
ISBN
978-80-8143-127-2
Publisher
The University of Technology Košice
Place
Spišská Nová Ves, SK
BibTeX
@INPROCEEDINGS{FITPUB10414,
   author = "Jan Zelen\'{y} and Radek Burget",
   title = "Isomorphic mapping of DOM trees for Cluster-Based Page Segmentation",
   pages = "256--261",
   booktitle = "Proceedings of the Twelfth International Conference on Informatics INFORMATICS'2013",
   year = 2013,
   location = "Spi\v{s}sk\'{a} Nov\'{a} Ves, SK",
   publisher = "The University of Technology Ko\v{s}ice",
   ISBN = "978-80-8143-127-2",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10414"
}