Result detail

Measuring Web Page Similarity Based on Textual and Visual Properties

BARTÍK, V. Measuring Web Page Similarity Based on Textual and Visual Properties. In The 11th International Conference on Artificial Intelligence and Soft Computing. Lecture Notes in Computer Science. Lecture Notes in Artificial Intelligence, Vol. 7268. Zakopane: Springer Verlag, 2012. no. 7268, p. 13-21. ISBN: 978-3-642-29349-8. ISSN: 0302-9743.
Type
conference proceedings paper
Language
English
Authors
Vladimír Bartík
Abstract

Measuring web page similarity is an important task in web mining and information retrieval. This paper introduces a method for measuring web page similarity that considers both the textual and the visual properties of pages. The textual properties of a page are described by a modified weighted vector space model. General visual properties are captured by segmenting the page into visual blocks, whose properties are stored in a vector of visual properties. Both vectors are then used to compute the overall web page similarity. The method is described in detail, and the results of several experiments are presented.
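The general idea in the abstract, one similarity from a text vector and one from a visual vector, merged into a single score, can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the modified term weighting, the contents of the visual vector, or the combination formula. The sketch assumes plain TF-IDF weights, cosine similarity for text, a normalized Euclidean distance for visual vectors scaled to [0, 1], and a linear mix controlled by a hypothetical parameter `alpha`. All function names are illustrative.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized pages.

    Assumption: standard smoothed TF-IDF, not the modified weighting
    used in the paper.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # guard against empty pages
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def visual_similarity(u, v):
    """Similarity of two equal-length visual vectors with values in [0, 1].

    Assumption: 1 minus the Euclidean distance divided by its maximum
    possible value, sqrt(len(u)).
    """
    if len(u) != len(v) or not u:
        raise ValueError("visual vectors must be non-empty and equal in length")
    return 1.0 - math.dist(u, v) / math.sqrt(len(u))


def page_similarity(text_sim, visual_sim, alpha=0.5):
    """Linear mix of both similarities; alpha is a hypothetical weight."""
    return alpha * text_sim + (1.0 - alpha) * visual_sim


if __name__ == "__main__":
    pages = [
        ["web", "page", "similarity", "clustering"],
        ["web", "page", "term", "weighting"],
    ]
    text_vectors = tf_idf_vectors(pages)
    # Made-up visual vectors, e.g. normalized block counts or average sizes.
    visual_a, visual_b = [0.2, 0.8, 0.5], [0.3, 0.7, 0.5]
    score = page_similarity(
        cosine(text_vectors[0], text_vectors[1]),
        visual_similarity(visual_a, visual_b),
        alpha=0.6,
    )
    print(f"combined similarity: {score:.3f}")
```

A linear mix keeps the two components inspectable on their own; whether the paper combines them this way is not stated in the abstract.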

Keywords

Web page similarity, clustering, vector space model, vector distance, term weighting, visual blocks.

Year
2012
Pages
13–21
Journal
Lecture Notes in Computer Science, no. 7268, ISSN 0302-9743
Proceedings
The 11th International Conference on Artificial Intelligence and Soft Computing
Series
Lecture Notes in Artificial Intelligence, Vol. 7268
Conference
The 11th International Conference on Artificial Intelligence and Soft Computing
ISBN
978-3-642-29349-8
Publisher
Springer Verlag
Place
Zakopane
UT WoS
000314151300002
BibTeX
@inproceedings{BUT76500,
  author="Vladimír {Bartík}",
  title="Measuring Web Page Similarity Based on Textual and Visual Properties",
  booktitle="The 11th International Conference on Artificial Intelligence and Soft Computing",
  year="2012",
  series="Lecture Notes in Artificial Intelligence, Vol. 7268",
  journal="Lecture Notes in Computer Science",
  number="7268",
  pages="13--21",
  publisher="Springer Verlag",
  address="Zakopane",
  isbn="978-3-642-29349-8",
  issn="0302-9743",
  url="https://www.fit.vut.cz/research/publication/9850/"
}
Projects
Security-Oriented Research in Information Technology (Výzkum informačních technologií z hlediska bezpečnosti), MŠMT, Institutional funds from the state budget of the Czech Republic (e.g. VZ, VC), MSM0021630528, start: 2007-01-01, end: 2013-12-31, in progress