
HTML Document Analysis for Information Extraction

BURGET, R. HTML Document Analysis for Information Extraction. Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002. p. 426-430. ISBN: 80-214-2116-9.
Type
Conference proceedings paper
Language
English
Authors
Radek Burget
Abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the appearance of documents and does not provide facilities for describing the structure of the data they contain. In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the visual appearance and the structure of HTML documents.

Keywords

HTML Analysis, Information Extraction

Year
2002
Pages
426–430
Proceedings
Proceedings of 8th EEICT conference
Conference
Student EEICT 2002
ISBN
80-214-2116-9
Publisher
Faculty of Information Technology BUT
Place
Brno
BibTeX
@inproceedings{BUT10014,
  author="Radek {Burget}",
  title="HTML Document Analysis for Information Extraction",
  booktitle="Proceedings of 8th EEICT conference",
  year="2002",
  pages="426--430",
  publisher="Faculty of Information Technology BUT",
  address="Brno",
  isbn="80-214-2116-9"
}