Result Detail

HTML Document Analysis for Information Extraction

BURGET, R. HTML Document Analysis for Information Extraction. Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002. p. 426-430. ISBN: 80-214-2116-9.

Type

Conference proceedings article

Language

English

Authors

Burget Radek, doc. Ing., Ph.D.

Abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the appearance of documents and provides no facilities for describing the structure of the data they contain. In this paper, we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the visual appearance and the structure of HTML documents.

Keywords

HTML Analysis, Information Extraction

Year

2002

Pages

426–430

Proceedings

Proceedings of 8th EEICT conference

Conference

Student EEICT 2002

ISBN

80-214-2116-9

Publisher

Faculty of Information Technology BUT

Place

Brno

BibTeX

@inproceedings{BUT10014,
  author="Radek {Burget}",
  title="HTML Document Analysis for Information Extraction",
  booktitle="Proceedings of 8th EEICT conference",
  year="2002",
  pages="426--430",
  publisher="Faculty of Information Technology BUT",
  address="Brno",
  isbn="80-214-2116-9"
}

Research Groups

Information and Database Systems Research Group (VZ IS)

Affiliation

Faculty of Information Technology (FIT)
Department of Information Systems (UIFS)