HTML Document Analysis for Information Extraction

BURGET, R. HTML Document Analysis for Information Extraction. Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002. p. 426-430. ISBN: 80-214-2116-9.

Type

conference paper

Language

English

Authors

Burget Radek, doc. Ing., Ph.D., FIT (FIT)

Abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the look of the documents and does not contain facilities for describing the structure of the contained data. In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the look and the structure of HTML documents.

Keywords

HTML Analysis, Information Extraction

Published

2002

Pages

426–430

Proceedings

Proceedings of 8th EEICT conference

Conference

Student EEICT 2002

ISBN

80-214-2116-9

Publisher

Faculty of Information Technology BUT

Place

Brno

BibTeX

@inproceedings{BUT10014,
  author="Radek {Burget}",
  title="HTML Document Analysis for Information Extraction",
  booktitle="Proceedings of 8th EEICT conference",
  year="2002",
  pages="426--430",
  publisher="Faculty of Information Technology BUT",
  address="Brno",
  isbn="80-214-2116-9"
}

Research groups

Information and Database Systems Research Group (RG IS)

Departments

Faculty of Information Technology (FIT)
Department of Information Systems (DIFS)