Result Details
HTML Document Analysis for Information Extraction
BURGET, R. HTML Document Analysis for Information Extraction. Proceedings of 8th EEICT conference. Brno: Faculty of Information Technology BUT, 2002. p. 426-430. ISBN: 80-214-2116-9.
Type
conference paper
Language
English
Authors
Burget Radek, doc. Ing., Ph.D., FIT (FIT)
Abstract
Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the look of the documents and does not contain facilities for describing the structure of the contained data. In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the look and the structure of HTML documents.
Keywords
HTML Analysis, Information Extraction
Published
2002
Pages
426–430
Proceedings
Proceedings of 8th EEICT conference
Conference
Student EEICT 2002
ISBN
80-214-2116-9
Publisher
Faculty of Information Technology BUT
Place
Brno
BibTeX
@inproceedings{BUT10014,
author="Radek {Burget}",
title="HTML Document Analysis for Information Extraction",
booktitle="Proceedings of 8th EEICT conference",
year="2002",
pages="426--430",
publisher="Faculty of Information Technology BUT",
address="Brno",
isbn="80-214-2116-9"
}