Thesis Details

Extrakce informací z webových stránek

Master's Thesis

Student

Bukovčák Jakub

Academic Year

2018/2019

Supervisor

Burget Radek, doc. Ing., Ph.D.

English title

Information Extraction from Web Pages

Language

Czech

Abstract

This master's thesis focuses on current technologies used for downloading web pages and extracting structured information from them. It describes the available tools that make this process possible and easier. It also gives an overview of technologies that can be used for creating web pages, and covers the development of information systems with a web user interface based on the Java Enterprise Edition (Java EE) platform. The main part of the thesis describes the design and implementation of an application for specifying and managing extraction tasks. The final part describes testing the application on real web pages and evaluates the results achieved.

Keywords

HLRT wrapper, information extraction from HTML, Java EE, Web Crawling, downloading HTML documents

Department

Department of Information Systems FIT BUT

Degree Programme

Information Technology, Field of Study Information Systems

Status

defended, grade B

Date

20 June 2019

Reviewer

Rychlý Marek, RNDr., Ph.D.

Committee

Hruška Tomáš, prof. Ing., CSc. (DIFS FIT BUT), chair
Janoušek Vladimír, doc. Ing., Ph.D. (DITS FIT BUT), member
Kolář Dušan, doc. Dr. Ing. (DIFS FIT BUT), member
Malinka Kamil, Mgr., Ph.D. (DITS FIT BUT), member
Rybička Jiří, doc. Ing. Dr. (Mendelu), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member

Citation

BUKOVČÁK, Jakub. Extrakce informací z webových stránek. Brno, 2019. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-06-20. Supervised by Burget Radek. Available from: https://www.fit.vut.cz/study/thesis/21836/

BibTeX

@mastersthesis{FITMT21836,
    author = "Jakub Bukov\v{c}\'{a}k",
    type = "Master's thesis",
    title = "Extrakce informac\'{i} z webov\'{y}ch str\'{a}nek",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/21836/"
}
