Thesis Details

Systém pro vyhledávání a výběry relevantních článků z Wikipedie podle tématu

Bachelor's Thesis
Student: Suchý Ondřej
Academic Year: 2014/2015
Supervisor: Smrž Pavel, doc. RNDr., Ph.D.
English title
Wikipedia Page Classification
Language
Czech
Abstract

The goal of this thesis is to design and implement a system that selects Wikipedia articles relevant to a given topic, in order to reduce the amount of memory taken by an offline version of Wikipedia. The problem was solved using methods from information retrieval, implemented with the Elasticsearch search engine. The system tries to determine the user's area of interest from the given keywords and selects articles from that area. It does so by measuring the similarity of articles and adding all articles from frequent categories to the selection. The output files for queries over Simple English Wikipedia are usually smaller than 30 MB.
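
The category step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `select_articles`, the `min_share` threshold, and the in-memory `categories_of` mapping are all assumptions made for illustration, and the seed articles (which the thesis obtains through similarity search in Elasticsearch) are simply passed in as a list.

```python
from collections import Counter


def select_articles(seed_ids, categories_of, min_share=0.5):
    """Expand a seed set with all articles from categories frequent among the seeds.

    seed_ids      -- articles assumed to be already found by a similarity search
    categories_of -- hypothetical mapping: article title -> set of category names
    min_share     -- assumed threshold: a category is "frequent" when at least
                     this fraction of the seed articles belongs to it
    """
    seeds = set(seed_ids)

    # Count how many seed articles fall into each category.
    counts = Counter(cat for a in seeds for cat in categories_of.get(a, ()))

    # Keep only categories shared by a large enough part of the seed set.
    threshold = min_share * len(seeds)
    frequent = {cat for cat, n in counts.items() if n >= threshold}

    # Add every article that belongs to at least one frequent category.
    selected = set(seeds)
    for article, cats in categories_of.items():
        if frequent.intersection(cats):
            selected.add(article)
    return selected, frequent


# Toy data, invented for this example.
categories_of = {
    "Dog": {"Mammals", "Pets"},
    "Cat": {"Mammals", "Pets"},
    "Wolf": {"Mammals"},
    "Goldfish": {"Pets", "Fish"},
    "Shark": {"Fish"},
    "Brno": {"Cities"},
}
selected, frequent = select_articles(["Dog", "Cat", "Goldfish"], categories_of)
print(sorted(selected), sorted(frequent))
```

With this toy data, "Wolf" is pulled into the selection because "Mammals" is shared by two of the three seeds, while "Shark" stays out because "Fish" occurs in only one seed. How the thesis actually defines a frequent category is not stated in the abstract, so the fixed share used here is only one possible choice.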

Keywords

information retrieval, Wikipedia, Elasticsearch, document similarity, search engine

Department
Degree Programme
Information Technology
Files
Status
defended, grade C
Date
17 June 2015
Reviewer
Committee
Zendulka Jaroslav, doc. Ing., CSc. (DIFS FIT BUT), chair
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT), member
Kočí Radek, Ing., Ph.D. (DITS FIT BUT), member
Kotásek Zdeněk, doc. Ing., CSc. (DCSY FIT BUT), member
Křivka Zbyněk, Ing., Ph.D. (DIFS FIT BUT), member
Citation
SUCHÝ, Ondřej. Systém pro vyhledávání a výběry relevantních článků z Wikipedie podle tématu. Brno, 2015. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2015-06-17. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/thesis/17707/
BibTeX
@bachelorsthesis{FITBT17707,
    author = "Ond\v{r}ej Such\'{y}",
    type = "Bachelor's thesis",
    title = "Syst\'{e}m pro vyhled\'{a}v\'{a}n\'{i} a v\'{y}b\v{e}ry relevantn\'{i}ch \v{c}l\'{a}nk\r{u} z Wikipedie podle t\'{e}matu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2015,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/17707/"
}