Thesis Details
System for Searching and Selecting Relevant Wikipedia Articles by Topic
The goal of this thesis is to design and implement a system that selects Wikipedia articles relevant to a given topic, in order to reduce the amount of storage taken by an offline version of Wikipedia. The solution uses methods from information retrieval, implemented with the Elasticsearch search engine. The system tries to determine the user's area of interest from the given keywords and selects articles from that area. It does so by measuring the similarity of articles and by adding to the selection all articles from categories that occur frequently. The output files for queries over Simple English Wikipedia are usually smaller than 30 MB.
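The selection idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis relies on Elasticsearch, whereas this sketch swaps in a plain term-frequency cosine similarity so that it runs without a search server. The function names, the parameters `top_k` and `min_category_count`, and the toy articles are all hypothetical.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration): title -> text and categories.
ARTICLES = {
    "Mars": {"text": "mars is a planet in the solar system", "categories": ["Planets"]},
    "Venus": {"text": "venus is a planet close to the sun", "categories": ["Planets"]},
    "Neptune": {"text": "neptune is a distant ice giant", "categories": ["Planets"]},
    "Prague": {"text": "prague is the capital city of the czech republic", "categories": ["Cities"]},
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_articles(articles, keywords, top_k=2, min_category_count=2):
    """Pick seed articles similar to the keywords, then add every article
    from categories that occur frequently among the seeds."""
    query = Counter(tokenize(keywords))
    scored = [(cosine(query, Counter(tokenize(a["text"]))), title)
              for title, a in articles.items()]
    # Seeds: the top_k best-scoring articles with a non-zero similarity.
    seeds = [title for score, title in sorted(scored, reverse=True)[:top_k] if score > 0]
    # A category is "frequent" if at least min_category_count seeds share it.
    counts = Counter(c for title in seeds for c in articles[title]["categories"])
    frequent = {c for c, n in counts.items() if n >= min_category_count}
    selection = set(seeds)
    for title, a in articles.items():
        if frequent & set(a["categories"]):
            selection.add(title)
    return selection


if __name__ == "__main__":
    print(sorted(select_articles(ARTICLES, "planet sun")))
```

In this toy run the keywords match "Venus" and "Mars" directly, and "Neptune" is added only because it shares the category that is frequent among those two seeds.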
information retrieval, Wikipedia, Elasticsearch, document similarity, search engine
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT), member
Kočí Radek, Ing., Ph.D. (DITS FIT BUT), member
Kotásek Zdeněk, doc. Ing., CSc. (DCSY FIT BUT), member
Křivka Zbyněk, Ing., Ph.D. (DIFS FIT BUT), member
@bachelorsthesis{FITBT17707, author = "Ond\v{r}ej Such\'{y}", type = "Bachelor's thesis", title = "Syst\'{e}m pro vyhled\'{a}v\'{a}n\'{i} a v\'{y}b\v{e}ry relevantn\'{i}ch \v{c}l\'{a}nk\r{u} z Wikipedie podle t\'{e}matu", school = "Brno University of Technology, Faculty of Information Technology", year = 2015, location = "Brno, CZ", language = "czech", url = "https://www.fit.vut.cz/study/thesis/17707/" }