Thesis Details

Shlukování slov podle významu

Bachelor's Thesis
Student: Jankech Marek
Academic Year: 2018/2019
Supervisor: Smrž Pavel, doc. RNDr., Ph.D.
English title
Word Sense Clustering
Language
Czech
Abstract

The semantic similarity of words can be encoded in vector representations known as word embeddings. Well-known models that produce such embeddings include Word2Vec, FastText and GloVe. This thesis introduces a newer model, Dict2Vec, an extension of Word2Vec that leverages lexical dictionaries. The thesis describes the preparation of data from various corpora and dictionaries and compares the accuracy of the individual model types. It also presents a web application that uses word embeddings.

Keywords

corpus, dictionary, definitions, lemmatization, Dict2Vec, Word2Vec, FastText, GloVe, natural language processing

Department
Degree Programme
Information Technology
Status
defended, grade E
Date
10 June 2019
Reviewer
Committee
Smrž Pavel, doc. RNDr., Ph.D. (DCGM FIT BUT), chair
Fučík Otto, doc. Dr. Ing. (DCSY FIT BUT), member
Holík Lukáš, doc. Mgr., Ph.D. (DITS FIT BUT), member
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Citation
JANKECH, Marek. Shlukování slov podle významu. Brno, 2019. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-06-10. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/thesis/21483/
BibTeX
@bachelorsthesis{FITBT21483,
    author = "Marek Jankech",
    type = "Bachelor's thesis",
    title = "Shlukov\'{a}n\'{i} slov podle v\'{y}znamu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/21483/"
}