Thesis Details

Shlukování slov podle významu

Bachelor's Thesis
Student: Jankech Marek
Academic Year: 2018/2019
Supervisor: Smrž Pavel, doc. RNDr., Ph.D.

English title

Word Sense Clustering

Language

Czech

Abstract

The semantic similarity of words can be encoded in a vector representation known as a word embedding. Well-known model types that produce such embeddings are Word2Vec, FastText and GloVe. This thesis introduces a newer model type, Dict2Vec, a Word2Vec extension that leverages lexical dictionaries. The thesis describes the preparation of data from various corpus and dictionary sources and compares the accuracy of each model type. It also introduces a web application that uses word embeddings.

Keywords

corpus, dictionary, definitions, lemmatization, Dict2Vec, Word2Vec, FastText, GloVe, natural language processing

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology

Status

defended, grade E

Date

10 June 2019

Reviewer

Fajčík Martin, Ing.

Committee

Smrž Pavel, doc. RNDr., Ph.D. (DCGM FIT BUT), chair
Fučík Otto, doc. Dr. Ing. (DCSY FIT BUT), member
Holík Lukáš, doc. Mgr., Ph.D. (DITS FIT BUT), member
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT), member
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), member

Citation

JANKECH, Marek. Shlukování slov podle významu. Brno, 2019. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-06-10. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/thesis/21483/

BibTeX

@bachelorsthesis{FITBT21483,
    author = "Marek Jankech",
    type = "Bachelor's thesis",
    title = "Shlukov\'{a}n\'{i} slov podle v\'{y}znamu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/21483/"
}
