Thesis Details

Vyhledávání duplicitních textů

Bachelor's Thesis
Student
Pekař Tomáš
Academic Year
2014/2015
Supervisor
Smrž Pavel, doc. RNDr., Ph.D.
English title
Duplicate Text Identification
Language
Czech
Abstract

The aim of this work is to design and implement a system for duplicate text identification. The application should be able to index documents and to search the index for documents. The work deals with document preprocessing, fragmentation and indexing. Furthermore, we analyze methods for duplicate text identification, which are closely tied to strategies for selecting substrings. The thesis also describes the basic data structures that can be used to index n-grams.
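The pipeline summarized in the abstract (preprocessing, fragmentation into n-grams, hashing, indexing and searching) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the word-level tokenization, CRC32 hashing, the trigram default and the "0 mod p" substring-selection rule are assumptions chosen for illustration, and `NGramIndex` with its methods is a hypothetical name.

```python
import re
import zlib
from collections import Counter, defaultdict


def preprocess(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Fragment a token list into overlapping word n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fingerprint(gram: tuple[str, ...]) -> int:
    """Hash an n-gram to a 32-bit integer (CRC32 is an illustrative choice)."""
    return zlib.crc32(" ".join(gram).encode("utf-8"))


class NGramIndex:
    """Hypothetical inverted index: n-gram hash -> IDs of documents containing it."""

    def __init__(self, n: int = 3, p: int = 1):
        self.n = n  # n-gram length in words (assumed default)
        self.p = p  # "0 mod p" selection: keep hashes with h % p == 0; p=1 keeps all
        self.postings: dict[int, set[str]] = defaultdict(set)

    def _select(self, text: str) -> set[int]:
        """Preprocess, fragment and hash the text, then apply the selection rule."""
        hashes = {fingerprint(g) for g in ngrams(preprocess(text), self.n)}
        return {h for h in hashes if h % self.p == 0}

    def add(self, doc_id: str, text: str) -> None:
        """Add every selected n-gram hash of the document to the postings lists."""
        for h in self._select(text):
            self.postings[h].add(doc_id)

    def search(self, text: str) -> list[tuple[str, int]]:
        """Return (doc_id, shared n-gram count) pairs, most overlap first."""
        counts: Counter[str] = Counter()
        for h in self._select(text):
            counts.update(self.postings.get(h, ()))
        return counts.most_common()
```

For example, after indexing "The quick brown fox jumps over the lazy dog." as document `a`, searching for "the quick brown fox jumps high" returns `[("a", 3)]`, since the two texts share three word trigrams. Raising `p` shrinks the index at the cost of possibly missing short overlaps; that trade-off is the reason substring-selection strategies matter.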

Keywords

searching, hash, duplicates, indexing, n-gram, inverted index, data structures

Department
Degree Programme
Information Technology
Status
defended, grade D
Date
16 June 2015
Reviewer
Committee
Meduna Alexander, prof. RNDr., CSc. (DIFS FIT BUT), chair
Beran Vítězslav, doc. Ing., Ph.D. (DCGM FIT BUT), member
Drábek Vladimír, doc. Ing., CSc. (DCSY FIT BUT), member
Křena Bohuslav, Ing., Ph.D. (DITS FIT BUT), member
Očenášek Pavel, Mgr. Ing., Ph.D. (DIFS FIT BUT), member
Citation
PEKAŘ, Tomáš. Vyhledávání duplicitních textů. Brno, 2015. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2015-06-16. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/thesis/9668/
BibTeX
@bachelorsthesis{FITBT9668,
    author = "Tom\'{a}\v{s} Peka\v{r}",
    type = "Bachelor's thesis",
    title = "Vyhled\'{a}v\'{a}n\'{i} duplicitn\'{i}ch text\r{u}",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2015,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/9668/"
}