Thesis Details

Využití získávání znalostí pro data z PDF souborů

Bachelor's Thesis
Student: Dvořáček Libor
Academic Year: 2020/2021
Supervisor: Bartík Vladimír, Ing., Ph.D.
English title
Use of Knowledge Discovery for Data from PDF Files
Language
Czech
Abstract

This bachelor's thesis deals with extracting tables from digitally created PDF files and then using the extracted data for data analysis, applying dimensionality reduction and cluster analysis methods. Its main content is a survey of the data extraction tools available for Python, a description and comparison of the machine learning methods used, and the implementation of an application that combines these topics into one functional unit, available at: http://extraktor.herokuapp.com

Keywords

data mining, knowledge discovery, Python, PDF, PCA, dendrogram, t-SNE, k-means, UMAP, dimensionality reduction, visualization of high-dimensional datasets, cluster analysis, Dash, Plotly, Heroku

Degree Programme
Information Technology
Status
defended, grade B
Date
18 June 2021
Committee
Kolář Dušan, doc. Dr. Ing. (DIFS FIT BUT), chair
Burgetová Ivana, Ing., Ph.D. (DIFS FIT BUT), member
Fučík Otto, doc. Dr. Ing. (DCSY FIT BUT), member
Hrubý Martin, Ing., Ph.D. (DITS FIT BUT), member
Španěl Michal, Ing., Ph.D. (DCGM FIT BUT), member
Citation
DVOŘÁČEK, Libor. Využití získávání znalostí pro data z PDF souborů. Brno, 2021. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-18. Supervised by Bartík Vladimír. Available from: https://www.fit.vut.cz/study/thesis/23895/
BibTeX
@bachelorsthesis{FITBT23895,
    author = "Libor Dvo\v{r}\'{a}\v{c}ek",
    type = "Bachelor's thesis",
    title = "Vyu\v{z}it\'{i} z\'{i}sk\'{a}v\'{a}n\'{i} znalost\'{i} pro data z PDF soubor\r{u}",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/23895/"
}