Thesis Details

Využití získávání znalostí pro data z PDF souborů

Bachelor's Thesis
Student: Dvořáček Libor
Academic Year: 2020/2021
Supervisor: Bartík Vladimír, Ing., Ph.D.

English title

Use of Knowledge Discovery for Data from PDF Files

Language

Czech

Abstract

This bachelor's thesis deals with extracting tables from digitally created PDF files and using the extracted data for data analysis, applying dimensionality reduction and cluster analysis methods. Its main content is an analysis of the tools available for data extraction in Python, a description and comparison of the machine learning methods used, and the implementation of an application that combines these topics into one functional unit, available at: http://extraktor.herokuapp.com

Keywords

data mining, knowledge discovery, Python, PDF, PCA, dendrogram, t-SNE, k-means, UMAP, dimensionality reduction, visualization of high-dimensional datasets, cluster analysis, Dash, Plotly, Heroku

Department

Department of Information Systems FIT BUT

Degree Programme

Information Technology

Status

defended, grade B

Date

18 June 2021

Reviewer

Burgetová Ivana, Ing., Ph.D.

Committee

Kolář Dušan, doc. Dr. Ing. (DIFS FIT BUT), chair
Burgetová Ivana, Ing., Ph.D. (DIFS FIT BUT), member
Fučík Otto, doc. Dr. Ing. (DCSY FIT BUT), member
Hrubý Martin, Ing., Ph.D. (DITS FIT BUT), member
Španěl Michal, Ing., Ph.D. (DCGM FIT BUT), member

Citation

DVOŘÁČEK, Libor. Využití získávání znalostí pro data z PDF souborů. Brno, 2021. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-18. Supervised by Bartík Vladimír. Available from: https://www.fit.vut.cz/study/thesis/23895/

BibTeX

@bachelorsthesis{FITBT23895,
    author = "Libor Dvo\v{r}\'{a}\v{c}ek",
    type = "Bachelor's thesis",
    title = "Vyu\v{z}it\'{i} z\'{i}sk\'{a}v\'{a}n\'{i} znalost\'{i} pro data z PDF soubor\r{u}",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/23895/"
}
