Thesis Details
Extrakce informací z Wikipedie
This thesis deals with automatic type extraction in English Wikipedia articles and their attributes. Several approaches with the use of machine learning will be presented. Furthermore, important features like date of birth in articles regarding people, or area in those about lakes, and many more, will be extracted. With the use of the system presented in this thesis, one can generate a well structured knowledge base, using a file with Wikipedia articles (called dump file) and a small training set containing a few well-classed articles. Such knowledge base can then be used for semantic enrichment of text. During this process a file with so called definition words is generated. Definition words are features extracted by natural text analysis, which could be used also in other ways than in this thesis. There is also a component that can determine, which articles were added, deleted or modified in between the creation of two different knowledge bases.
article classification, entity type detection, natural text, natural language processing, partof-speech tagging, SpaCy, Stanford CoreNLP, Wikipedia, SVM, Support Vector Machine,machine learning, artificial intelligence, attribute extraction
Fučík Otto, doc. Dr. Ing. (DCSY FIT BUT), člen
Holík Lukáš, doc. Mgr., Ph.D. (DITS FIT BUT), člen
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT), člen
Veselý Vladimír, Ing., Ph.D. (DIFS FIT BUT), člen
@bachelorsthesis{FITBT18942, author = "Ond\v{r}ej Valu\v{s}ek", type = "Bachelor's thesis", title = "Extrakce informac\'{i} z Wikipedie", school = "Brno University of Technology, Faculty of Information Technology", year = 2019, location = "Brno, CZ", language = "czech", url = "https://www.fit.vut.cz/study/thesis/18942/" }