Thesis Details

Extrakce informací z biomedicínských textů

Master's Thesis
Student: Knoth Petr
Academic Year: 2007/2008
Supervisor: Smrž Pavel, doc. RNDr., Ph.D.
English title
Information Extraction from Biomedical Texts
Language
Czech
Abstract

Recently, much effort has gone into making biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. However, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units in these texts with semantic information is the first step toward making the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. We then discuss the main benefits and disadvantages of state-of-the-art information extraction systems and, as a result, adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a large amount of training data, which can be laborious to gather. To address this problem, we investigate weakly supervised, or bootstrapping, techniques. Finally, our experiments show that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially reduce the amount of labeled data needed to train the extraction system.

Keywords

information extraction, machine learning, natural language processing

Degree Programme
Information Technology, Field of Study Intelligent Systems
Status
defended, grade A
Date
16 June 2008
Committee
Češka Milan, prof. RNDr., CSc. (DITS FIT BUT), chair
Hanáček Petr, doc. Dr. Ing. (DITS FIT BUT), member
Herout Adam, prof. Ing., Ph.D. (DCGM FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Peringer Petr, Dr. Ing. (DITS FIT BUT), member
Racek Stanislav, doc. Ing., CSc. (WBU in Pilsen), member
Citation
KNOTH, Petr. Extrakce informací z biomedicínských textů. Brno, 2008. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2008-06-16. Supervised by Smrž Pavel. Available from: https://www.fit.vut.cz/study/thesis/6981/
BibTeX
@mastersthesis{FITMT6981,
    author = "Petr Knoth",
    type = "Master's thesis",
    title = "Extrakce informac\'{i} z biomedic\'{i}nsk\'{y}ch text\r{u}",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2008,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/6981/"
}