Thesis Details

Visual Question Answering

Bachelor's Thesis
Student: Kocurek Pavel
Academic Year: 2020/2021
Supervisor: Fajčík Martin, Ing.
Czech title
Systém pro odpovídaní na otázky s využitím obrazu
Language
English
Abstract

Visual Question Answering (VQA) is a system that takes an image and a question as input and outputs an answer. Despite many research advances, VQA, unlike image captioning, is rarely used in practice. This work aims to narrow the gap between research and practice. To examine whether VQA can be used by blind and visually impaired people, this thesis proposes a demonstrative VQA application and then a smartphone application. A study with 20 participants from the community was conducted. First, the participants were given the application for two weeks. Then, each of them was asked to fill out a questionnaire. 80 % of respondents rated the accuracy of the VQA application as sufficient or better, and most of them would appreciate it if their image captioning application also supported VQA. Following this finding, this work tries to establish a link between image captioning and VQA. In particular, it studies the informativeness of both systems in different scenarios. It collects a novel dataset of 111 images with manually annotated captions and diverse scenes. An experiment comparing the knowledge obtained showed a success rate of 69.9 % for VQA and 46.2 % for image captioning. In another experiment, participants were able to select the correct caption based on VQA 70.9 % of the time. The results suggest that VQA outperforms image captioning regarding image details and should therefore be used in practice more often.

Keywords

visual question answering, computer vision, natural language processing, question answering, image captioning, deep learning, questionnaire, rnn, lstm, bert, object detection

Department
Degree Programme
Information Technology
Files
Status
defended, grade B
Date
16 June 2021
Reviewer
Committee
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Citation
KOCUREK, Pavel. Visual Question Answering. Brno, 2021. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2021-06-16. Supervised by Fajčík Martin. Available from: https://www.fit.vut.cz/study/thesis/22598/
BibTeX
@bachelorsthesis{FITBT22598,
    author = "Pavel Kocurek",
    type = "Bachelor's thesis",
    title = "Visual Question Answering",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2021,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/22598/"
}