Thesis Details

Topic Identification from Spoken TED-Talks

Bachelor's Thesis
Student
Vašš Adam
Academic Year
2018/2019
Supervisor
Kesiraju Santosh
Czech title
Topic Identification from Spoken TED-Talks
Language
English
Abstract

This thesis deals with the problems of speech recognition and topic classification, using the TED-LIUM corpus to train both the automatic speech recognition (ASR) and the classification models. The ASR system is built with the Kaldi toolkit and achieves a word error rate (WER) of 16.6%. The classification problem is addressed with linear classification methods, specifically Multinomial Naive Bayes and Linear Support Vector Machines; the latter achieves the higher topic classification accuracy.
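The 16.6% figure above is a word error rate. This page does not say how it was scored. The standard definition is WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum edit-distance alignment of the hypothesis against the reference, and N is the number of reference words. The following is a minimal sketch of that definition only. The function name `word_error_rate` is illustrative. The sketch assumes whitespace tokenization and does not reproduce Kaldi's own scoring tools or any text normalization the thesis may have applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N, where N is the number of reference words.

    Illustrative sketch: tokenizes on whitespace, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum word edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the talk is about climate change"
    hypothesis = "the talk was about climate"
    # One substitution ("is" -> "was") and one deletion ("change") over 6 words
    print(f"WER = {word_error_rate(reference, hypothesis):.1%}")  # WER = 33.3%
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference. It is an error rate over words, not an accuracy.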

Keywords

TED, talks, topic identification, machine learning, classification, transcription, linear classification, Kaldi, support vector machines, acoustic modeling, language modeling, TED-LIUM, ASR

Department
Degree Programme
Information Technology
Status
defended, grade C
Date
29 August 2019
Reviewer
Committee
Růžička Richard, doc. Ing., Ph.D., MBA (DCSY FIT BUT), chair
Dytrych Jaroslav, Ing., Ph.D. (DCGM FIT BUT), member
Křena Bohuslav, Ing., Ph.D. (DITS FIT BUT), member
Ryšavý Ondřej, doc. Ing., Ph.D. (DIFS FIT BUT), member
Španěl Michal, Ing., Ph.D. (DCGM FIT BUT), member
Citation
VAŠŠ, Adam. Topic Identification from Spoken TED-Talks. Brno, 2019. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-08-29. Supervised by Kesiraju Santosh. Available from: https://www.fit.vut.cz/study/thesis/22509/
BibTeX
@bachelorsthesis{FITBT22509,
    author = "Adam Va\v{s}\v{s}",
    type = "Bachelor's thesis",
    title = "Topic Identification from Spoken TED-Talks",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/22509/"
}