Thesis Details

Topic Identification from Spoken TED-Talks

Bachelor's Thesis
Student: Vašš Adam
Academic Year: 2018/2019
Supervisor: Kesiraju Santosh
Czech title
Topic Identification from Spoken TED-Talks
Language
English
Abstract

This thesis deals with the problems of language recognition and topic classification, using the TED-LIUM corpus to train both the automatic speech recognition (ASR) and classification models. The ASR system is built with the Kaldi toolkit and achieves a word error rate (WER) of 16.6%. Topic classification is addressed with linear classification methods, specifically Multinomial Naive Bayes and linear Support Vector Machines, with the latter achieving the higher topic classification accuracy.
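The abstract reports a WER figure without defining the metric. As a minimal sketch, assuming the standard definition (word-level edit distance between the reference transcript and the ASR hypothesis, divided by the number of reference words), it could be computed as below. The function name `word_error_rate` and the whitespace tokenisation are illustrative assumptions; this is not the scoring procedure used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 16.6% corresponds to roughly one word error for every six reference words. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.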

Keywords

TED, talks, topic identification, machine learning, classification, transcription, linear classification, Kaldi, support vector machines, acoustic modeling, language modeling, TED-LIUM, ASR

Department
Degree Programme
Information Technology
Status
not defended
Date
14 June 2019
Reviewer
Committee
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Hliněná Dana, doc. RNDr., Ph.D. (DMAT FEEC BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Citation
VAŠŠ, Adam. Topic Identification from Spoken TED-Talks. Brno, 2019. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2019-06-14. Supervised by Kesiraju Santosh. Available from: https://www.fit.vut.cz/study/thesis/21519/
BibTeX
@bachelorsthesis{FITBT21519,
    author = "Adam Va\v{s}\v{s}",
    type = "Bachelor's thesis",
    title = "Topic Identification from Spoken TED-Talks",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2019,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/21519/"
}