Thesis Details

Recognition of Multi-Talker Overlapping Speech Using Neural Networks

Bachelor's Thesis
Student: Hradil Jaromír
Academic Year: 2019/2020
Supervisor: Žmolíková Kateřina, Ing., Ph.D.
Czech title
Rozpoznávání řeči překrývajících se řečníků pomocí neuronových sítí
Language
English
Abstract

This thesis addresses speech recognition of overlapping speakers using neural networks. It examines the problem of recognizing speech from multiple simultaneous speakers and the approaches used to solve it. In addition to traditional components such as convolutional neural networks and LSTMs, it applies two special components, the attention mechanism and gated convolution, together with a technique called permutation invariant training. These approaches are applied to the assigned training data, which consist of artificially created mixtures of two speakers reading articles from the Wall Street Journal. Architectures combining the elements above were then trained; in this work, the models replace the acoustic model. Three architectures were evaluated: two using different types of attention mechanism and one without attention. Experiments showed that, on this type of task, the architectures using the attention mechanism did not surpass the more traditional architecture that relies on gated convolution. Nevertheless, they showed potential.
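
As an illustration of the permutation invariant training named in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes two speakers, a mean squared error as a stand-in for whatever per-speaker loss the thesis actually uses, and hypothetical names (`mse`, `pit_loss`). The idea shown is only the core step: score every assignment of network outputs to reference speakers and train on the cheapest one.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences (stand-in loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(outputs, targets, pair_loss=mse):
    """Return (minimum loss, best permutation) over all output-to-target assignments.

    outputs, targets: lists holding one sequence per speaker (assumed layout).
    """
    best_loss, best_perm = None, None
    for perm in permutations(range(len(targets))):
        # Under this permutation, output i is scored against target perm[i].
        loss = sum(
            pair_loss(outputs[i], targets[p]) for i, p in enumerate(perm)
        ) / len(perm)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the model emitted the two speakers in swapped order.
targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
outputs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
loss, perm = pit_loss(outputs, targets)
```

In the toy example, the swapped assignment gives zero loss, so the model is not penalized for the order in which it emits the speakers. With two speakers there are only two permutations to check; the number grows factorially with the speaker count.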

Keywords

speech recognition, neural networks, attention mechanism, overlapping speech

Degree Programme
Information Technology
Status
defended, grade B
Date
10 July 2020
Committee
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
Citation
HRADIL, Jaromír. Recognition of Multi-Talker Overlapping Speech Using Neural Networks. Brno, 2020. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2020-07-10. Supervised by Žmolíková Kateřina. Available from: https://www.fit.vut.cz/study/thesis/23005/
BibTeX
@bachelorsthesis{FITBT23005,
    author = "Jarom\'{i}r Hradil",
    type = "Bachelor's thesis",
    title = "Recognition of Multi-Talker Overlapping Speech Using Neural Networks",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2020,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/23005/"
}