Thesis Details
Lip Reading with Deep Neural Networks (Odezírání ze rtů pomocí hlubokých neuronových sítí)
This thesis surveys current methods for automatic speech recognition and lip reading with neural networks. It also examines the similarities between neural network architectures for audio and visual data, and the datasets available for audiovisual automatic speech recognition. The main contribution is a set of experiments comparing how changes to the network architecture affect the results. The thesis includes an implementation of systems for automatic speech recognition from audio (CER: 12.6 %) and visual (CER: 57.7 %) data. Both architectures are based on feature extraction with convolutional networks, followed by recurrent LSTM layers, a further convolutional layer, and the CTC loss function.
Lip reading, speech recognition, neural networks, recurrent neural network, convolution, computer vision, sequence to sequence, Encoder-Decoder, CTC, PyTorch, Python.
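The abstract reports results as CER without defining it. Assuming this stands for character error rate in its usual form (character-level edit distance divided by the reference length), a minimal sketch of the metric could look as follows. The function names `edit_distance` and `cer` are illustrative only, and the thesis may aggregate or normalize differently, for example over a whole test set rather than per utterance.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one hypothesis against one reference (assumed definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in an 11-character reference.
    print(f"{cer('lip reading', 'lip reeding'):.3f}")
```

Under this definition the value can exceed 1 when the hypothesis is much longer than the reference, so a CER of 57.7 % does not mean that 42.3 % of characters were recognized correctly.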
Bidlo Michal, doc. Ing., Ph.D. (DCSY FIT BUT), member
Čadík Martin, doc. Ing., Ph.D. (DCGM FIT BUT), member
Křivka Zbyněk, Ing., Ph.D. (DIFS FIT BUT), member
Rogalewicz Adam, doc. Mgr., Ph.D. (DITS FIT BUT), member
@bachelorsthesis{FITBT21772, author = "Josef Kadle\v{c}ek", type = "Bachelor's thesis", title = "Odez\'{i}r\'{a}n\'{i} ze rt\r{u} pomoc\'{i} hlubok\'{y}ch neuronov\'{y}ch s\'{i}t\'{i}", school = "Brno University of Technology, Faculty of Information Technology", year = 2019, location = "Brno, CZ", language = "czech", url = "https://www.fit.vut.cz/study/thesis/21772/" }