Thesis Details
Interpretability of Neural Networks in Speech Processing
With the growing popularity of deep neural networks, the lack of transparency caused by their black-box nature is raising demand for interpretability. The goal of this thesis is to gain new insights into deep neural networks in speech processing tasks: specifically, a gender classification task on the AudioMNIST dataset and a speaker classification task on filterbank features from the VoxCeleb dataset, using convolutional and residual neural networks. Layer-wise relevance propagation was used to interpret these neural networks. This method produced heatmaps highlighting the features that contributed positively and negatively to the correct classification. The interpretation results show that the classifications were based mainly on lower frequencies over time. In the case of gender classification, I found that the model depended heavily on a small number of features. Using this information, I created an augmented training set that increased the model's robustness.
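The core idea of layer-wise relevance propagation described above can be illustrated on a single dense layer. The thesis applies LRP to convolutional and residual networks; the sketch below is only a minimal, hypothetical illustration of the epsilon rule, which redistributes a layer's output relevance back to its inputs in proportion to each input's contribution to the pre-activations (function and variable names are my own, not from the thesis).

```python
import numpy as np

def lrp_epsilon(a, w, b, relevance_out, eps=1e-6):
    """Sketch of the LRP epsilon rule for one dense layer.

    a             -- layer inputs, shape (batch, n_in)
    w, b          -- layer weights (n_in, n_out) and biases (n_out,)
    relevance_out -- relevance assigned to the layer outputs, (batch, n_out)
    Returns relevance redistributed to the inputs, (batch, n_in);
    these values form the heatmap for the input features.
    """
    z = a @ w + b                             # forward pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0) # stabilise small denominators
    s = relevance_out / z                     # per-output relevance share
    c = s @ w.T                               # backpropagate shares to inputs
    return a * c                              # weight shares by input activity

# Toy usage: relevance is (approximately) conserved when biases are zero.
rng = np.random.default_rng(0)
a = rng.random((1, 4))
w = rng.standard_normal((4, 3))
r_out = np.abs(rng.random((1, 3)))
r_in = lrp_epsilon(a, w, np.zeros(3), r_out)
```

Summing `r_in` over the inputs recovers (up to the epsilon stabiliser) the total output relevance, which is the conservation property that makes the resulting heatmaps interpretable as contribution scores.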
deep neural networks, convolutional neural networks, speech processing, interpretation of neural networks, Layer-Wise Relevance Propagation
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member
@bachelorsthesis{FITBT24073,
  author   = "Marek Sarva\v{s}",
  type     = "Bachelor's thesis",
  title    = "Interpretability of Neural Networks in Speech Processing",
  school   = "Brno University of Technology, Faculty of Information Technology",
  year     = 2021,
  location = "Brno, CZ",
  language = "english",
  url      = "https://www.fit.vut.cz/study/thesis/24073/"
}