Faculty of Information Technology, BUT

Thesis Details

"Semi-supervised" trénování hlubokých neuronových sítí pro rozpoznávání řeči

Ph.D. Thesis
Student
Veselý Karel
Academic Year
2017/2018
Supervisor
Burget Lukáš, doc. Ing., Ph.D.
English title
Semi-Supervised Training of Deep Neural Networks for Speech Recognition
Language
Czech
Status
defended
Date
3 April 2018
Keywords
Deep neural networks, speech recognition, semi-supervised training, Kaldi, nnet1
Abstract

In this thesis, we first present the theory of neural network training for speech recognition, along with our implementation, which is available as the 'nnet1' training recipe in the Kaldi toolkit. The recipe contains RBM pre-training, mini-batch frame Cross-Entropy training, and sequence-discriminative sMBR training. Then we continue with the main topic of this thesis: semi-supervised training of DNN-based ASR systems. Inspired by the literature survey and our initial experiments, we investigated several problems: first, whether confidences are better calculated per-sentence, per-word, or per-frame; second, whether the confidences should be used for data selection or data weighting. Both approaches are compatible with the framework of weighted mini-batch SGD training. Then we tried to gain better insight into confidence calibration, more precisely whether it can improve the efficiency of semi-supervised training. We also investigated how the model should be re-tuned with the correctly transcribed data. Finally, we proposed a simple recipe that avoids a grid search of hyper-parameters and is therefore very practical for general use with any dataset. The experiments were conducted on several datasets: for Babel Vietnamese with 10 hours of transcribed speech, the Word Error Rate (WER) was reduced by 2.5%. For Switchboard English with 14 hours of transcribed speech, the WER was reduced by 3.2%. Although we found it difficult to further improve the performance of semi-supervised training by means of enhancing the confidences, we still believe that our findings are of significant practical value: the untranscribed data are abundant and easy to obtain, and our proposed solution brings solid WER improvements and is not difficult to replicate.
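The abstract notes that both data selection and data weighting fit within weighted mini-batch SGD training. The following is only a minimal NumPy sketch, not the thesis's actual Kaldi 'nnet1' C++ implementation, illustrating one such weighted SGD step on a single softmax layer; all names (weighted_ce_sgd_step, frame_conf, the learning rate value, and the thresholding variant mentioned in the comments) are illustrative assumptions, not taken from the thesis.

import numpy as np

def weighted_ce_sgd_step(W, b, feats, targets, frame_conf, lr=0.008):
    """One weighted mini-batch SGD step for frame-level cross-entropy.

    W, b       : softmax-layer weights (D, K) and bias (K,)
    feats      : (N, D) input features, N frames in the mini-batch
    targets    : (N,)   state labels from the automatic transcription
    frame_conf : (N,)   per-frame confidences in [0, 1]; 1.0 for supervised data
    """
    logits = feats @ W + b                          # (N, K)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)       # softmax posteriors

    # Gradient of cross-entropy w.r.t. logits, scaled per frame by confidence
    # (data weighting); for data selection one would instead use a 0/1 mask
    # such as (frame_conf > threshold).
    grad = probs.copy()
    grad[np.arange(len(targets)), targets] -= 1.0
    grad *= frame_conf[:, None]

    W -= lr * feats.T @ grad / len(targets)
    b -= lr * grad.mean(axis=0)
    return W, b

In this sketch, frames with low confidence contribute proportionally less to the parameter update, which is one way the weighted-SGD framing described in the abstract can be realized; per-word or per-sentence confidences would simply be broadcast to all frames of the word or sentence before being passed in.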

Citation
VESELÝ, Karel. "Semi-supervised" trénování hlubokých neuronových sítí pro rozpoznávání řeči. Brno, 2017. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2018-04-03. Supervised by Burget Lukáš. Available from: https://www.fit.vut.cz/study/phd-thesis/568/
BibTeX
@PHDTHESIS{FITPT568,
    author = "Karel Vesel\'{y}",
    type = "Ph.D. thesis",
    title = "{"}Semi-supervised{"} tr\'{e}nov\'{a}n\'{i} hlubok\'{y}ch neuronov\'{y}ch s\'{i}t\'{i} pro  rozpozn\'{a}v\'{a}n\'{i} \v{r}e\v{c}i",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2018,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/phd-thesis/568/"
}