Publication Details

Semi-supervised DNN training with word selection for ASR

VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. Semi-supervised DNN training with word selection for ASR. In: Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017, pp. 3687-3691. ISSN 1990-9772. Available from: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF
Czech title
Částečně kontrolované trénování DNN s výběrem slov pro ASR
Type
conference paper
Language
english
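Authors
Veselý Karel, Burget Lukáš, Černocký Jan
URL
http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1385.PDF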
Keywords

semi-supervised training, DNN, word selection, granularity of confidences

Abstract

The article addresses semi-supervised DNN training with word selection for Automatic Speech Recognition (ASR).

Annotation

Not all questions related to semi-supervised training of a hybrid ASR system with a DNN acoustic model have been investigated in depth. In this paper, we focus on the granularity of confidences (per-sentence, per-word, per-frame) and on how the data should be used (data selection by masks, or mini-batch SGD with confidences as weights). We then propose to re-tune the system on the manually transcribed data, with both frame-level CE training and sMBR training. Our preferred semi-supervised recipe, which is both simple and efficient, is as follows: we select words according to the word accuracy obtained on the development set. This recipe, which does not rely on a grid search over the training hyperparameter, generalized well to Babel Vietnamese (11h transcribed, 74h untranscribed), Babel Bengali (11h transcribed, 58h untranscribed) and our custom Switchboard setup (14h transcribed, 95h untranscribed). We obtained absolute WER improvements of 2.5% for Vietnamese, 2.3% for Bengali and 3.2% for Switchboard.
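The word-selection recipe and the two ways of using the selected data can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "selecting words according to the dev-set word accuracy" means keeping the most confident fraction of automatically transcribed words equal to that accuracy, that word-to-frame alignments are available as `(start, end)` frame spans, and that frames outside any word get weight 0. The helper names `select_words_by_dev_accuracy`, `word_weights_to_frames` and `weighted_cross_entropy` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def select_words_by_dev_accuracy(
    confidences: Sequence[float], dev_word_accuracy: float
) -> List[bool]:
    """Return a keep/drop mask over automatically transcribed words.

    Assumption: the fraction of words kept equals the word accuracy
    measured on the development set (e.g. 0.6 -> keep the 60% most
    confident words).
    """
    if not 0.0 <= dev_word_accuracy <= 1.0:
        raise ValueError("dev_word_accuracy must be in [0, 1]")
    n_keep = round(dev_word_accuracy * len(confidences))
    # Word indices ordered from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    keep = [False] * len(confidences)
    for i in order[:n_keep]:
        keep[i] = True
    return keep


def word_weights_to_frames(
    num_frames: int,
    word_spans: Sequence[Tuple[int, int]],
    word_weights: Sequence[float],
) -> List[float]:
    """Spread per-word weights onto frames.

    Spans are hypothetical [start, end) frame ranges from a word alignment.
    Frames not covered by any word keep weight 0 (an assumption).
    """
    frame_weights = [0.0] * num_frames
    for (start, end), weight in zip(word_spans, word_weights):
        for t in range(start, end):
            frame_weights[t] = weight
    return frame_weights


def weighted_cross_entropy(
    log_posteriors: Sequence[Sequence[float]],
    targets: Sequence[int],
    frame_weights: Sequence[float],
) -> float:
    """Frame-level CE over a mini-batch, each frame scaled by its weight.

    Binary (0/1) weights correspond to data selection by masks; confidences
    used as weights correspond to the weighted-SGD variant. The loss is
    normalized by the sum of weights (one possible choice).
    """
    total_weight = sum(frame_weights)
    if total_weight == 0.0:
        return 0.0
    loss = -sum(
        w * lp[y] for lp, y, w in zip(log_posteriors, targets, frame_weights)
    )
    return loss / total_weight


if __name__ == "__main__":
    confs = [0.9, 0.2, 0.7, 0.5, 0.95]
    mask = select_words_by_dev_accuracy(confs, dev_word_accuracy=0.6)
    print(mask)  # [True, False, True, False, True]
    spans = [(0, 2), (2, 3), (3, 5), (5, 6), (6, 8)]
    print(word_weights_to_frames(8, spans, [float(m) for m in mask]))
    # [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```

With a binary mask, frames of selected words contribute fully and the rest are ignored; passing the per-word confidences to `word_weights_to_frames` instead yields the confidence-weighted variant. The attraction of the dev-accuracy rule, as the annotation notes, is that it replaces a grid search over a selection threshold with a single quantity measured on the development set.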

Published
2017
Pages
3687-3691
Journal
Proceedings of Interspeech, vol. 2017, no. 8, ISSN 1990-9772
Proceedings
Proceedings of Interspeech 2017
Conference
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Stockholm, SE
Publisher
International Speech Communication Association
Place
Stockholm, SE
DOI
10.21437/Interspeech.2017-1385
UT WoS
000457505000766
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB11584,
   author = "Karel Vesel\'{y} and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Semi-supervised DNN training with word selection for ASR",
   pages = "3687--3691",
   booktitle = "Proceedings of Interspeech 2017",
   journal = "Proceedings of Interspeech",
   volume = 2017,
   number = 08,
   year = 2017,
   location = "Stockholm, SE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2017-1385",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11584"
}