Faculty of Information Technology, BUT

Publication Details

Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages

GRÉZL František and KARAFIÁT Martin. Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages. In: Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014, pp. 820-824. ISBN 978-1-63439-435-2. Available from: http://www.isca-speech.org/archive/interspeech_2014/i14_0820.html
Czech title
Kombinace multilinguálního trénování a trénování s automatickým získáním referencí pro málo rozšířené jazyky
Type
conference paper
Language
English
Authors
GRÉZL František, KARAFIÁT Martin
URL
Keywords
feature extraction, neural networks, stacked bottle-neck, multilingual training, semi-supervised training
Abstract
This article is about a combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages.
Annotation
Multilingual training of neural networks for ASR is widely studied these days. It has been shown that languages with little training data can benefit greatly from multilingual training resources. The use of unlabeled data for neural network training in a semi-supervised manner has also improved ASR system performance. Here, a combination of both methods is presented. First, multilingual training is performed to obtain an ASR system that automatically transcribes the unlabeled data. Then, the automatically transcribed data are added to the training data. Two neural networks are trained, one from random initialization and one adapted from the multilingual network, to evaluate the effect of multilingual training in the presence of a larger amount of training data. Further, the CMLLR transform is applied in the middle of the stacked Bottle-Neck neural network structure. As CMLLR rotates the features to better fit the given model, we evaluated whether it is better to adapt the existing NN on the CMLLR features or to train it from random initialization. The last step in our training procedure is fine-tuning on the original data.
Published
2014
Pages
820-824
Proceedings
Proceedings of Interspeech 2014
Conference
Interspeech 2014, Singapore, SG
ISBN
978-1-63439-435-2
Publisher
International Speech Communication Association
Place
Singapore, SG
BibTeX
@INPROCEEDINGS{FITPUB10716,
   author = "Franti\v{s}ek Gr\'{e}zl and Martin Karafi\'{a}t",
   title = "Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages",
   pages = "820--824",
   booktitle = "Proceedings of Interspeech 2014",
   year = 2014,
   location = "Singapore, SG",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-63439-435-2",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10716"
}