Faculty of Information Technology, BUT

Publication Details

Boosting Performance on Low-resource Languages by Standard Corpora: AN ANALYSIS

GRÉZL František and KARAFIÁT Martin. Boosting Performance on Low-resource Languages by Standard Corpora: AN ANALYSIS. In: Proceeding of SLT 2016. San Diego: IEEE Signal Processing Society, 2016, pp. 629-636. ISBN 978-1-5090-4903-5.
Czech title
Zlepšení úspěšnosti na jazycích s omezenými zdroji pomocí standardních řečových databází: analýza
Type
conference paper
Language
English
Authors
František Grézl, Martin Karafiát
URL
Keywords
DNN topology, Stacked Bottle-neck, feature extraction, multilingual training, system porting, low resource
Abstract
In this paper, we evaluate multilingual techniques in a single-source-language scenario. Since coherent multilingual corpora usable for multilingual training are hard to obtain, using a single well-resourced language instead is an attractive alternative.
Annotation
In this paper, we analyze the feasibility of using a single well-resourced language, English, as the source language for multilingual techniques in the context of a Stacked Bottle-Neck tandem system. The effect of the amount of data and the number of tied states in the source language on the performance of the ported system is evaluated together with different porting strategies. Generally, increasing both the amount of data and the level of detail is beneficial, with the greater effect observed for increasing the number of tied states. The modified neural network structure, previously shown to be useful for multilingual porting, was also evaluated with its specific porting procedure. Using the original NN structure in combination with the modified adapt-adapt porting strategy was found to work best. It achieves a relative improvement of 3.5-8.8% on a variety of target languages. These results are comparable to those obtained with multilingual NNs pretrained on 7 languages.
Published
2016
Pages
629-636
Proceedings
Proceeding of SLT 2016
Conference
2016 IEEE Workshop on Spoken Language Technology, San Diego, California, US
ISBN
978-1-5090-4903-5
Publisher
IEEE Signal Processing Society
Place
San Diego, US
DOI
10.1109/SLT.2016.7846329
BibTeX
@INPROCEEDINGS{FITPUB11311,
   author = "Franti\v{s}ek Gr\'{e}zl and Martin Karafi\'{a}t",
   title = "Boosting Performance on Low-resource Languages by Standard Corpora: AN ANALYSIS",
   pages = "629--636",
   booktitle = "Proceeding of SLT 2016",
   year = 2016,
   location = "San Diego, US",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-5090-4903-5",
   doi = "10.1109/SLT.2016.7846329",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11311"
}