Faculty of Information Technology, BUT

Publication Details

Improved Feature Processing for Deep Neural Networks

RATH Shakti P., POVEY Daniel, VESELÝ Karel and ČERNOCKÝ Jan. Improved Feature Processing for Deep Neural Networks. In: Proceedings of Interspeech 2013. Lyon: International Speech Communication Association, 2013, pp. 109-113. ISBN 978-1-62993-443-3. ISSN 2308-457X.
Czech title
Zlepšené zpracování příznaků pro hluboké neuronové sítě
Type
conference paper
Language
English
Authors
Rath Shakti P. (DCGM FIT BUT)
Povey Daniel (JHU)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, doc. Dr. Ing. (DCGM FIT BUT)
Keywords
speech recognition, speaker recognition, neural networks, speaker adaptation
Abstract
In this paper, we explore various methods of providing higher-dimensional features to DNNs while still applying speaker adaptation with low-dimensional fMLLR.
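The central idea of the abstract is to estimate the speaker transform in a low-dimensional space and increase the feature dimension only afterwards. It can be sketched in NumPy as follows. This is a minimal illustration that assumes 40-dimensional features (the baseline dimension given in the annotation) and a 9-frame (±4) splice with edge frames repeated. The fMLLR matrix is an identity placeholder because its estimation is not shown, and the helper names `apply_fmllr` and `splice` are hypothetical.

```python
import numpy as np


def apply_fmllr(feats: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply a per-speaker affine transform W = [A | b] of shape (d, d + 1)
    to frames of shape (T, d), i.e. x_hat = A x + b for every frame."""
    T, d = feats.shape
    if W.shape != (d, d + 1):
        raise ValueError("W must have shape (d, d + 1)")
    # Append a constant 1 to each frame so the bias b is folded into W.
    extended = np.hstack([feats, np.ones((T, 1))])
    return extended @ W.T


def splice(feats: np.ndarray, context: int) -> np.ndarray:
    """Concatenate each frame with its `context` left and right neighbours;
    frames beyond the utterance edges are filled by repeating the edge frame."""
    T, _ = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])


rng = np.random.default_rng(0)
d = 40
feats = rng.standard_normal((100, d))         # stand-in for one speaker's 40-dim features
W = np.hstack([np.eye(d), np.zeros((d, 1))])  # identity placeholder for an estimated fMLLR
adapted = apply_fmllr(feats, W)               # adaptation happens in 40 dims
spliced = splice(adapted, context=4)          # 9 frames x 40 dims = 360 dims
print(adapted.shape, spliced.shape)  # (100, 40) (100, 360)
print(W.size)                        # 1640 parameters per speaker
```

With this ordering, the per-speaker transform stays at 40 × 41 = 1,640 parameters. Applying fMLLR directly to 360-dimensional spliced features would instead need 360 × 361 = 129,960 parameters per speaker.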
Annotation
In this paper, we investigate alternative ways of processing MFCC-based features for use as input to Deep Neural Networks (DNNs). Our baseline is a conventional feature pipeline that splices the 13-dimensional front-end MFCCs across 9 frames, applies LDA to reduce the dimension to 40, and then further decorrelates the features using MLLT. Confirming the results of other groups, we show that speaker adaptation applied on top of these features using feature-space MLLR (fMLLR) is helpful. Because the number of parameters of a DNN is not strongly sensitive to the input feature dimension (unlike GMM-based systems), we were motivated to investigate ways of increasing the feature dimension. We investigate several approaches to deriving higher-dimensional features and verify their performance with DNNs. Our best result is obtained by splicing our baseline 40-dimensional speaker-adapted features again across 9 frames and then reducing the dimension to 200 or 300 using another LDA. Our final result is about 3% absolute better than our best GMM system, which is a discriminatively trained model.
Published
2013
Pages
109-113
Journal
Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), no. 8, ISSN 2308-457X
Proceedings
Proceedings of Interspeech 2013
Conference
Interspeech 2013, Lyon, France
ISBN
978-1-62993-443-3
Publisher
International Speech Communication Association
Place
Lyon, FR
BibTeX
@INPROCEEDINGS{FITPUB10432,
   author = "Shakti P. Rath and Daniel Povey and Karel Vesel\'{y} and Jan \v{C}ernock\'{y}",
   title = "Improved Feature Processing for Deep Neural Networks",
   pages = "109--113",
   booktitle = "Proceedings of Interspeech 2013",
   journal = "Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013)",
   number = 8,
   year = 2013,
   location = "Lyon, FR",
   publisher = "International Speech Communication Association",
   ISBN = "978-1-62993-443-3",
   ISSN = "2308-457X",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/10432"
}