Result Details

The Role of Neural Network Size in TRAP/HATS Feature Extraction

GRÉZL, F. The Role of Neural Network Size in TRAP/HATS Feature Extraction. Proceedings Text, Speech and Dialogue 2011. Lecture Notes in Computer Science. LNAI 6836. Plzeň: Springer Verlag, 2011. no. 9, p. 315-322. ISBN: 978-3-642-23537-5. ISSN: 0302-9743.
Type
conference paper
Language
English
Authors
František Grézl
Abstract

This article examines the performance of TRAP/HATS-based probabilistic features in ASR. The sizes of the neural networks in both processing stages are varied and the effect on performance is evaluated.

Keywords

Neural networks, feature extraction, probabilistic features

URL
http://www.fit.vutbr.cz/research/groups/speech/publi/2011/grezl_tsd2011.pdf
Annotation

We study the role of neural network (NN) size in TRAP (TempoRAl Patterns) and HATS (Hidden Activation TRAPS) probabilistic feature extraction. The question of a sufficient band NN size is linked to the question of whether the Merger can compensate for the lower accuracy of the band NNs. For both architectures, performance increases with the size of the Merger NN. For the TRAP architecture, increasing the band NN size beyond a certain value was observed to have no further positive effect on final performance. The situation is different for the HATS architecture: increasing the size of the band NNs has a mostly negative effect on final performance. This is because the Merger cannot efficiently exploit the information in its input as that input grows. As a solution, a bottle-neck NN is proposed, which allows an output of arbitrary size.
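
The link between band NN size and Merger input size described above can be illustrated with a minimal sketch. It assumes the usual two-stage layout: in TRAP the Merger receives the class outputs of each band NN, in HATS it receives their hidden-layer activations, and in a bottle-neck variant it receives a narrow layer whose width is chosen freely. The function name `merger_input_size` and all sizes below are hypothetical and are not taken from the paper.

```python
def merger_input_size(arch, n_bands, n_classes, band_hidden, bottleneck=None):
    """Return the number of inputs the Merger NN receives (illustrative only)."""
    if arch == "TRAP":
        # The Merger sees the class outputs of every band NN,
        # so its input does not depend on the band NN hidden size.
        return n_bands * n_classes
    if arch == "HATS":
        # The Merger sees the hidden-layer activations of every band NN,
        # so its input grows linearly with the band NN hidden size.
        return n_bands * band_hidden
    if arch == "bottleneck":
        # Assumed variant: the Merger sees a narrow layer of freely chosen
        # width, decoupled from the band NN hidden size.
        if bottleneck is None:
            raise ValueError("bottleneck width is required")
        return n_bands * bottleneck
    raise ValueError(f"unknown architecture: {arch}")


if __name__ == "__main__":
    # Hypothetical configuration: 15 bands, 40 classes, bottleneck width 30.
    for hidden in (100, 500, 1000):
        sizes = {
            arch: merger_input_size(arch, 15, 40, hidden, bottleneck=30)
            for arch in ("TRAP", "HATS", "bottleneck")
        }
        print(hidden, sizes)
```

Under these assumptions only the HATS Merger input changes as the band NNs grow, which is consistent with the annotation's remark that the Merger struggles to exploit a larger input.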

Published
2011
Pages
315–322
Journal
Lecture Notes in Computer Science, vol. 2011, no. 9, ISSN 0302-9743
Proceedings
Proceedings Text, Speech and Dialogue 2011
Series
LNAI 6836
Conference
14th International Conference on Text, Speech and Dialogue
ISBN
978-3-642-23537-5
Publisher
Springer Verlag
Place
Plzeň
BibTeX
@inproceedings{BUT76446,
  author="František {Grézl}",
  title="The Role of Neural Network Size in TRAP/HATS Feature Extraction",
  booktitle="Proceedings Text, Speech and Dialogue 2011",
  year="2011",
  series="LNAI 6836",
  journal="Lecture Notes in Computer Science",
  volume="2011",
  number="9",
  pages="315--322",
  publisher="Springer Verlag",
  address="Plzeň",
  isbn="978-3-642-23537-5",
  issn="0302-9743",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/grezl_tsd2011.pdf"
}
Projects
Advanced recognition and presentation of multimedia data, BUT, BUT internal projects, FIT-S-11-2, start: 2011-01-01, end: 2013-12-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funding from the Czech state budget (e.g. research plans, research centres), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed
Technologies of speech processing for efficient human-machine communication, TAČR, ALFA programme of applied research and experimental development, TA01011328, start: 2011-01-01, end: 2014-12-31, completed
Theory and applications of phoneme posterior estimation in speech processing, GACR, Doctoral grants, GP102/09/P635, start: 2009-01-01, end: 2011-12-31, completed