Result Details

Optimizing bottle-neck features for LVCSR

GRÉZL, F.; FOUSEK, P. Optimizing bottle-neck features for LVCSR. 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing. Las Vegas, Nevada: IEEE Signal Processing Society, 2008. p. 4729-4732. ISBN: 1-4244-1484-9.
Type
conference paper
Language
English
Authors
Grézl František, Ing., Ph.D., DCGM (FIT)
Fousek Petr
Abstract

This publication deals with optimising various processing steps in Bottle-Neck feature extraction for lower word error rate on large vocabulary continuous speech recognition tasks.

Keywords

Bottle-neck, MLP structure, features, LVCSR

Annotation

This work continues the development of the recently proposed Bottle-Neck features for ASR. The five-layer MLP used in bottle-neck feature extraction makes it possible to obtain features of arbitrary size without dimensionality reduction by transforms, independently of the MLP training targets. The MLP topology (the number and sizes of layers), suitable training targets, the impact of output feature transforms, the need for delta features, and the dimensionality of the final feature vector are studied with respect to the best ASR result. The optimized features are employed in three LVCSR tasks: Arabic broadcast news, English conversational telephone speech, and English meetings. Improvements over standard cepstral features and probabilistic MLP features are shown for different tasks and different neural net input representations. A significant improvement is observed when phoneme MLP training targets are replaced by phoneme states and when delta features are added.
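
The idea of reading features from the bottle-neck layer can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the random untrained weights, the sigmoid hidden units, the use of linear bottle-neck activations, and the simple two-frame delta formula are all assumptions chosen for illustration. It only shows that the feature size equals the bottle-neck size regardless of the number of training targets, and that appending deltas doubles it.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer (illustrative, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_mlp(sizes, rng):
    """A five-layer MLP (input, hidden, bottle-neck, hidden, output) has four weight matrices."""
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def affine(layer, x):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid_vec(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def bottleneck_features(mlp, x):
    """Forward pass up to the bottle-neck layer only.

    Assumption: the features are the linear activations of the bottle-neck layer;
    the layers after it are needed for training but not for feature extraction.
    """
    hidden = sigmoid_vec(affine(mlp[0], x))
    return affine(mlp[1], hidden)


def add_deltas(frames):
    """Append first-order deltas, here (next - previous) / 2 with edge frames repeated.

    Assumption: this simple difference stands in for whatever delta computation is used.
    """
    last = len(frames) - 1
    out = []
    for t, frame in enumerate(frames):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        delta = [(n - p) / 2.0 for n, p in zip(next_frame, prev_frame)]
        out.append(frame + delta)
    return out


# Hypothetical sizes, not taken from the paper.
N_INPUT, N_HIDDEN, N_BOTTLENECK = 20, 50, 8
rng = random.Random(0)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(N_INPUT)] for _ in range(5)]

# Two hypothetical target inventories, e.g. phonemes versus phoneme states.
for n_targets in (45, 135):
    mlp = build_mlp([N_INPUT, N_HIDDEN, N_BOTTLENECK, N_HIDDEN, n_targets], rng)
    feats = [bottleneck_features(mlp, x) for x in inputs]
    with_deltas = add_deltas(feats)
    print(n_targets, len(feats[0]), len(with_deltas[0]))
```

In both runs the feature vector has 8 dimensions (16 with deltas), although the output layer size differs, which is the property that removes the need for a dimensionality-reducing transform.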

Published
2008
Pages
4729–4732
Proceedings
2008 IEEE International Conference on Acoustics, Speech, and Signal Processing
Conference
33rd International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
ISBN
1-4244-1484-9
Publisher
IEEE Signal Processing Society
Place
Las Vegas, Nevada
BibTeX
@inproceedings{BUT27765,
  author="František {Grézl} and Petr {Fousek}",
  title="Optimizing bottle-neck features for LVCSR",
  booktitle="2008 IEEE International Conference on Acoustics, Speech, and Signal Processing",
  year="2008",
  pages="4729--4732",
  publisher="IEEE Signal Processing Society",
  address="Las Vegas, Nevada",
  isbn="1-4244-1484-9",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2008/grezl_BN_optim_icassp_2008.pdf"
}
Projects
Augmented Multi-party Interaction, EU, Sixth Framework programme, 506811-AMI, start: 2004-01-01, end: 2006-12-31, completed
CARETAKER - Content Analysis and REtrieval Technologies to Apply Knowledge Extraction to massive Recording, EU, Sixth Framework programme, 027231, start: 2006-03-01, end: 2008-09-30, completed
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed