Result Details

Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge

KARAFIÁT, M.; GRÉZL, F.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J. Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge. In Proceedings of Interspeech 2015. Proceedings of Interspeech. Dresden: International Speech Communication Association, 2015. no. 09, p. 2454-2458. ISBN: 978-1-5108-1790-6. ISSN: 1990-9772.
Type
conference paper
Language
English
Abstract

We have presented our work towards the ASR of wide-band noisy reverberant speech in the ASpIRE challenge. To solve this task, we have started with augmenting Fisher data with artificially noised and reverberated versions.

Keywords

speech recognition, reverberation, dereverberation, neural networks, DNN

Annotation

This paper describes several strategies tested in BUT’s submission to the IARPA ASpIRE challenge. The ASpIRE task was to develop an automatic speech recognition (ASR) system for wide-band noisy reverberant speech, while only clean CTS (Fisher) data was allowed for ASR training. To solve this task, we started by augmenting the Fisher data with artificially noised and reverberated versions. The most obvious adaptation was (1) to re-train the whole GMM/HMM-based ASR system. Two further techniques were then designed and tested to make the adaptation easier and to avoid retraining the whole ASR system on a large amount of speech: (2) we trained a speech enhancement DNN (also called a de-noising auto-encoder), and (3) we adapted the feature extraction based on stacked bottle-neck networks (SBN). While re-training the whole system works best, only slightly inferior results were obtained with auto-encoder denoising followed by retraining of the first layers of the SBN hierarchy, leaving most of the ASR system, trained on clean Fisher data, unchanged. This shows a promising, efficient and fast way to port ASR systems to new conditions.
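
The augmentation step mentioned in the annotation (adding artificial noise and reverberation to clean speech) can be illustrated with a minimal sketch. This record does not state which impulse responses, noise sources, or signal-to-noise ratios the authors used, so everything below is an assumption for illustration only: the function name `reverberate_and_add_noise`, the synthetic decaying impulse response, the white-noise source, and the 10 dB SNR are all hypothetical. The sketch uses the common recipe of convolving the clean signal with a room impulse response and then adding noise scaled to a target SNR.

```python
import numpy as np

def reverberate_and_add_noise(clean, rir, noise, snr_db):
    """Convolve clean speech with a room impulse response, then add noise at snr_db.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Reverberation: linear convolution with the impulse response, cut to the input length.
    reverberated = np.convolve(clean, rir)[: len(clean)]
    # Repeat or trim the noise so that it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that signal power / noise power matches the requested SNR.
    signal_power = np.mean(reverberated ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reverberated + gain * noise

# Toy usage with synthetic signals standing in for real audio (assumed values).
rng = np.random.default_rng(0)
clean = rng.standard_normal(8000)                                 # placeholder "speech"
rir = np.exp(-np.arange(400) / 80.0) * rng.standard_normal(400)   # decaying synthetic RIR
noise = rng.standard_normal(3000)                                 # placeholder noise
noisy = reverberate_and_add_noise(clean, rir, noise, snr_db=10.0)
```

In a real pipeline the placeholder arrays would be replaced by recorded speech, measured or simulated room impulse responses, and recorded noise, and the SNR would typically be varied across utterances.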

Published
2015
Pages
2454–2458
Journal
Proceedings of Interspeech, vol. 2015, no. 09, ISSN 1990-9772
Proceedings
Proceedings of Interspeech 2015
Conference
Interspeech Conference
ISBN
978-1-5108-1790-6
Publisher
International Speech Communication Association
Place
Dresden
UT WoS
000380581601048
BibTeX
@inproceedings{BUT119908,
  author="Martin {Karafiát} and František {Grézl} and Lukáš {Burget} and Igor {Szőke} and Jan {Černocký}",
  title="Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge",
  booktitle="Proceedings of Interspeech 2015",
  year="2015",
  journal="Proceedings of Interspeech",
  volume="2015",
  number="09",
  pages="2454--2458",
  publisher="International Speech Communication Association",
  address="Dresden",
  isbn="978-1-5108-1790-6",
  issn="1990-9772",
  url="https://www.fit.vut.cz/research/publication/10972/"
}
Projects
Big speech data analytics for contact centers, EU, Horizon 2020, start: 2015-01-01, end: 2017-12-31, completed
IT4Innovations Centre of Excellence, MŠMT, Operational Programme Research and Development for Innovations, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed
Meeting Assistant (MINT), TAČR, ALFA Programme of Applied Research and Experimental Development, TA04011311, start: 2014-10-01, end: 2017-12-31, completed