Faculty of Information Technology, BUT

Publication Details

Training Data Augmentation and Data Selection

KARAFIÁT Martin, VESELÝ Karel, ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, WATANABE Shinji, BURGET Lukáš, ČERNOCKÝ Jan and SZŐKE Igor. Training Data Augmentation and Data Selection. New Era for Robust Speech Recognition: Exploiting Deep Learning. Computer Science, Artificial Intelligence. Heidelberg: Springer International Publishing, 2017, pp. 245-260. ISBN 978-3-319-64679-4. Available from: http://www.springer.com/gp/book/9783319646794#aboutBook
Czech title
Množení a selekce trénovacích dat
Type
book chapter
Language
English
Authors
Karafiát Martin, Ing., Ph.D. (DCGM FIT BUT)
Veselý Karel, Ing., Ph.D. (DCGM FIT BUT)
Žmolíková Kateřina, Ing. (FIT BUT)
Delcroix Marc (NTT)
Watanabe Shinji, Dr. (MERL)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, doc. Dr. Ing. (DCGM FIT BUT)
Szőke Igor, Ing., Ph.D. (DCGM FIT BUT)
Keywords
training data, augmentation, data selection
Abstract
This book covers the state of the art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights into, and detailed descriptions of, some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also describe real-world applications, benchmark tools, and datasets widely used in the field. Chapter 10 covers training data augmentation and data selection.
Annotation
Data augmentation is a simple and efficient technique for improving the robustness of a speech recognizer deployed under mismatched training and test conditions. Our work, conducted during the JSALT 2015 workshop, had two aims: (1) Developing data augmentation strategies, including noising and reverberation. These were tested in combination with two approaches to signal enhancement: a carefully engineered WPE dereverberation and a learned DNN-based denoising autoencoder. (2) Proposing a novel technique for extracting an informative vector from a Sequence Summarizing Neural Network (SSNN). Like an i-vector extractor, the SSNN produces a "summary vector" that represents an acoustic summary of an utterance. Such a vector can be used directly for adaptation, but its main use in this chapter is the selection of augmented training data. All techniques were tested on the AMI training set and the CHiME3 test set.
Published
2017
Pages
245-260
Book
New Era for Robust Speech Recognition: Exploiting Deep Learning
Series
Computer Science, Artificial Intelligence
ISBN
978-3-319-64679-4
Publisher
Springer International Publishing
Place
Heidelberg, DE
DOI
10.1007/978-3-319-64680-0_10
BibTeX
@INBOOK{FITPUB11588,
   author = "Martin Karafi\'{a}t and Karel Vesel\'{y} and Kate\v{r}ina \v{Z}mol\'{i}kov\'{a} and Marc Delcroix and Shinji Watanabe and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y} and Igor Sz\H{o}ke",
   title = "Training Data Augmentation and Data Selection",
   pages = "245--260",
   booktitle = "New Era for Robust Speech Recognition: Exploiting Deep Learning",
   series = "Computer Science, Artificial Intelligence",
   year = 2017,
   location = "Heidelberg, DE",
   publisher = "Springer International Publishing",
   ISBN = "978-3-319-64679-4",
   doi = "10.1007/978-3-319-64680-0\_10",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/11588"
}