
Robust Speech Recognition in Unknown Reverberant and Noisy Conditions

HSIAO, R.; MA, J.; HARTMANN, W.; KARAFIÁT, M.; GRÉZL, F.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J.; WATANABE, S.; CHEN, Z.; MALLIDI, S.; HEŘMANSKÝ, H.; TSAKALIDIS, S.; SCHWARTZ, R. Robust Speech Recognition in Unknown Reverberant and Noisy Conditions. In Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop. Scottsdale, Arizona: IEEE Signal Processing Society, 2015. p. 533-538. ISBN: 978-1-4799-7291-3.

Type

conference paper

Language

English

Authors

Hsiao Roger, FIT (FIT)
Ma Jeff
Hartmann William
Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Grézl František, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Szőke Igor, Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Watanabe Shinji
Chen Zhuo
Mallidi Sri Harish, FIT (FIT)
Heřmanský Hynek, prof. Ing., Dr. Eng., DCGM (FIT)
Tsakalidis Stavros, FIT (FIT)
Schwartz Richard, FIT (FIT)

Abstract

In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data are telephone speech and close talking. Our approach to this challenge includes speech enhancement, neural network methods and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation due to noisy audio and data mismatch.

Keywords

ASpIRE challenge, robust speech recognition

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2015/hsiao_asru2015… PDF

Annotation

In this paper, we describe our work in the ASpIRE challenge. We experiment with and evaluate different approaches to tackling the performance degradation due to noise and data mismatch. Our approaches include audio enhancement, data augmentation, unsupervised DNN adaptation, and system combination.

Published

2015

Pages

533–538

Proceedings

Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop

Conference

The 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015)

ISBN

978-1-4799-7291-3

Publisher

IEEE Signal Processing Society

Place

Scottsdale, Arizona

DOI

10.1109/ASRU.2015.7404841

UT WoS

000380604800076

EID Scopus

2-s2.0-84964470918

BibTeX

@inproceedings{BUT120392,
  author="Roger {Hsiao} and Jeff {Ma} and William {Hartmann} and Martin {Karafiát} and František {Grézl} and Lukáš {Burget} and Igor {Szőke} and Jan {Černocký} and Shinji {Watanabe} and Zhuo {Chen} and Sri Harish {Mallidi} and Hynek {Heřmanský} and Stavros {Tsakalidis} and Richard {Schwartz}",
  title="Robust Speech Recognition in Unknown Reverberant and Noisy Conditions",
  booktitle="Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop",
  year="2015",
  pages="533--538",
  publisher="IEEE Signal Processing Society",
  address="Scottsdale, Arizona",
  doi="10.1109/ASRU.2015.7404841",
  isbn="978-1-4799-7291-3",
  url="https://www.fit.vut.cz/research/publication/11067/"
}

Files

pdf hsiao_asru2015_0000533.pdf 139 kB

Projects

IT4Innovations Centre of Excellence, MŠMT, Operational Programme Research and Development for Innovations, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed
IARPA Building Speech Recognition for Keyword Search in a New Language in a Week with Limited Training Data (BABEL) - Babelon, BBN, start: 2012-03-05, end: 2016-11-04, completed
Information mining in speech acquired by distant microphones, MV, Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)