Result Details

Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion

ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; NAKATANI, T.; ČERNOCKÝ, J. Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion. In Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018. p. 6702-6706. ISBN: 978-1-5386-4658-8.

Type

conference paper

Language

English

Authors

Žmolíková Kateřina, Ing., Ph.D., DCGM (FIT)
Delcroix Marc, FIT (FIT)
Kinoshita Keisuke, FIT (FIT)
Higuchi Takuya, FIT (FIT)
Nakatani Tomohiro, FIT (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

This paper addresses the problem of recognizing speech corrupted by overlapping speakers in a multichannel setting. To extract a target speaker from the mixture, we use a neural-network-based beamformer that uses masks estimated by a neural network to compute statistically optimal spatial filters. Following our previous work, we inform the neural network about the target speaker using information extracted from an adaptation utterance, enabling the network to track the target speaker. While in the previous work this method was used to extract the speaker separately and then pass the preprocessed speech to a speech recognition system, here we explore training both systems jointly with a common speech recognition criterion. We show that integrating the two systems and training for the final objective improves the performance. In addition, the integration enables further sharing of information between the acoustic model and the speaker extraction system, by making use of the predicted HMM-state posteriors to refine the masks used for beamforming.
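The abstract does not say which "statistically optimal spatial filter" is computed from the masks. One common choice in mask-based beamforming is the MVDR filter in the formulation of Souden et al., built from mask-weighted spatial covariance matrices. The sketch below assumes that formulation, NumPy, and STFT arrays shaped (frequency, time, channel); the function names are hypothetical, and this is an illustration of the general technique, not the paper's implementation.

```python
import numpy as np


def masked_scm(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    Y:    complex STFT of the mixture, shape (F, T, C)
    mask: real time-frequency mask in [0, 1], shape (F, T)
    Returns Phi with shape (F, C, C): sum_t m(f,t) y y^H / sum_t m(f,t).
    """
    weighted = mask[..., None] * Y                       # (F, T, C)
    phi = np.einsum("ftc,ftd->fcd", weighted, Y.conj())  # sum_t m * y y^H
    norm = np.maximum(mask.sum(axis=1), eps)             # (F,), avoid divide-by-zero
    return phi / norm[:, None, None]


def mvdr_weights(phi_s, phi_n, ref=0):
    """Souden-style MVDR filter, one weight vector per frequency bin.

    w(f) = Phi_n^{-1} Phi_s u_ref / tr(Phi_n^{-1} Phi_s), shape (F, C).
    """
    num = np.linalg.solve(phi_n, phi_s)      # Phi_n^{-1} Phi_s, (F, C, C)
    trace = np.trace(num, axis1=1, axis2=2)  # (F,)
    return num[:, :, ref] / trace[:, None]   # column `ref` == product with u_ref


def beamform(Y, w):
    """Apply the filter: x_hat(f, t) = w(f)^H y(f, t), shape (F, T)."""
    return np.einsum("fc,ftc->ft", w.conj(), Y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T, C = 4, 50, 3
    Y = rng.standard_normal((F, T, C)) + 1j * rng.standard_normal((F, T, C))
    target_mask = rng.uniform(size=(F, T))  # random stand-in for a network's mask
    phi_s = masked_scm(Y, target_mask)
    phi_n = masked_scm(Y, 1.0 - target_mask)
    w = mvdr_weights(phi_s, phi_n)
    print(beamform(Y, w).shape)  # (4, 50)
```

Here the target statistics use the estimated mask and the interference statistics use its complement; that pairing is an assumption of this sketch. Because every step is differentiable, a filter of this kind can in principle sit inside a larger network trained end to end, which is the property joint training with an ASR criterion relies on.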

Keywords

Speaker extraction, joint training, speaker-adaptive neural network, beamforming, speech recognition

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2018/zmolikova…

Published

2018

Pages

6702–6706

Proceedings

Proceedings of ICASSP 2018

Conference

IEEE International Conference on Acoustics, Speech and Signal Processing

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Calgary

DOI

10.1109/ICASSP.2018.8461533

UT WoS

000446384606172

EID Scopus

2-s2.0-85054269733

BibTeX

@inproceedings{BUT155044,
  author="Kateřina {Žmolíková} and Marc {Delcroix} and Keisuke {Kinoshita} and Takuya {Higuchi} and Tomohiro {Nakatani} and Jan {Černocký}",
  title="Optimization of Speaker-aware Multichannel Speech Extraction with ASR Criterion",
  booktitle="Proceedings of ICASSP 2018",
  year="2018",
  pages="6702--6706",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8461533",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11722/"
}

Files

pdf zmolikova_icassp2018_0006702.pdf 250 kB

Projects

IT4Innovations excellence in science, Ministry of Education, Youth and Sports (MŠMT), National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Neural networks for signal processing and speech data mining, Technology Agency of the Czech Republic (TAČR), ZÉTA Programme for the Support of Applied Research, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)