Result Details

Compact Network for Speakerbeam Target Speaker Extraction

DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; ARAKI, S.; NAKATANI, T. Compact Network for Speakerbeam Target Speaker Extraction. In Proceedings of ICASSP. Brighton: IEEE Signal Processing Society, 2019. p. 6965-6969. ISBN: 978-1-5386-4658-8.

Type

conference paper

Language

English

Authors

Delcroix Marc, FIT (FIT)
Žmolíková Kateřina, Ing., Ph.D., DCGM (FIT)
OCHIAI, T.
Kinoshita Keisuke, FIT (FIT)
ARAKI, S.
Nakatani Tomohiro, FIT (FIT)

Abstract

Speech separation that separates a mixture of speech signals into each of its sources has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e. extraction of only speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, which is a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
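
The two adaptation schemes contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2-dimensional toy layer, and all numeric values are assumptions chosen for illustration, and the adaptation weights (`alpha`) are given directly rather than computed from an adaptation utterance by an auxiliary network.

```python
# Illustrative sketch only: layer sizes, names, and values are assumptions,
# not the configuration used in the paper.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def factorized_layer(x, sub_layers, alpha):
    """Factorized adaptation: a sum of parallel linear transformations,
    each weighted by one speaker-dependent weight alpha[k]."""
    out = [0.0] * len(sub_layers[0])
    for W_k, a_k in zip(sub_layers, alpha):
        y_k = matvec(W_k, x)
        out = [o + a_k * y for o, y in zip(out, y_k)]
    return out

def scaled_layer(x, W, alpha):
    """Scaling adaptation: a single linear transformation whose output
    activations are multiplied element-wise by speaker-dependent weights."""
    return [a * y for a, y in zip(alpha, matvec(W, x))]

def count_params(matrices):
    """Number of weights in a list of matrices (biases omitted)."""
    return sum(len(W) * len(W[0]) for W in matrices)

# Toy input and weights (hypothetical values).
x = [1.0, 2.0]
W_a = [[1.0, 0.0], [0.0, 1.0]]
W_b = [[0.0, 1.0], [1.0, 0.0]]

fact = factorized_layer(x, [W_a, W_b], alpha=[0.5, 0.5])
scal = scaled_layer(x, W_a, alpha=[2.0, 0.5])

fact_params = count_params([W_a, W_b])  # one matrix per parallel transform
scal_params = count_params([W_a])       # a single matrix
```

In this toy setup the factorized layer stores one weight matrix per parallel transformation, while the scaled layer stores only one, which is the source of the parameter saving. The toy counts cover the adaptation layer alone and are not meant to reproduce the model-level reduction reported in the abstract.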

Keywords

Target speech extraction, Neural network, Adaptation, Auxiliary feature, Speech enhancement

URL

https://ieeexplore.ieee.org/document/8683087

Published

2019

Pages

6965–6969

Proceedings

Proceedings of ICASSP

Conference

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Brighton

DOI

10.1109/ICASSP.2019.8683087

UT WoS

000482554007040

EID Scopus

2-s2.0-85069006044

BibTeX

@inproceedings{BUT160003,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and ARAKI, S. and NAKATANI, T.",
  title="Compact Network for Speakerbeam Target Speaker Extraction",
  booktitle="Proceedings of ICASSP",
  year="2019",
  pages="6965--6969",
  publisher="IEEE Signal Processing Society",
  address="Brighton",
  doi="10.1109/ICASSP.2019.8683087",
  isbn="978-1-5386-4658-8",
  url="https://ieeexplore.ieee.org/document/8683087"
}

Files

pdf delcroix_icassp2019_0006965.pdf 944 kB

Projects

Neural networks for signal processing and speech data mining, TAČR, ZÉTA Programme for the Support of Applied Research, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed
NTT - Speech enhancement front-end for robust automatic speech recognition with large amount of training data, NTT, start: 2019-01-01, end: 2019-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)