Result details

Compact Network for Speakerbeam Target Speaker Extraction

DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; ARAKI, S.; NAKATANI, T. Compact Network for Speakerbeam Target Speaker Extraction. In Proceedings of ICASSP. Brighton: IEEE Signal Processing Society, 2019. p. 6965-6969. ISBN: 978-1-5386-4658-8.

Type

Conference paper (article in conference proceedings)

Language

English

Authors

Delcroix Marc, FIT (FIT)
Žmolíková Kateřina, Ing., Ph.D., UPGM (FIT)
OCHIAI, T.
Kinoshita Keisuke, FIT (FIT)
ARAKI, S.
Nakatani Tomohiro, FIT (FIT)

Abstract

Speech separation, which separates a mixture of speech signals into each of its sources, has been an active research topic for a long time and has seen recent progress with the advent of deep learning. A related problem is target speaker extraction, i.e. extraction of only the speech of a target speaker out of a mixture, given characteristics of his/her voice. We have recently proposed SpeakerBeam, which is a neural network-based target speaker extraction method. SpeakerBeam uses a speech extraction network that is adapted to the target speaker using auxiliary features derived from an adaptation utterance of that speaker. Initially, we implemented SpeakerBeam with a factorized adaptation layer, which consists of several parallel linear transformations weighted by weights derived from the auxiliary features. The factorized layer is effective for target speech extraction, but it requires a large number of parameters. In this paper, we propose to simply scale the activations of a hidden layer of the speech extraction network with weights derived from the auxiliary features. This simpler approach greatly reduces the number of model parameters, by up to 60%, making it much more practical, while maintaining a similar level of performance. We tested our approach on simulated and real noisy and reverberant mixtures, showing the potential of SpeakerBeam for real-life applications. Moreover, we showed that the speech extraction performance of SpeakerBeam compares favorably with that of a state-of-the-art speech separation method with a similar network configuration.
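The abstract contrasts two ways of conditioning one hidden layer on the target speaker. The sketch below illustrates the difference in plain Python on a single affine layer. It assumes the speaker-dependent weights `alpha` have already been derived from the auxiliary features and are simply passed in; the function names, the choice to scale the affine output before any nonlinearity, and the toy sizes are illustrative assumptions, not the paper's implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W @ x + b, where W has shape (d_out, d_in)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def factorized_layer(Ws: List[Matrix], bs: List[Vector],
                     alpha: Vector, x: Vector) -> Vector:
    """Factorized adaptation layer: K parallel affine transforms, each
    weighted by one scalar alpha[k] (len(alpha) == K), then summed."""
    out = [0.0] * len(bs[0])
    for W_k, b_k, a_k in zip(Ws, bs, alpha):
        y = affine(W_k, b_k, x)
        out = [o + a_k * y_i for o, y_i in zip(out, y)]
    return out


def scaled_layer(W: Matrix, b: Vector, alpha: Vector, x: Vector) -> Vector:
    """Compact variant (assumed form): a single affine transform whose
    outputs are scaled element-wise by alpha (len(alpha) == d_out)."""
    return [a_i * y_i for a_i, y_i in zip(alpha, affine(W, b, x))]


def layer_params(d_in: int, d_out: int, k: int = 1) -> int:
    """Weights plus biases stored in k parallel d_in -> d_out affine maps."""
    return k * (d_in * d_out + d_out)


if __name__ == "__main__":
    # Hypothetical toy sizes, chosen only to show how the count scales with K.
    d_in, d_out, K = 4, 3, 5
    print("factorized:", layer_params(d_in, d_out, K))
    print("scaled:    ", layer_params(d_in, d_out))
```

In this toy count the factorized layer stores K times the weights of the scaled layer, which is why it grows with the number of parallel transforms; the overall reduction reported in the paper depends on the full network configuration and is not reproduced by this sketch.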

Keywords

Target speech extraction, Neural network, Adaptation, Auxiliary feature, Speech enhancement

URL

https://ieeexplore.ieee.org/document/8683087

Year

2019

Pages

6965–6969

Proceedings

Proceedings of ICASSP

Conference

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

ISBN

978-1-5386-4658-8

Publisher

IEEE Signal Processing Society

Place

Brighton

DOI

10.1109/ICASSP.2019.8683087

UT WoS

000482554007040

Scopus EID

2-s2.0-85069006044

BibTeX

@inproceedings{BUT160003,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and ARAKI, S. and NAKATANI, T.",
  title="Compact Network for Speakerbeam Target Speaker Extraction",
  booktitle="Proceedings of ICASSP",
  year="2019",
  pages="6965--6969",
  publisher="IEEE Signal Processing Society",
  address="Brighton",
  doi="10.1109/ICASSP.2019.8683087",
  isbn="978-1-5386-4658-8",
  url="https://ieeexplore.ieee.org/document/8683087"
}

Files

delcroix_icassp2019_0006965.pdf (PDF, 944 kB)

Projects

Neural networks for signal processing and speech data mining - NOSIČI, TAČR, ZÉTA Programme for the Support of Applied Research, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed
NTT - Parameterization with speech enhancement for robust automatic speech recognition with large volumes of training data, NTT, start: 2019-01-01, end: 2019-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)