Result detail

SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; OCHIAI, T.; NAKATANI, T.; BURGET, L.; ČERNOCKÝ, J. SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures. IEEE Journal of Selected Topics in Signal Processing, 2019, vol. 13, no. 4, p. 800-814. ISSN: 1932-4553.
Type
journal article
Language
English
Authors
Žmolíková Kateřina, Ing., Ph.D., UPGM (FIT)
Delcroix Marc, FIT (FIT)
Kinoshita Keisuke, FIT (FIT)
OCHIAI, T.
Nakatani Tomohiro, FIT (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Abstract

The processing of speech corrupted by interfering overlapping speakers is one of the challenging problems with regards to today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches tackle the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may however be interested in recovering one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by the target speaker. Formulating the problem as speaker extraction avoids certain issues such as label permutation and the need to determine the number of speakers in the mixture. With SpeakerBeam, we jointly learn to extract a representation from the adaptation utterance characterizing the target speaker and to use this representation to extract the speaker. We explore several ways to do this, mostly inspired by speaker adaptation in acoustic models for automatic speech recognition. We evaluate the performance on the widely used WSJ0-2mix and WSJ0-3mix datasets, and these datasets modified with more noise or more realistic overlapping patterns. We further analyze the learned behavior by exploring the speaker representations and assessing the effect of the length of the adaptation data. The results show the benefit of including speaker information in the processing and the effectiveness of the proposed method.

Keywords

Speaker extraction, speaker-aware neural network, multi-speaker speech recognition.

URL
https://ieeexplore.ieee.org/document/8736286
Year
2019
Pages
800–814
Journal
IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 4, ISSN 1932-4553
DOI
10.1109/JSTSP.2019.2922820
UT WoS
000477715300003
EID Scopus
BibTeX
@article{BUT159990,
  author="ŽMOLÍKOVÁ, K. and DELCROIX, M. and KINOSHITA, K. and OCHIAI, T. and NAKATANI, T. and BURGET, L. and ČERNOCKÝ, J.",
  title="SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures",
  journal="IEEE Journal of Selected Topics in Signal Processing",
  year="2019",
  volume="13",
  number="4",
  pages="800--814",
  doi="10.1109/JSTSP.2019.2922820",
  issn="1932-4553",
  url="https://ieeexplore.ieee.org/document/8736286"
}
Projects
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Neural representations in multimodal and multilingual modeling, GAČR, Grant projects of excellence in basic research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Neural networks for signal processing and speech data mining - NOSIČI, TAČR, ZÉTA programme for the support of applied research, TJ01000208, start: 2018-01-01, end: 2019-12-31, completed
Processing, visualization and analysis of multimedia and 3D data, VUT, VUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed