Result detail

Speaker-aware neural network based beamformer for speaker extraction in speech mixtures

ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; OGAWA, A.; NAKATANI, T. Speaker-aware neural network based beamformer for speaker extraction in speech mixtures. In Proceedings of Interspeech 2017. Proceedings of Interspeech. Stockholm: International Speech Communication Association, 2017. no. 08, p. 2655-2659. ISSN: 1990-9772.

Type

conference proceedings paper

Language

English

Authors

Žmolíková Kateřina, Ing., Ph.D., UPGM (FIT)
Delcroix Marc, FIT (FIT)
Kinoshita Keisuke
Higuchi Takuya
Ogawa Atsunori
Nakatani Tomohiro

Abstract

This article presents a speaker-aware neural network based beamformer for speaker extraction in speech mixtures. In this work, we address the problem of extracting one target speaker from a multichannel mixture of speech. We use a neural network to estimate masks to extract the target speaker and derive beamformer filters using these masks, in a similar way to the recently proposed approach for extraction of speech in the presence of noise. To overcome the permutation ambiguity of neural network mask estimation, which arises in the presence of multiple speakers, we propose to inform the neural network about the target speaker so that it learns to follow the speaker characteristics through the utterance. We investigate and compare different methods of passing the speaker information to the network, such as making one layer of the network dependent on speaker characteristics. Experiments on mixtures of two speakers demonstrate that the proposed scheme can track and extract a target speaker in both closed and open speaker set cases.
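
The mask-to-beamformer step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the masks here are oracle masks built from a synthetic signal rather than neural network outputs, the MVDR formulation is one common choice that the abstract does not specify, and all function names, array shapes, and steering vectors are assumptions made for illustration.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask * y y^H, separately for each frequency.
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), eps)
    return cov / norm[:, None, None]


def mvdr_filters(cov_target, cov_interf, ref_channel=0, diag_load=1e-6):
    """MVDR filters w = (Phi_N^-1 Phi_S) u / trace(Phi_N^-1 Phi_S).

    This formulation is an assumption of the sketch, not taken from the paper.
    returns: complex array, shape (freqs, channels)
    """
    n_ch = cov_target.shape[-1]
    # Diagonal loading keeps the interference covariance invertible.
    load = diag_load * np.trace(cov_interf, axis1=-2, axis2=-1).real / n_ch
    cov_interf = cov_interf + load[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(cov_interf, cov_target)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-10)


def apply_beamformer(filters, stft):
    """Compute w^H y for every frame and frequency; returns (frames, freqs)."""
    return np.einsum("fc,ctf->tf", filters.conj(), stft)


# Synthetic two-speaker example (all values are hypothetical).
rng = np.random.default_rng(0)
C, T, F = 4, 300, 3
k = np.arange(C)
a = np.exp(1j * 0.5 * k)    # assumed steering vector of the target
b = np.exp(-1j * 1.2 * k)   # assumed steering vector of the interferer

s = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
v = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
s[100:200] = 0   # frames 100-199: interferer only
v[:100] = 0      # frames 0-99: target only; frames 200-299: both active

stft = a[:, None, None] * s[None] + b[:, None, None] * v[None]

# Oracle masks stand in for the neural network outputs.
target_mask = np.zeros((T, F))
target_mask[:100] = 1.0
interf_mask = np.zeros((T, F))
interf_mask[100:200] = 1.0

filters = mvdr_filters(
    masked_covariance(stft, target_mask),
    masked_covariance(stft, interf_mask),
)
enhanced = apply_beamformer(filters, stft)
expected = a[0] * s  # target signal as observed at the reference channel
```

In this toy setting the masks select frames where only one speaker is active, so the filters derived from them should recover the target at the reference channel even in the frames where both speakers overlap.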

Keywords

speaker extraction, speaker-aware neural network, beamforming, mask estimation

URL

http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0667.PDF

Year

2017

Pages

2655–2659

Journal

Proceedings of Interspeech, vol. 2017, no. 08, ISSN 1990-9772

Proceedings

Proceedings of Interspeech 2017

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Stockholm

DOI

10.21437/Interspeech.2017-667

UT WoS

000457505000551

EID Scopus

2-s2.0-85034117887

BibTeX

@inproceedings{BUT144496,
  author="Kateřina {Žmolíková} and Marc {Delcroix} and Keisuke {Kinoshita} and Takuya {Higuchi} and Atsunori {Ogawa} and Tomohiro {Nakatani}",
  title="Speaker-aware neural network based beamformer for speaker extraction in speech mixtures",
  booktitle="Proceedings of Interspeech 2017",
  year="2017",
  journal="Proceedings of Interspeech",
  volume="2017",
  number="08",
  pages="2655--2659",
  publisher="International Speech Communication Association",
  address="Stockholm",
  doi="10.21437/Interspeech.2017-667",
  issn="1990-9772",
  url="http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0667.PDF"
}

Files

pdf zmolikova_interspeech2017_IS170667.pdf 2 MB

Projects

Dolování infoRmAcí z řeči Pořízené vzdÁlenými miKrofony, MV, Bezpečnostní výzkum České republiky 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, Národní program udržitelnosti II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed

Research groups

Speech data mining research group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)