Publication Details

Speaker activity driven neural speech extraction

DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; NAKATANI, T. Speaker activity driven neural speech extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Toronto: IEEE Signal Processing Society, 2021. p. 6099-6103. ISBN: 978-1-7281-7605-5.

Czech title

Neurální extrakce řeči řízená aktivitou řečníka

Type

conference paper

Language

English

Authors

Delcroix Marc
Žmolíková Kateřina, Ing., Ph.D. (FIT)
Ochiai T.
Kinoshita Keisuke
Nakatani Tomohiro

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2021/delcroix_icassp2021_2101.05516.pdf PDF

Keywords

Speech extraction, Speaker activity, Speech enhancement, Meeting recognition, Neural network

Abstract

Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated, such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where speaker activity obtained from a diarization system is used as a speaker clue for ADEnet. We show that this simple yet practical approach can successfully extract speakers after diarization, which leads to improved ASR performance when using a single microphone, especially in highly overlapping conditions, with a relative word error rate reduction of up to 25%.
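To make the idea of an activity clue concrete, here is a minimal sketch, under stated assumptions, of how diarization output could be turned into something a speech extraction network might consume. It converts one speaker's (start, end) segments into a frame-level 0/1 activity vector and then averages mixture frame features over the active frames to obtain a crude speaker representation. The function names `segments_to_activity` and `activity_weighted_embedding`, the frame rate, and the toy features are hypothetical. Plain Python lists stand in for neural network features. This illustrates the general principle only and is not the ADEnet implementation described in the paper.

```python
from typing import List, Sequence, Tuple


def segments_to_activity(
    segments: Sequence[Tuple[float, float]],
    num_frames: int,
    frames_per_second: float,
) -> List[float]:
    """Turn one speaker's diarization segments (start_s, end_s) into a
    frame-level 0/1 activity vector of length num_frames (hypothetical helper)."""
    activity = [0.0] * num_frames
    for start, end in segments:
        # Convert times to the nearest frame indices and clip to the valid range.
        first = max(0, round(start * frames_per_second))
        last = min(num_frames, round(end * frames_per_second))
        for t in range(first, last):
            activity[t] = 1.0  # overlapping segments simply stay active
    return activity


def activity_weighted_embedding(
    features: Sequence[Sequence[float]],
    activity: Sequence[float],
) -> List[float]:
    """Average the mixture's frame features over frames where the target
    speaker is marked active (hypothetical helper, illustrative only)."""
    if len(features) != len(activity):
        raise ValueError("features and activity must have the same number of frames")
    total = sum(activity)
    if total == 0:
        raise ValueError("target speaker is never active; no clue available")
    dim = len(features[0])
    emb = [0.0] * dim
    for frame, weight in zip(features, activity):
        for d in range(dim):
            emb[d] += weight * frame[d]
    return [value / total for value in emb]


# Toy usage: 4 frames at 2 frames per second, target active from 0.5 s to 1.5 s.
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
activity = segments_to_activity([(0.5, 1.5)], num_frames=4, frames_per_second=2)
print(activity)  # [0.0, 1.0, 1.0, 0.0]
print(activity_weighted_embedding(features, activity))  # [4.0, 3.0]
```

Because the clue comes from diarization rather than a pre-recorded enrollment utterance, no extra recording of the target speaker is needed; the quality of the clue then depends on how accurate the diarization segments are.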

Published

2021

Pages

6099–6103

Proceedings

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Conference

2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada

ISBN

978-1-7281-7605-5

Publisher

IEEE Signal Processing Society

Place

Toronto

DOI

10.1109/ICASSP39728.2021.9414998

UT WoS

000704288406074

EID Scopus

2-s2.0-85109793342

BibTeX

@inproceedings{BUT171749,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and NAKATANI, T.",
  title="Speaker activity driven neural speech extraction",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2021",
  pages="6099--6103",
  publisher="IEEE Signal Processing Society",
  address="Toronto",
  doi="10.1109/ICASSP39728.2021.9414998",
  isbn="978-1-7281-7605-5",
  url="https://www.fit.vut.cz/research/publication/12479/"
}

Files

delcroix_icassp2021_09414998.pdf (PDF, 4 MB)