Result detail

Speaker activity driven neural speech extraction

DELCROIX, M.; ŽMOLÍKOVÁ, K.; OCHIAI, T.; KINOSHITA, K.; NAKATANI, T. Speaker activity driven neural speech extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Toronto: IEEE Signal Processing Society, 2021. p. 6099-6103. ISBN: 978-1-7281-7605-5.
Type
conference paper
Language
English
Authors
Delcroix Marc, FIT (FIT)
Žmolíková Kateřina, Ing., Ph.D., UPGM (FIT)
Ochiai, T.
Kinoshita Keisuke, FIT (FIT)
Nakatani Tomohiro, FIT (FIT)
Abstract

Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where speaker activity obtained from a diarization system is used as a speaker clue for ADEnet. We show that this simple yet practical approach can successfully extract speakers after diarization, which leads to improved ASR performance when using a single microphone, especially in high overlapping conditions, with relative word error rate reduction of up to 25 %.
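The core idea in the abstract, using the target speaker's activity pattern (e.g. from diarization) as the clue instead of an enrollment recording, can be illustrated with a toy sketch. The function below simply mean-pools mixture features over the frames where the target is marked active to form a speaker embedding; this pooling scheme and all names here are illustrative assumptions, not the paper's actual ADEnet architecture.

```python
import numpy as np

def activity_driven_embedding(mixture_feats, activity, threshold=0.5):
    """Toy speaker clue: average mixture features over frames where the
    target speaker is active (activity comes e.g. from a diarization system).

    mixture_feats: (T, F) array of per-frame features of the mixture.
    activity:      (T,) array of activity values for the target speaker.
    Returns an (F,) embedding; zeros if the speaker is never active.
    """
    active = activity > threshold
    if not active.any():
        return np.zeros(mixture_feats.shape[1])
    # Mean-pool only the active frames; inactive frames are ignored.
    return mixture_feats[active].mean(axis=0)

# Toy example: 6 frames, 4-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4))
activity = np.array([0, 0, 1, 1, 1, 0], dtype=float)  # frames 2-4 active
emb = activity_driven_embedding(feats, activity)
```

In the paper's setting such an embedding would condition a neural mask estimator so it extracts the matching speaker from the mixture; the sketch only shows how an activity clue can replace a pre-recorded enrollment utterance as the source of the speaker representation.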

Keywords

Speech extraction, Speaker activity, Speech enhancement, Meeting recognition, Neural network

URL
https://www.fit.vut.cz/research/publication/12479/
Year
2021
Pages
6099–6103
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISBN
978-1-7281-7605-5
Publisher
IEEE Signal Processing Society
Place
Toronto
DOI
10.1109/ICASSP39728.2021.9414998
UT WoS
000704288406074
EID Scopus
BibTeX
@inproceedings{BUT171749,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and OCHIAI, T. and KINOSHITA, K. and NAKATANI, T.",
  title="Speaker activity driven neural speech extraction",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2021",
  pages="6099--6103",
  publisher="IEEE Signal Processing Society",
  address="Toronto",
  doi="10.1109/ICASSP39728.2021.9414998",
  isbn="978-1-7281-7605-5",
  url="https://www.fit.vut.cz/research/publication/12479/"
}
Files
Projects
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Parameterization with speech enrichment for robust automatic speech recognition with large amounts of training data, NTT, start: 2021-01-01, end: 2021-12-31, completed
Research groups
Departments