Result Details
SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics
Žmolíková Kateřina, Ing., Ph.D., DCGM (FIT)
Kinoshita Keisuke, FIT (FIT)
ARAKI, S.
Ogawa Atsunori, FIT (FIT)
Nakatani Tomohiro, FIT (FIT)
In a noisy environment such as a cocktail party, humans can focus on listening to a desired speaker, an ability known as selective hearing. Current approaches developed to realize computational selective hearing require knowing the position of the target speaker, which limits their practical usage. This article introduces SpeakerBeam, a deep learning based approach for computational selective hearing based on the characteristics of the target speaker's voice. SpeakerBeam requires only a small amount of speech data from the target speaker to compute his/her voice characteristics. It can then extract the speech of that speaker regardless of his/her position or the number of speakers talking in the background.
deep learning, target speaker extraction, SpeakerBeam
@article{BUT185149,
author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and KINOSHITA, K. and ARAKI, S. and OGAWA, A. and NAKATANI, T.",
title="SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics",
journal="NTT Technical Review",
year="2018",
volume="16",
number="11",
pages="19--24",
issn="1348-3447",
url="https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201811all.pdf&mode=show_pdf"
}
Processing, visualization and analysis of multimedia and 3D data, BUT, BUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed