Result detail

SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics

DELCROIX, M.; ŽMOLÍKOVÁ, K.; KINOSHITA, K.; ARAKI, S.; OGAWA, A.; NAKATANI, T. SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics. NTT Technical Review, 2018, vol. 16, no. 11, p. 19-24. ISSN: 1348-3447.
Type
journal article
Language
English
Authors
Delcroix Marc, FIT (FIT)
Žmolíková Kateřina, Ing., Ph.D., UPGM (FIT)
Kinoshita Keisuke, FIT (FIT)
ARAKI, S.
Ogawa Atsunori, FIT (FIT)
Nakatani Tomohiro, FIT (FIT)
Abstract

In a noisy environment such as a cocktail party, humans can focus on listening to a desired speaker, an ability known as selective hearing. Current approaches developed to realize computational selective hearing require knowing the position of the target speaker, which limits their practical usage. This article introduces SpeakerBeam, a deep learning based approach for computational selective hearing based on the characteristics of the target speaker's voice. SpeakerBeam requires only a small amount of speech data from the target speaker to compute his/her voice characteristics. It can then extract the speech of that speaker regardless of his/her position or the number of speakers talking in the background.
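The abstract describes a two-step flow: a short enrollment recording of the target speaker is turned into a representation of his/her voice characteristics, and that representation then steers extraction from the mixture. The toy sketch below illustrates that data flow only, under stated assumptions. The class `ToyTargetExtractor`, its random untrained weights, the time-averaged embedding, the elementwise multiplicative conditioning of a hidden layer, and the time-frequency mask are all hypothetical choices modeled loosely on mask-based speech enhancement; this is not the published SpeakerBeam network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ToyTargetExtractor:
    """Hypothetical toy model with random, untrained weights.

    Illustrates how a speaker embedding can condition mask estimation;
    it is NOT the SpeakerBeam architecture and extracts nothing useful.
    """

    def __init__(self, n_freq, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w_aux = rng.normal(scale=0.1, size=(n_freq, n_hidden))  # enrollment branch
        self.w_in = rng.normal(scale=0.1, size=(n_freq, n_hidden))   # mixture branch
        self.w_out = rng.normal(scale=0.1, size=(n_hidden, n_freq))  # mask output layer

    def speaker_embedding(self, enroll_mag):
        # enroll_mag: (frames, n_freq) magnitude spectrogram of the enrollment speech.
        # Project each frame, then average over time -> one fixed-size vector.
        return np.mean(np.maximum(enroll_mag @ self.w_aux, 0.0), axis=0)

    def extract(self, mixture_mag, embedding):
        # Hidden representation of the mixture, (frames, n_hidden).
        hidden = np.maximum(mixture_mag @ self.w_in, 0.0)
        # Assumed conditioning: scale each hidden unit by the speaker embedding
        # (broadcast over all frames).
        adapted = hidden * embedding
        # Time-frequency mask with values in [0, 1], applied to the mixture.
        mask = sigmoid(adapted @ self.w_out)
        return mask * mixture_mag, mask


# Usage with random stand-in spectrograms (no real audio involved).
rng = np.random.default_rng(1)
model = ToyTargetExtractor(n_freq=257, n_hidden=64)
enroll = np.abs(rng.normal(size=(200, 257)))   # short enrollment utterance
mixture = np.abs(rng.normal(size=(300, 257)))  # multi-speaker mixture
emb = model.speaker_embedding(enroll)
est, mask = model.extract(mixture, emb)
print(emb.shape, est.shape)
```

The design point the sketch tries to convey is that the enrollment speech affects extraction only through the fixed-size embedding, so the same model can, in principle, be pointed at a different speaker by supplying different enrollment data, without any information about the speaker's position.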

Keywords

deep learning, target speaker extraction, SpeakerBeam

Year
2018
Pages
19–24
Journal
NTT Technical Review, vol. 16, no. 11, ISSN 1348-3447
BibTeX
@article{BUT185149,
  author="DELCROIX, M. and ŽMOLÍKOVÁ, K. and KINOSHITA, K. and ARAKI, S. and OGAWA, A. and NAKATANI, T.",
  title="SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics",
  journal="NTT Technical Review",
  year="2018",
  volume="16",
  number="11",
  pages="19--24",
  issn="1348-3447",
  url="https://www.ntt-review.jp/archive/ntttechnical.php?contents=ntr201811all.pdf&mode=show_pdf"
}
Projects
NTT - Parameterization with speech enhancement for robust automatic speech recognition with a large volume of training data, NTT, start: 2017-10-01, end: 2018-09-30, completed
Processing, visualization and analysis of multimedia and 3D data, VUT, VUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed