Result details

Enhancement and Analysis of Conversational Speech: JSALT 2017

RYANT, N.; BERGELSON, E.; CHURCH, K.; CRISTIA, A.; DU, J.; GANAPATHY, S.; KHUDANPUR, S.; KOWALSKI, D.; KRISHNAMOORTHY, M.; KULSHRESHTA, R.; LIBERMAN, M.; LU, Y.; MACIEJEWSKI, M.; METZE, F.; PROFANT, J.; SUN, L.; TSAO, Y.; YU, Z. Enhancement and Analysis of Conversational Speech: JSALT 2017. In Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018. p. 5154-5158. ISBN: 978-1-5386-4658-8.
Type
conference proceedings paper
Language
English
Authors
RYANT, N.
BERGELSON, E.
Church Kenneth
CRISTIA, A.
DU, J.
GANAPATHY, S.
Khudanpur Sanjeev
KOWALSKI, D.
KRISHNAMOORTHY, M.
KULSHRESHTA, R.
LIBERMAN, M.
LU, Y.
Maciejewski Matthew
Metze Florian
Profant Ján, Ing.
SUN, L.
TSAO, Y.
YU, Z.
Abstract

Automatic speech recognition is more and more widely and effectively used. Nevertheless, in some automatic speech analysis tasks the state of the art is surprisingly poor. One of these is "diarization", the task of determining who spoke when. Diarization is key to processing meeting audio and clinical interviews, extended recordings such as police body cam or child language acquisition data, and any other speech data involving multiple speakers whose voices are not cleanly separated into individual channels. Overlapping speech, environmental noise and suboptimal recording techniques make the problem harder. During the JSALT Summer Workshop at CMU in 2017, an international team of researchers worked on several aspects of this problem, including calibration of the state of the art, detection of overlaps, enhancement of noisy recordings, and classification of shorter speech segments. This paper sketches the workshop's results, and announces plans for a "Diarization Challenge" to encourage further progress.

Keywords

diarization, overlap detection, speech enhancement, automatic speech recognition

URL
http://www.fit.vutbr.cz/research/groups/speech/publi/2018/profant_icassp2018_0005154.pdf
Year
2018
Pages
5154–5158
Proceedings
Proceedings of ICASSP 2018
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing
ISBN
978-1-5386-4658-8
Publisher
IEEE Signal Processing Society
Place
Calgary
DOI
10.1109/ICASSP.2018.8462468
UT WoS
000446384605065
EID Scopus
BibTeX
@inproceedings{BUT155050,
  author="RYANT, N. and BERGELSON, E. and CHURCH, K. and CRISTIA, A. and DU, J. and GANAPATHY, S. and KHUDANPUR, S. and KOWALSKI, D. and KRISHNAMOORTHY, M. and KULSHRESHTA, R. and LIBERMAN, M. and LU, Y. and MACIEJEWSKI, M. and METZE, F. and PROFANT, J. and SUN, L. and TSAO, Y. and YU, Z.",
  title="Enhancement and Analysis of Conversational Speech: JSALT 2017",
  booktitle="Proceedings of ICASSP 2018",
  year="2018",
  pages="5154--5158",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8462468",
  isbn="978-1-5386-4658-8",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2018/profant_icassp2018_0005154.pdf"
}
Projects
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Processing, visualization and analysis of multimedia and 3D data, VUT, VUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed