Detail výsledku

Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
Typ
článek ve sborníku konference
Jazyk
anglicky
Autoři
Landini Federico Nicolás, Ph.D., UPGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., UPGM (FIT)
Lozano Díez Alicia, Ph.D.
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Abstrakt

End-to-end diarization presents an attractive alternative to standard cascaded diarization systems because a single system can handle all aspects of the task at once. Many flavors of end-to-end models have been proposed but all of them require (so far non-existing) large amounts of annotated data for training. The compromise solution consists in generating synthetic data and the recently proposed simulated conversations (SC) have shown remarkable improvements over the original simulated mixtures (SM). In this work, we create SC with multiple speakers per conversation and show that they allow for substantially better performance than SM, also reducing the dependence on a fine-tuning stage. We also create SC with wide-band public audio sources and present an analysis on several evaluation sets. Together with this publication, we release the recipes for generating such data and models trained on public sets as well as the implementation to efficiently handle multiple speakers per conversation and an auxiliary voice activity detection loss.

Klíčová slova

Speaker diarization, end-to-end neural diarization, simulated conversations

URL
Rok
2023
Strany
1–5
Sborník
Proceedings of ICASSP 2023
Konference
2023 IEEE International Conference on Acoustics, Speech and Signal Processing IEEE
ISBN
978-1-7281-6327-7
Vydavatel
IEEE Signal Processing Society
Místo
Rhodes Island
DOI
EID Scopus
BibTeX
@inproceedings{BUT185197,
  author="Federico Nicolás {Landini} and Mireia {Diez Sánchez} and Alicia {Lozano Díez} and Lukáš {Burget}",
  title="Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization",
  booktitle="Proceedings of ICASSP 2023",
  year="2023",
  pages="1--5",
  publisher="IEEE Signal Processing Society",
  address="Rhodes Island",
  doi="10.1109/ICASSP49357.2023.10097049",
  isbn="978-1-7281-6327-7",
  url="https://ieeexplore.ieee.org/document/10097049"
}
Soubory
Projekty
Neuronové reprezentace v multimodálním a mnohojazyčném modelování, GAČR, Grantové projekty exelence v základním výzkumu EXPRO - 2019, GX19-26934X, zahájení: 2019-01-01, ukončení: 2023-12-31, ukončen
Robustní zpracování nahrávek pro operativu a bezpečnost, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, zahájení: 2020-10-01, ukončení: 2025-09-30, ukončen
Výměny pro výzkum řeči a technologií, EU, Horizon 2020, zahájení: 2021-01-01, ukončení: 2025-12-31, řešení
Výzkumné skupiny
Pracoviště
Nahoru