Result Details

Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization

PÁLKA, P.; LANDINI, F.; KLEMENT, D.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; BURGET, L.; DELCROIX, M. Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization. In Proceedings of 33rd European Signal Processing Conference (EUSIPCO 2025). Palermo: IEEE Signal Processing Society, 2025. p. 31-35. ISBN: 978-9-46-459362-4.
Type
conference paper
Language
English
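Authors
Petr Pálka, Federico Nicolás Landini, Dominik Klement, Mireia Diez Sánchez, Anna Silnova, Lukáš Burget, Marc Delcroix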
Abstract

In spite of the popularity of end-to-end diarization systems nowadays, modular systems comprised of voice activity detection (VAD), speaker embedding extraction plus clustering, and overlapped speech detection (OSD) plus handling still attain competitive performance in many conditions. However, one of the main drawbacks of modular systems is the need to run (and train) different modules independently. In this work, we propose an approach to jointly train a model to produce speaker embeddings, VAD and OSD simultaneously and reach competitive performance at a fraction of the inference time of a modular approach. Furthermore, the joint inference leads to a simplified overall pipeline which brings us one step closer to a unified clustering-based method that can be trained end-to-end towards a diarization-specific objective.

Keywords

speaker diarization, speaker embedding, voice activity detection, overlapped speech detection

URL
https://eusipco2025.org/wp-content/uploads/pdfs/0000031.pdf
Published
2025
Pages
31–35
Proceedings
Proceedings of 33rd European Signal Processing Conference (EUSIPCO 2025)
Conference
The 33rd European Signal Processing Conference (EUSIPCO 2025)
ISBN
978-9-46-459362-4
Publisher
IEEE Signal Processing Society
Place
Palermo
DOI
10.23919/EUSIPCO63237.2025.11226253
BibTeX
@inproceedings{BUT198669,
  author="Petr {Pálka} and Federico Nicolás {Landini} and Dominik {Klement} and Mireia {Diez Sánchez} and Anna {Silnova} and Lukáš {Burget} and Marc {Delcroix}",
  title="Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization",
  booktitle="Proceedings of 33rd European Signal Processing Conference (EUSIPCO 2025)",
  year="2025",
  pages="31--35",
  publisher="IEEE Signal Processing Society",
  address="Palermo",
  doi="10.23919/EUSIPCO63237.2025.11226253",
  isbn="978-9-46-459362-4",
  url="https://eusipco2025.org/wp-content/uploads/pdfs/0000031.pdf"
}
Projects
Linguistics, Artificial Intelligence and Language and Speech Technologies: from Research to Applications, EU, MEZISEKTOROVÁ SPOLUPRÁCE, EH23_020/0008518, start: 2025-01-01, end: 2028-12-31, running
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed