Result Details

Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization

PÁLKA, P.; LANDINI, F.; KLEMENT, D.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; DELCROIX, M.; BURGET, L. Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization. Palermo: IEEE Signal Processing Society, 2025. p. 31-35. ISBN: 978-9-46-459362-4.

Type

Paper in conference proceedings

Language

English

Authors

Pálka Petr, Ing., UPGM (FIT)
Landini Federico Nicolás, Ph.D.
Klement Dominik, Ing., UPGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., UPGM (FIT)
Silnova Anna, M.Sc., Ph.D., UPGM (FIT)
Delcroix Marc, FIT (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)

Abstract

Despite the current popularity of end-to-end diarization systems, modular systems composed of voice activity detection (VAD), speaker embedding extraction plus clustering, and overlapped speech detection (OSD) plus handling still attain competitive performance in many conditions. However, one of the main drawbacks of modular systems is the need to run (and train) different modules independently. In this work, we propose an approach to jointly train a model that produces speaker embeddings, VAD and OSD simultaneously and reaches competitive performance at a fraction of the inference time of a modular approach. Furthermore, joint inference simplifies the overall pipeline, which brings us one step closer to a unified clustering-based method that can be trained end-to-end towards a diarization-specific objective.
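To make the modular pipeline described in the abstract more concrete, here is a minimal, hypothetical Python sketch of how frame-level VAD and OSD posteriors could be combined with per-frame speaker-cluster scores to produce diarization labels. It assumes a common overlap-handling heuristic (overlap frames are assigned to the two best-scoring clusters) and an illustrative 0.5 threshold; the function name `assign_speakers` and its inputs are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Set


def assign_speakers(
    vad_probs: Sequence[float],
    osd_probs: Sequence[float],
    speaker_scores: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Set[int]]:
    """Hypothetical helper: return the set of active speaker indices per frame.

    vad_probs[t]      -- assumed speech posterior for frame t
    osd_probs[t]      -- assumed overlap posterior for frame t
    speaker_scores[t] -- assumed similarity of frame t to each speaker cluster
    """
    if not (len(vad_probs) == len(osd_probs) == len(speaker_scores)):
        raise ValueError("all inputs must have one entry per frame")

    labels: List[Set[int]] = []
    for vad, osd, scores in zip(vad_probs, osd_probs, speaker_scores):
        if vad < threshold:
            # Non-speech frame: no speaker is active.
            labels.append(set())
            continue
        # Rank speaker clusters by score, best first.
        ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        # Overlap frames get the two best clusters; other speech frames get one.
        n_active = 2 if osd >= threshold and len(ranked) >= 2 else 1
        labels.append(set(ranked[:n_active]))
    return labels


if __name__ == "__main__":
    vad = [0.1, 0.9, 0.8]
    osd = [0.0, 0.2, 0.7]
    scores = [[0.3, 0.1, 0.2], [0.1, 0.9, 0.4], [0.6, 0.2, 0.5]]
    print(assign_speakers(vad, osd, scores))  # [set(), {1}, {0, 2}]
```

In a modular system, the three inputs would come from separate models; the point of the joint approach summarized above is that a single model could supply all of them in one forward pass.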

Keywords

speaker diarization, speaker embedding, voice activity detection, overlapped speech detection

Year

2025

Pages

31–35

Conference

The 33rd European Signal Processing Conference (EUSIPCO 2025)

ISBN

978-9-46-459362-4

Publisher

IEEE Signal Processing Society

Place

Palermo

BibTeX

@inproceedings{BUT198669,
  author="Petr {Pálka} and Federico Nicolás {Landini} and Dominik {Klement} and Mireia {Diez Sánchez} and Anna {Silnova} and Marc {Delcroix} and Lukáš {Burget}",
  title="Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization",
  year="2025",
  pages="31--35",
  publisher="IEEE Signal Processing Society",
  address="Palermo",
  isbn="978-9-46-459362-4",
  url="https://www.fit.vut.cz/research/publication/13567/"
}

Projects

Linguistics, Artificial Intelligence and Language and Speech Technologies: From Research to Applications, EU, INTERSECTORAL COOPERATION, EH23_020/0008518, start: 2025-01-01, end: 2028-12-31, ongoing
Robust Processing of Recordings for Operations and Security, MV (Ministry of the Interior), PROGRAMME FOR STRATEGIC SUPPORT OF THE DEVELOPMENT OF SECURITY RESEARCH OF THE CZECH REPUBLIC 2019-2025 (IMPAKT 1), SUBPROGRAMME 1 JOINT RESEARCH PROJECTS (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed

Research Groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)