Result details

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

ZHANG, L.; STAFYLAKIS, T.; LANDINI, F.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; BURGET, L. Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?. Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop. Québec City: International Speech Communication Association, 2024. p. 123-130.

Type

conference proceedings paper

Language

English

Authors

ZHANG, L.
Stafylakis Themos
Landini Federico Nicolás, Ph.D., UPGM (FIT)
DIEZ SÁNCHEZ, M.
Silnova Anna, M.Sc., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)

Abstract

In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristic information. On the other hand, giving the attractors more freedom to allow them to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite architectural differences in EEND systems, the notion of attractors and frame embeddings is common to most of them and not specific to EEND-EDA. We believe that the main conclusions of this work can apply to other variants of EEND. Thus, we hope this paper will be a valuable contribution to guide the community to make more informed decisions when designing new systems.

Keywords

End-to-End Neural Diarization, Speaker Characteristic Information

URL

https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf

Year

2024

Pages

123–130

Proceedings

Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop

Conference

Odyssey 2024: The Speaker and Language Recognition Workshop

Publisher

International Speech Communication Association

Place

Québec City

DOI

10.21437/odyssey.2024-18

BibTeX

@inproceedings{BUT193432,
  author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
  title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
  booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
  year="2024",
  pages="123--130",
  publisher="International Speech Communication Association",
  address="Québec City",
  doi="10.21437/odyssey.2024-18",
  url="https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf"
}

Files

pdf zhang_2024_odyssey.pdf 5 MB

Projects

Robust processing of recordings for operations and security, MV, Programme Strategic Support for the Development of Security Research in the Czech Republic 2019-2025 (IMPAKT 1), Sub-programme 1 Joint Research Projects (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed
Exchanges for Speech Research and Technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, ongoing

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)