Result Details

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

ZHANG, L.; STAFYLAKIS, T.; LANDINI, F.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; BURGET, L. Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information? Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop. Québec City: International Speech Communication Association, 2024. p. 123–130.
Type
conference paper
Language
English
Abstract

In this paper, we apply the variational information bottleneck
approach to end-to-end neural diarization with encoder-decoder
attractors (EEND-EDA). This allows us to investigate what
information is essential for the model. EEND-EDA utilizes
attractors, vector representations of speakers in a conversation.
Our analysis shows that attractors do not necessarily have to
contain speaker characteristic information. On the other hand,
giving the attractors more freedom to allow them to encode some
extra (possibly speaker-specific) information leads to small but
consistent diarization performance improvements. Despite
architectural differences in EEND systems, the notion of
attractors and frame embeddings is common to most of them and
not specific to EEND-EDA. We believe that the main conclusions
of this work can apply to other variants of EEND. Thus,
we hope this paper will be a valuable contribution to guide the
community to make more informed decisions when designing new systems.

Keywords

End-to-End Neural Diarization, Speaker Characteristic Information

URL
https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf
Published
2024
Pages
123–130
Proceedings
Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop
Conference
Odyssey 2024: The Speaker and Language Recognition Workshop
Publisher
International Speech Communication Association
Place
Québec City
DOI
10.21437/odyssey.2024-18
BibTeX
@inproceedings{BUT193432,
  author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
  title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
  booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
  year="2024",
  pages="123--130",
  publisher="International Speech Communication Association",
  address="Québec City",
  doi="10.21437/odyssey.2024-18",
  url="https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf"
}
Projects
Exchanges for SPEech ReseArch aNd TechnOlogies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, completed
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed