Result Details

DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio Speech and Language Processing, 2024, vol. 32, no. 7, p. 3450-3465. ISSN: 1558-7916.
Type
journal article
Language
English
Authors
Federico Nicolás Landini, Mireia Diez Sánchez, Themos Stafylakis, Lukáš Burget
Abstract

Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.

Keywords

Attractor, DiaPer, end-to-end neural diarization, perceiver, speaker diarization.

URL
https://ieeexplore.ieee.org/document/10584294
Published
2024
Pages
3450–3465
Journal
IEEE Transactions on Audio Speech and Language Processing, vol. 32, no. 7, ISSN 1558-7916
DOI
10.1109/TASLP.2024.3422818
UT WoS
001283673700005
BibTeX
@article{BUT189802,
  author="Federico Nicolás {Landini} and Mireia {Diez Sánchez} and Themos {Stafylakis} and Lukáš {Burget}",
  title="DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors",
  journal="IEEE Transactions on Audio Speech and Language Processing",
  year="2024",
  volume="32",
  number="7",
  pages="3450--3465",
  doi="10.1109/TASLP.2024.3422818",
  issn="1558-7916",
  url="https://ieeexplore.ieee.org/document/10584294"
}
Projects
Exchanges for SPEech ReseArch aNd TechnOlogies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, running
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grantové projekty excelence v základním výzkumu EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed