Publication Details
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors
Diez Sánchez Mireia, M.Sc., Ph.D. (DCGM)
Stafylakis Themos
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Attractor, DiaPer, end-to-end neural diarization, perceiver, speaker diarization.
Until recently, the field of speaker diarization was dominated by
cascaded systems. Due to their limitations, mainly regarding overlapped
speech and cumbersome pipelines, end-to-end models have gained great
popularity lately. One of the most successful models is end-to-end
neural diarization with encoder-decoder based attractors (EEND-EDA).
In this work, we replace the EDA module with a Perceiver-based one and
show its advantages over EEND-EDA; namely obtaining better performance
on the largely studied Callhome dataset, finding the quantity of
speakers in a conversation more accurately, and faster inference time.
Furthermore, when exhaustively compared with other methods, our model,
DiaPer, reaches remarkable performance with a very lightweight design.
Besides, we perform comparisons with other works and a cascaded
baseline across more than ten public wide-band datasets. Together with
this publication, we release the code of DiaPer as well as models
trained on public and free data.
@article{BUT189802,
author="Federico Nicolás {Landini} and Mireia {Diez Sánchez} and Themos {Stafylakis} and Lukáš {Burget}",
title="DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors",
journal="IEEE Transactions on Audio, Speech, and Language Processing",
year="2024",
volume="32",
number="7",
pages="3450--3465",
doi="10.1109/TASLP.2024.3422818",
issn="1558-7916",
url="https://ieeexplore.ieee.org/document/10584294"
}