Publication Details

DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

LANDINI Federico Nicolás, DIEZ Sánchez Mireia, STAFYLAKIS Themos and BURGET Lukáš. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, vol. 32, no. 7, 2024, pp. 3450-3465. ISSN 1558-7916. Available from: https://ieeexplore.ieee.org/document/10584294
Czech title
DiaPer: End-to-End neurální diarizace mluvčích s atraktory založenými na modelech typu perceiver
Type
journal article
Language
English
Authors
Landini Federico Nicolás (DCGM FIT BUT)
Diez Sánchez Mireia (UPV)
Stafylakis Themos (OMILIA)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

Attractor, DiaPer, end-to-end neural diarization, perceiver, speaker diarization.

Abstract

Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.

Published
2024
Pages
3450-3465
Journal
IEEE Transactions on Audio, Speech, and Language Processing, vol. 32, no. 7, ISSN 1558-7916
Publisher
IEEE Signal Processing Society
DOI
10.1109/TASLP.2024.3422818
UT WoS
001283673700005
EID Scopus
BibTeX
@ARTICLE{FITPUB13279,
   author = "Landini, Federico Nicol\'{a}s and Diez S\'{a}nchez, Mireia and Stafylakis, Themos and Burget, Luk\'{a}\v{s}",
   title = "DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors",
   pages = "3450--3465",
   journal = "IEEE Transactions on Audio, Speech, and Language Processing",
   volume = 32,
   number = 7,
   year = 2024,
   ISSN = "1558-7916",
   doi = "10.1109/TASLP.2024.3422818",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13279"
}