Publication Details
DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors
Diez Sánchez Mireia (UPV)
Stafylakis Themos (OMILIA)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Attractor, DiaPer, end-to-end neural diarization, perceiver, speaker diarization.
Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
@ARTICLE{FITPUB13279,
   author = "Nicol\'{a}s Federico Landini and Mireia S\'{a}nchez Diez and Themos Stafylakis and Luk\'{a}\v{s} Burget",
   title = "DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors",
   pages = "3450--3465",
   journal = "IEEE Transactions on Audio, Speech, and Language Processing",
   volume = 32,
   number = 7,
   year = 2024,
   ISSN = "1558-7916",
   doi = "10.1109/TASLP.2024.3422818",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13279"
}