Dissertation Topic

End-to-end systems for speaker diarization

Academic Year: 2024/2025

Supervisor: Burget Lukáš, doc. Ing., Ph.D.

Co-supervisor: Diez Sánchez Mireia, M.Sc., Ph.D.

Department: Department of Computer Graphics and Multimedia

Programs:
Information Technology (DIT) - full-time study
Information Technology (DIT-EN) - full-time study

Speaker diarization (SD), the task of determining who spoke when, is a crucial part of speech data mining and artificial intelligence (AI), and downstream algorithms such as automatic speech recognition (ASR) depend on it. Current SD systems perform well in many conditions but fail to handle overlapped speech, more than two speakers, and realistic recordings (diverse acoustic conditions and speaking styles). Moreover, most current SD systems characterize speakers using only acoustic information. Future SD systems will combine multiple inputs to enhance performance using all available information sources, and this PhD topic proposes significant advances towards this goal. We will develop new architectures that extend the end-to-end SD paradigm to different multi-task scenarios. We also propose to integrate the processing of multi-stream inputs that carry complementary information. The ultimate goal of the project is to combine all such systems into a unified framework that will substantially improve the performance of SD.