BUT System for the MLC-SLM Challenge
Polok Alexander
Han Jiangyu, DCGM (FIT)
Klement Dominik, Ing., DCGM (FIT)
Cornell Samuele
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
We present a two-speaker automatic speech recognition (ASR) system that combines DiCoW, a diarization-conditioned variant of Whisper, with DiariZen, a diarization pipeline built on top of Pyannote. We first evaluate both systems in out-of-domain (OOD) multilingual scenarios without any fine-tuning.
In this setting, DiariZen consistently outperforms the baseline Pyannote diarization model, demonstrating strong generalization. Despite being fine-tuned on English-only data for target-speaker ASR, DiCoW retains solid multilingual performance, indicating that the encoder modifications preserve Whisper's multilingual capabilities. We then fine-tune both DiCoW and DiariZen on the MLC-SLM challenge data. The fine-tuned DiariZen continues to outperform the fine-tuned Pyannote baseline, while DiCoW sees further gains from domain adaptation. Our final system achieves a micro-average tcpWER/CER of 16.75% and ranks second in Task 2 of the MLC-SLM challenge. Lastly, we identify several labeling inconsistencies in the training data, such as missing speech segments and incorrect silence annotations, which can hinder diarization fine-tuning. We propose simple mitigation strategies to address these issues and improve system robustness.
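The headline figure above is a micro-average. As a minimal sketch of what that usually means, assuming the common definition (errors pooled over all sessions, divided by the pooled reference length), the Python below contrasts it with a macro-average. It is not the challenge's scoring tool: the per-session error counts are taken as given, whereas computing them for tcpWER requires time-constrained speaker-permutation matching. The names `SessionErrors`, `micro_average`, and `macro_average` are hypothetical, and the numbers are illustrative only, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class SessionErrors:
    """Hypothetical per-session tally (illustrative, not from the paper)."""
    errors: int      # substitutions + deletions + insertions
    ref_length: int  # reference words (WER) or characters (CER)


def micro_average(sessions):
    """Pool errors and reference lengths, then divide once."""
    total_errors = sum(s.errors for s in sessions)
    total_length = sum(s.ref_length for s in sessions)
    if total_length == 0:
        raise ValueError("total reference length is zero")
    return total_errors / total_length


def macro_average(sessions):
    """Average the per-session error rates, ignoring session length."""
    rates = [s.errors / s.ref_length for s in sessions]
    return sum(rates) / len(rates)


# Illustrative numbers only: a short clean session and a long noisy one.
sessions = [SessionErrors(errors=10, ref_length=100),
            SessionErrors(errors=90, ref_length=300)]

micro = micro_average(sessions)  # (10 + 90) / (100 + 300)
macro = macro_average(sessions)  # mean of 10/100 and 90/300
```

Under this definition, longer sessions (or languages with more reference tokens) weigh more heavily in the micro-average than in the macro-average.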
DiCoW, Multilingual Multi-Talker ASR, DiariZen, Whisper
@inproceedings{BUT199410,
author="Alexander {Polok} and Jiangyu {Han} and Dominik {Klement} and Samuele {Cornell} and Jan {Černocký} and Lukáš {Burget}",
title="BUT System for the MLC-SLM Challenge",
year="2025",
pages="23--27",
publisher="ISCA",
address="ISCA",
doi="10.21437/mlcslm.2025-6",
url="https://www.isca-archive.org/mlcslm_2025/polok25_mlcslm.pdf"
}
Multilingual and Cross-cultural interactions for context-aware, and bias-controlled dialogue systems for safety-critical applications, EU, HORIZON EUROPE, start: 2024-01-01, end: 2026-12-31, running
Practical verification of the possibility of integrating artificial intelligence for receiving emergency calls using a voice chatbot, developed within the research project BV No. VI20192022169, with technology for receiving emergency communications, MV, 1 VS OPSEC, VK01020132, start: 2023-01-06, end: 2025-10-31, completed