Result Details
Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering
Rangappa Pradeep
Madikeri Srikanth
Kumar Shashi
Burdisso Sergio
Prakash Jeena
Villatoro-Tello Esau
Motlíček Petr, doc. Ing., Ph.D., DCGM (FIT)
Sharma Bidisha
Hacioglu Kadri
Venkatesan Shankar
Vyas Saurabh
Stolcke Andreas
Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce, but unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a 4% relative improvement over training without auxiliary data. Filtering based on multi-model consensus or named entity recognition (NER) is then applied to select and iteratively refine pseudo-labels; performance saturates more slowly than with random selection. Evaluated on the multi-domain Wow call-center and Fisher English corpora, the pipeline outperforms single-step fine-tuning. Consensus-based filtering outperforms the other methods, providing up to 22.3% relative improvement on Wow and 24.8% on Fisher over single-step fine-tuning with random selection. NER is the second-best filter, offering competitive performance at lower computational cost.
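The pipeline's control flow can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' implementation: fine_tune and decode stand in for toolkit-specific training and inference routines, filter_fn is a pseudo-label selector (e.g., the consensus filter sketched below the keywords), and the function names and number of rounds are hypothetical.

def incremental_ssl(model, labeled, auxiliary, unlabeled,
                    fine_tune, decode, filter_fn, rounds=3):
    """Illustrative incremental semi-supervised ASR training loop.

    model      -- pretrained ASR model to adapt
    labeled    -- small in-domain labeled set
    auxiliary  -- labeled set from a closely related domain
    unlabeled  -- in-domain audio without transcripts
    fine_tune  -- callable(model, examples) -> model (toolkit-specific)
    decode     -- callable(model, audio) -> {utt_id: hypothesis}
    filter_fn  -- pseudo-label selector, e.g. consensus- or NER-based
    """
    # First integrate the small in-domain labeled set with the
    # auxiliary dataset from the closely related domain.
    train_set = list(labeled) + list(auxiliary)
    model = fine_tune(model, train_set)

    # Then iterate: pseudo-label, filter, retrain.
    for _ in range(rounds):
        pseudo = decode(model, unlabeled)       # generate pseudo-labels
        selected = filter_fn(pseudo)            # keep only reliable ones
        train_set = train_set + list(selected.items())
        model = fine_tune(model, train_set)
    return model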
ASR, incremental semi-supervised learning, pseudo-label filtering
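As one concrete reading of the consensus filter, the sketch below keeps an utterance only when the hypotheses produced by several ASR models agree closely, measured by mean pairwise word error rate; utterances arrive as {utt_id: [one hypothesis per model]}, e.g. from decoding with each model in turn. The 0.1 threshold and all names are illustrative assumptions, not values from the paper.

from itertools import combinations

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (single-row dynamic programming)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution
    return d[-1]

def mean_pairwise_wer(hyps):
    """Average WER over all hypothesis pairs, first of each pair as reference."""
    pairs = list(combinations(hyps, 2))
    return sum(edit_distance(a.split(), b.split()) / max(len(a.split()), 1)
               for a, b in pairs) / len(pairs)

def consensus_filter(utterances, max_wer=0.1):
    """Keep utterances on which multiple ASR models agree.

    utterances -- {utt_id: [hypothesis string per model]}
    Returns {utt_id: pseudo-label}, taking the first model's hypothesis.
    """
    return {utt_id: hyps[0]
            for utt_id, hyps in utterances.items()
            if len(hyps) >= 2 and mean_pairwise_wer(hyps) <= max_wer}

An NER-based selector would plug into the same filter_fn slot while avoiding decoding with multiple models, which is consistent with the lower computational cost noted in the abstract.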
@inproceedings{BUT201424,
author="{} and {} and {} and {} and {} and {} and {} and Petr {Motlíček} and {} and {} and {} and {} and {}",
title="Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering",
booktitle="Interspeech",
year="2025",
journal="Interspeech",
pages="3618--3622",
publisher="Isca-Int Speech Communication Assoc",
address="Rotterdam, The Netherlands",
doi="10.21437/Interspeech.2025-2601",
url="https://www.fit.vut.cz/research/group/speech/public/publi/2025/carofilis_interspeech2025_motlicek_co-author.pdf"
}