Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering

CAROFILIS, A.; RANGAPPA, P.; MADIKERI, S.; KUMAR, S.; BURDISSO, S.; PRAKASH, J.; VILLATORO-TELLO, E.; MOTLÍČEK, P.; SHARMA, B.; HACIOGLU, K.; VENKATESAN, S.; VYAS, S.; STOLCKE, A. Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering. In Proc. Interspeech 2025. Rotterdam, The Netherlands: ISCA (International Speech Communication Association), 2025, pp. 3618–3622.
Type
conference paper
Language
English
Authors
Carofilis Andres
Rangappa Pradeep
Madikeri Srikanth
Kumar Shashi
Burdisso Sergio
Prakash Jeena
Villatoro-Tello Esau
Motlíček Petr, doc. Ing., Ph.D., DCGM (FIT)
Sharma Bidisha
Hacioglu Kadri
Venkatesan Shankar
Vyas Saurabh
Stolcke Andreas
Abstract

Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. However, unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a relative improvement of 4% over using no auxiliary data. Filtering based on multi-model consensus or named entity recognition (NER) is then applied to select and iteratively refine pseudo-labels, showing slower performance saturation compared to random selection. Evaluated on the multi-domain Wow call center and Fisher English corpora, the pipeline outperforms single-step fine-tuning. Consensus-based filtering performs best, providing up to 22.3% relative improvement on Wow and 24.8% on Fisher over single-step fine-tuning with random selection. NER is the second-best filter, providing competitive performance at a lower computational cost.
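The multi-model consensus filter described in the abstract can be illustrated with a minimal sketch: several ASR models transcribe the same unlabeled utterance, and the utterance is admitted to the pseudo-labeled training pool only if every pair of hypotheses agrees closely (here, pairwise word error rate under a threshold). The function names, the threshold value, and the toy transcripts below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of consensus-based pseudo-label filtering (not the
# paper's implementation). Utterances are kept only when all ASR model
# hypotheses agree, measured by pairwise word error rate (WER).
from itertools import combinations


def word_error_rate(ref, hyp):
    """Token-level Levenshtein distance normalized by reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)


def consensus_filter(hypotheses, threshold=0.2):
    """Keep an utterance when every pair of model hypotheses agrees
    (pairwise WER <= threshold); the pseudo-label is taken from the
    first model's output. `threshold` is an assumed hyperparameter."""
    selected = []
    for utt_id, hyps in hypotheses.items():
        if all(word_error_rate(a, b) <= threshold
               for a, b in combinations(hyps, 2)):
            selected.append((utt_id, hyps[0]))
    return selected


# Toy usage with made-up call-center transcripts from three models:
hyps = {
    "utt1": ["thank you for calling"] * 3,
    "utt2": ["i want to cancel", "i went to counsel", "i want a council"],
}
print(consensus_filter(hyps))  # only utt1 survives the consensus check
```

In an iterative pipeline, the surviving pseudo-labeled utterances would be added to the training set, the model retrained, and the filter re-applied with fresh hypotheses, which is consistent with the incremental retraining loop the abstract describes.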

Keywords

ASR, incremental semi-supervised learning, pseudo-label filtering

URL
https://www.fit.vut.cz/research/group/speech/public/publi/2025/carofilis_interspeech2025_motlicek_co-author.pdf
Published
2025
Pages
3618–3622
Journal
Interspeech
Proceedings
Interspeech
Conference
Interspeech Conference
Publisher
ISCA (International Speech Communication Association)
Place
Rotterdam, The Netherlands
DOI
10.21437/Interspeech.2025-2601
UT WoS
001613931400148
EID Scopus
BibTeX
@inproceedings{BUT201424,
  author="Andres {Carofilis} and Pradeep {Rangappa} and Srikanth {Madikeri} and Shashi {Kumar} and Sergio {Burdisso} and Jeena {Prakash} and Esau {Villatoro-Tello} and Petr {Motlíček} and Bidisha {Sharma} and Kadri {Hacioglu} and Shankar {Venkatesan} and Saurabh {Vyas} and Andreas {Stolcke}",
  title="Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering",
  booktitle="Interspeech",
  year="2025",
  pages="3618--3622",
  publisher="ISCA (International Speech Communication Association)",
  address="Rotterdam, The Netherlands",
  doi="10.21437/Interspeech.2025-2601",
  url="https://www.fit.vut.cz/research/group/speech/public/publi/2025/carofilis_interspeech2025_motlicek_co-author.pdf"
}
Projects
Contemporary Methods for Processing, Analysis and Visualization of Multimedia and 3D Data, BUT, Internal BUT projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, completed