Result detail

BCN2BRNO Automatic speech recognition system for Albayzin 2022 Speech to Text Challenge

Created: 2022
Type
software
Language
English
Authors
Kocour Martin, Ing., UPGM (FIT)
Umesh Jahnavi
Karafiát Martin, Ing., Ph.D., UPGM (FIT)
Švec Ján, Ing., UPGM (FIT)
Lopez Fernando
Beneš Karel, Ing., Ph.D., UPGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., UPGM (FIT)
Szőke Igor, Ing., Ph.D., UPGM (FIT)
Luque Jordi, FIT (FIT)
Veselý Karel, Ing., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Description

The software results from the development of automatic speech recognition (ASR) systems for the Albayzin 2022 Challenge. We trained and evaluated both hybrid systems and systems based on end-to-end models. We also investigated speech representations from pre-trained self-supervised models and their impact on ASR performance, compared with training models from scratch. In addition, we applied the Whisper model in a zero-shot fashion, postprocessing its output to fit the required transcription format. Besides tuning the model architectures and overall training schemes, we improved the robustness of our models by augmenting the training data with noises extracted from the target domain. Moreover, we rescored the N-best hypotheses with an external language model (LM) to adjust each sentence score and pick the single best hypothesis. All these efforts led to a significant reduction in word error rate (WER). Our single best system and the fusion of selected systems achieved 16.3% and 13.7% WER, respectively, on the RTVE2020 test partition, i.e. the official evaluation partition from the previous Albayzin challenge.
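
The N-best rescoring step mentioned above can be illustrated with a minimal sketch. It assumes a log-linear combination of the recognizer score and an external LM score, which is a common way to rescore but is not confirmed as the exact scheme used in the BCN2BRNO software. The names `Hypothesis`, `rescore` and `toy_lm_logprob`, the unigram table, and the `lm_weight` value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-score from the recognizer (higher is better)


# Hypothetical toy "external LM": unigram log-probabilities with a flat
# penalty for unknown words. A real system would query a trained LM here.
_UNIGRAM_LOGPROB = {"el": -1.0, "tiempo": -2.0, "es": -1.5, "oro": -3.0}
_UNKNOWN_LOGPROB = -8.0


def toy_lm_logprob(text: str) -> float:
    """Sum of per-word log-probabilities under the toy unigram table."""
    return sum(_UNIGRAM_LOGPROB.get(w, _UNKNOWN_LOGPROB) for w in text.split())


def rescore(
    nbest: Sequence[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,  # assumed value; normally tuned on held-out data
) -> Hypothesis:
    """Return the hypothesis with the highest combined score.

    Combined score = asr_score + lm_weight * lm_logprob(text).
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    return max(nbest, key=lambda h: h.asr_score + lm_weight * lm_logprob(h.text))


# Two competing hypotheses: the recognizer slightly prefers the second one,
# but the LM penalizes its out-of-vocabulary word.
nbest = [
    Hypothesis("el tiempo es oro", asr_score=-12.0),
    Hypothesis("el tiempo eso oro", asr_score=-11.5),
]
best = rescore(nbest, toy_lm_logprob, lm_weight=0.5)
```

With `lm_weight=0.0` the recognizer's own ranking is kept, so the second hypothesis wins; with a positive weight the LM evidence can overturn it, which is the effect rescoring relies on.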

Keywords

automatic speech recognition

Location
License
Use of the result by another entity always requires obtaining a license
License fee
The licensor does not require a license fee for the result
License terms

For information about the license terms, please contact: Ing. Martina Kocmanová, Výzkumné centrum informačních technologií, Fakulta informačních technologií VUT v Brně, Božetěchova 2, 612 66 Brno, tel. 541 141 466.

Projects
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - Subprogram INTER-ACTION, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Research groups
Departments