Publication details

BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge

KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022. pp. 276–280.
Type
Paper in conference proceedings
Language
English
Authors
Kocour Martin, Ing., UPGM (FIT)
Umesh Jahnavi
Karafiát Martin, Ing., Ph.D., UPGM (FIT)
Švec Ján, Ing., UPGM (FIT)
Lopez Fernando
Beneš Karel, Ing., Ph.D., UPGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., UPGM (FIT)
Szőke Igor, Ing., Ph.D., UPGM (FIT)
Luque Jordi, FIT (FIT)
Veselý Karel, Ing., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Abstract

We present research on the development of automatic speech recognition (ASR) systems for the Albayzin 2022 Challenge. We train and evaluate both hybrid systems and systems based on end-to-end models. We also investigate the use of self-supervised learning speech representations from pre-trained models and their impact on ASR performance, as opposed to training models directly from scratch. Additionally, we apply the Whisper model in a zero-shot fashion, post-processing its output to fit the required transcription format. On top of tuning the model architectures and overall training schemes, we improve the robustness of our models by augmenting the training data with noises extracted from the target domain. Moreover, we rescore the N-best hypotheses with an external language model (LM) to adjust each sentence score and pick the single best hypothesis. Together, these efforts lead to a significant reduction in word error rate (WER). Our single best system and the fusion of selected systems achieved 16.3% and 13.7% WER, respectively, on the RTVE2020 test partition, i.e., the official evaluation partition of the previous Albayzin challenge.

Keywords

ASR fusion, end-to-end model, self-supervised learning, automatic speech recognition.

URL
https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf
Year
2022
Pages
276–280
Proceedings
Proceedings of IberSpeech 2022
Conference
IberSPEECH 2022 Conference
Publisher
International Speech Communication Association
Place
Granada
DOI
10.21437/IberSPEECH.2022-56
BibTeX
@inproceedings{BUT180167,
  author="Martin {Kocour} and Jahnavi {Umesh} and Martin {Karafiát} and Ján {Švec} and Fernando {Lopez} and Karel {Beneš} and Mireia {Diez Sánchez} and Igor {Szőke} and Jordi {Luque} and Karel {Veselý} and Lukáš {Burget} and Jan {Černocký}",
  title="BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge",
  booktitle="Proceedings of IberSpeech 2022",
  year="2022",
  pages="276--280",
  publisher="International Speech Communication Association",
  address="Granada",
  doi="10.21437/IberSPEECH.2022-56",
  url="https://www.isca-speech.org/archive/pdfs/iberspeech_2022/kocour22_iberspeech.pdf"
}
Projects
Multilinguality in speech technologies, MŠMT, INTER-EXCELLENCE - INTER-ACTION subprogramme, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural representations in multimodal and multilingual modelling, GAČR, Grant projects of excellence in basic research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust processing of recordings for operations and security, MV, PROGRAMME FOR THE STRATEGIC SUPPORT OF THE DEVELOPMENT OF SECURITY RESEARCH OF THE CZECH REPUBLIC 2019-2025 (IMPAKT 1), SUBPROGRAMME 1 JOINT RESEARCH PROJECTS (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed
Exchanges for speech research and technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, ongoing