Result Details

Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models

KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. no. 08, p. 2148-2152. ISSN: 1990-9772.

Type

conference paper

Language

English

Authors

Kesiraju Santosh, Ph.D., DCGM (FIT)
Sarvaš Marek, Ing., DCGM (FIT)
Pavlíček Tomáš, Ing.
MACAIRE, C.
CIUBA, A.

Abstract

This paper presents techniques and findings for improving
the performance of low-resource speech-to-text translation
(ST). We conducted experiments on both simulated and real
low-resource setups, on the language pairs English - Portuguese
and Tamasheq - French, respectively. Using the encoder-decoder
framework for ST, our results show that a multilingual automatic
speech recognition system acts as a good initialization
under low-resource scenarios. Furthermore, using CTC as
an additional objective for translation during training and decoding
helps to reorder the internal representations and improves
the final translation. Through our experiments, we try to
identify the factors (initializations, objectives, and hyperparameters)
that contribute the most to improvements in low-resource
setups. With only 300 hours of pre-training data, our
model achieved a BLEU score of 7.3 on Tamasheq - French data,
outperforming prior published works from IWSLT 2022 by 1.6
points.

Keywords

speech translation, low-resource, multilingual, speech recognition

URL

https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf

Published

2023

Pages

2148–2152

Journal

Proceedings of Interspeech, vol. 2023, no. 08, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Dublin

DOI

10.21437/Interspeech.2023-2506

EID Scopus

2-s2.0-85171568999

BibTeX

@inproceedings{BUT185572,
  author="KESIRAJU, S. and SARVAŠ, M. and PAVLÍČEK, T. and MACAIRE, C. and CIUBA, A.",
  title="Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="2148--2152",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-2506",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/kesiraju23_interspeech.pdf"
}

Files

pdf kesiraju23_interspeech2023_strategies.pdf 347 kB

Projects

Exchanges for SPEech ReseArch aNd TechnOlogies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, running
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grant Projects of Excellence in Basic Research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Practical verification of the possibility of integrating artificial intelligence for receiving emergency calls using a voice chatbot, developed within the research project BV No. VI20192022169, with technology for receiving emergency communications, MV, 1 VS OPSEC, VK01020132, start: 2023-01-06, end: 2025-10-31, running

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)