Result detail

Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling

CHO, J.; BASKAR, M.; LI, R.; WIESNER, M.; MALLIDI, S.; YALTA, N.; KARAFIÁT, M.; WATANABE, S.; HORI, T. Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. In Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018). Athens: IEEE Signal Processing Society, 2018. p. 521-527. ISBN: 978-1-5386-4334-1.

Type

conference proceedings paper

Language

English

Authors

CHO, J.
Baskar Murali Karthick, Ing., Ph.D., UPGM (FIT)
Li Ruizhi
Wiesner Matthew, PhD., FIT (FIT)
Mallidi Sri Harish, FIT (FIT)
YALTA, N.
Karafiát Martin, Ing., Ph.D., UPGM (FIT)
Watanabe Shinji, FIT (FIT)
HORI, T.

Abstract

Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of requiring more data compared to conventional DNN-HMM systems. In this work, we attempt to use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port them towards 4 other BABEL languages using transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that the transfer learning approach from the multilingual model shows substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in terms of %WER, and achieves recognition performance comparable to the models trained with twice more training data.

Keywords

Automatic speech recognition (ASR), sequence to sequence, multilingual setup, transfer learning, language modeling

URL

Year

2018

Pages

521–527

Proceedings

Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)

Conference

2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)

ISBN

978-1-5386-4334-1

Publisher

IEEE Signal Processing Society

Place

Athens

DOI

10.1109/SLT.2018.8639655

UT WoS

000463141800073

EID Scopus

2-s2.0-85063077624

BibTeX

@inproceedings{BUT163489,
  author="CHO, J. and BASKAR, M. and LI, R. and WIESNER, M. and MALLIDI, S. and YALTA, N. and KARAFIÁT, M. and WATANABE, S. and HORI, T.",
  title="Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling",
  booktitle="Proceedings of 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018)",
  year="2018",
  pages="521--527",
  publisher="IEEE Signal Processing Society",
  address="Athens",
  doi="10.1109/SLT.2018.8639655",
  isbn="978-1-5386-4334-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8639655"
}

Files

pdf cho_slt2018_08639655.pdf 345 kB

Projects

IT4Innovations excellence in science, MŠMT, National Sustainability Programme II (Národní program udržitelnosti II), LQ1602, start: 2016-01-01, end: 2020-12-31, completed

Research groups

Speech data mining research group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)