Result detail

HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition

MAI, F.; ZULUAGA-GOMEZ, J.; PARCOLLET, T.; MOTLÍČEK, P. HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition. In Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. no. 08, p. 2213-2217. ISSN: 1990-9772.

Type

conference proceedings paper

Language

English

Authors

MAI, F.
ZULUAGA-GOMEZ, J.
PARCOLLET, T.
Motlíček Petr, doc. Ing., Ph.D., UPGM (FIT)

Abstract

State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for long input sequences. Here, we address this by extending HyperMixer, an efficient alternative to attention exhibiting linear complexity, to the Conformer architecture for speech recognition, leading to HyperConformer. In particular, multi-head HyperConformer achieves comparable or higher recognition performance while being more efficient than Conformer in terms of inference speed, memory, parameter count, and available training data. HyperConformer achieves a word error rate of 2.9% on LibriSpeech test-clean with less than 8M neural parameters and a peak memory during training of 5.7GB, hence trainable with accessible hardware. Encoder speed is between 38% on mid-length speech and 56% on long speech faster than an equivalent Conformer.
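
The linear-complexity claim in the abstract can be illustrated with a small sketch of hypernetwork-based token mixing. This is not the authors' implementation: it is a simplified NumPy illustration that assumes random, untrained weights and omits positional information, normalization, and the rest of the Conformer block. The names `make_mlp`, `hyper_token_mixing`, and `multi_head_mixing`, as well as all dimensions, are hypothetical. The idea shown is that the mixing weights are generated from the tokens themselves, so no token-by-token attention matrix is ever formed and the cost grows linearly with sequence length.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def make_mlp(rng, d_in, d_hidden, d_out):
    """Hypothetical two-layer MLP with random weights, applied to each token independently."""
    w_a = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
    w_b = rng.standard_normal((d_hidden, d_out)) / np.sqrt(d_hidden)
    return lambda x: gelu(x @ w_a) @ w_b

def hyper_token_mixing(x, mlp_w1, mlp_w2):
    """Mix information across tokens. x has shape (n_tokens, d)."""
    w1 = mlp_w1(x)           # (n_tokens, d_hyper), generated from the input
    w2 = mlp_w2(x)           # (n_tokens, d_hyper), generated from the input
    hidden = gelu(w1.T @ x)  # (d_hyper, d): cost O(n_tokens * d_hyper * d)
    return w2 @ hidden       # (n_tokens, d): again linear in n_tokens

def multi_head_mixing(x, heads):
    """Assumed multi-head variant: split features into equal chunks, mix each separately."""
    chunks = np.split(x, len(heads), axis=1)
    mixed = [hyper_token_mixing(c, m1, m2) for c, (m1, m2) in zip(chunks, heads)]
    return np.concatenate(mixed, axis=1)

rng = np.random.default_rng(0)
n_heads, d_model, d_hyper = 4, 16, 8
d_head = d_model // n_heads
heads = [(make_mlp(rng, d_head, 8, d_hyper), make_mlp(rng, d_head, 8, d_hyper))
         for _ in range(n_heads)]

# The same weights handle sequences of any length.
for n_tokens in (10, 50):
    x = rng.standard_normal((n_tokens, d_model))
    y = multi_head_mixing(x, heads)
    print(n_tokens, y.shape)
```

Because both matrix products involve one dimension of size `n_tokens` and two fixed dimensions, doubling the sequence length roughly doubles the work, whereas standard self-attention builds an `n_tokens` by `n_tokens` score matrix.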

Keywords

Hypernetworks, HyperMixer, Efficient Automatic Speech Recognition, LibriSpeech, SpeechBrain

URL

https://www.isca-archive.org/interspeech_2023/mai23_interspeech.pdf

Year

2023

Pages

2213–2217

Journal

Proceedings of Interspeech, vol. 2023, no. 08, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Dublin

DOI

10.21437/Interspeech.2023-1611

EID Scopus

2-s2.0-85163412872

BibTeX

@inproceedings{BUT187786,
  author="MAI, F. and ZULUAGA-GOMEZ, J. and PARCOLLET, T. and MOTLÍČEK, P.",
  title="HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition",
  booktitle="Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="2213--2217",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-1611",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2023/mai23_interspeech.pdf"
}

Files

pdf mai23_interspeech.pdf 432 kB

Projects

Contemporary Methods for Processing, Analysis and Visualization of Multimedia and 3D Data, Brno University of Technology (VUT), VUT internal projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, in progress

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)