Result detail

Implementing contextual biasing in GPU decoder for online ASR

NIGMATULINA, I.; MADIKERI, S.; VILLATORO-TELLO, E.; MOTLÍČEK, P.; ZULUAGA-GOMEZ, J.; PANDIA, K.; GANAPATHIRAJU, A. Implementing contextual biasing in GPU decoder for online ASR. In Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. no. 8, p. 4494-4498. ISSN: 1990-9772.
Type
conference proceedings paper
Language
English
Authors
NIGMATULINA, I.
Madikeri Srikanth, FIT (FIT)
VILLATORO-TELLO, E.
Motlíček Petr, doc. Ing., Ph.D., UPGM (FIT)
ZULUAGA-GOMEZ, J.
PANDIA, K.
GANAPATHIRAJU, A.
Abstract

GPU decoding significantly accelerates the output of ASR predictions.
While GPUs are already being used for online ASR
decoding, post-processing and rescoring on GPUs have not
been properly investigated yet. Rescoring with available contextual
information can considerably improve ASR predictions.
Previous studies have proven the viability of lattice rescoring
in decoding and biasing language model (LM) weights in offline
and online CPU scenarios. In real-time GPU decoding,
partial recognition hypotheses are produced without lattice generation,
which makes the implementation of biasing more complex.
The paper proposes and describes an approach to integrate
contextual biasing in real-time GPU decoding while exploiting
the standard Kaldi GPU decoder. Besides the biasing of partial
ASR predictions, our approach also permits dynamic context
switching, allowing flexible rescoring for each speech segment
directly on the GPU. The code is publicly released and tested with
open-source test sets.

Keywords

real-time speech recognition, contextual adaptation, GPU decoding, finite-state transducers

URL
https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.html
Year
2023
Pages
4494–4498
Journal
Proceedings of Interspeech, vol. 2023, no. 8, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Dublin
DOI
10.21437/Interspeech.2023-2449
BibTeX
@inproceedings{BUT187754,
  author="NIGMATULINA, I. and MADIKERI, S. and VILLATORO-TELLO, E. and MOTLÍČEK, P. and ZULUAGA-GOMEZ, J. and PANDIA, K. and GANAPATHIRAJU, A.",
  title="Implementing contextual biasing in GPU decoder for online ASR",
  booktitle="Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="8",
  pages="4494--4498",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-2449",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.html"
}
Projects
Real-time network, text, and speech analysis for combating organized crime, EU, Horizon 2020, start: 2019-09-01, end: 2022-12-31, completed
Contemporary methods of processing, analysis, and visualization of multimedia and 3D data, VUT, VUT internal projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, in progress