Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification
VILLATORO-TELLO, E.
Madikeri Srikanth, FIT (FIT)
SHARMA, B.
KHALIL, D.
KUMAR, S.
NIGMATULINA, I.
Motlíček Petr, doc. Ing., Ph.D., DCGM (FIT)
GANAPATHIRAJU, A.
Spoken Language Understanding (SLU) technologies have
greatly improved due to the effective pretraining of speech
representations. A common requirement of industry-based
solutions is that SLU models be portable enough to deploy on
voice-assistant devices. Thus, distilling knowledge from large
text-based language models has become an attractive solution for
achieving good performance and guaranteeing portability. In
this paper, we introduce a novel architecture that uses a
cross-modal attention mechanism to extract bin-level contextual
embeddings from a word-confusion network (WCN) encoding
such that these can be directly compared and aligned with
traditional text-based contextual embeddings. This alignment
is achieved using a recently proposed tokenwise contrastive
loss function. We validate our architecture's effectiveness
by fine-tuning our WCN-based pretrained model to do intent
classification (IC) on the well-known SLURP dataset. The
accuracy obtained on the IC task (81%) represents a 9.4% relative
improvement over a recent, equivalent end-to-end (E2E) method.
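The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the loss used in the paper: it is a generic symmetric InfoNCE-style contrastive loss, written under the assumption that WCN bin embedding i is already paired with text token embedding i and that every embedding is a non-zero vector. The function name `tokenwise_contrastive_loss`, its helpers, and the `temperature` default are hypothetical choices made for illustration only.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def tokenwise_contrastive_loss(wcn_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over N aligned (WCN bin, text token) pairs.

    Pair i is the positive; every other token in the batch is a negative.
    """
    if not wcn_emb or len(wcn_emb) != len(text_emb):
        raise ValueError("expected two non-empty, equally long embedding lists")
    w = [_normalize(v) for v in wcn_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(w)
    # Cosine similarity between every bin and every token, scaled by temperature.
    sim = [[_dot(w[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Direction 1: each WCN bin should pick out its own text token.
    bin_to_token = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    # Direction 2: each text token should pick out its own WCN bin.
    token_to_bin = sum(
        _cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (bin_to_token + token_to_bin)


if __name__ == "__main__":
    bins = [[1.0, 0.0], [0.0, 1.0]]
    print(tokenwise_contrastive_loss(bins, [[1.0, 0.0], [0.0, 1.0]]))  # aligned
    print(tokenwise_contrastive_loss(bins, [[0.0, 1.0], [1.0, 0.0]]))  # swapped
```

In this toy setting the loss is close to zero when each bin embedding matches its paired token embedding and grows large when the pairs are swapped, which is the behaviour an alignment objective of this kind is meant to reward.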
Word-Confusion-Networks, Cross-modal Alignment, Knowledge Distillation, Intent Classification
@inproceedings{BUT196786,
author="VILLATORO-TELLO, E. and MADIKERI, S. and SHARMA, B. and KHALIL, D. and KUMAR, S. and NIGMATULINA, I. and MOTLÍČEK, P. and GANAPATHIRAJU, A.",
title="Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification",
booktitle="ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
year="2024",
pages="12617--12621",
publisher="IEEE Signal Processing Society",
address="Seoul",
doi="10.1109/ICASSP48485.2024.10445934",
isbn="979-8-3503-4485-1",
url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10445934"
}