Result Details

Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint

PRASAD, A.; CAROFILIS, A.; VANDERREYDT, G.; KHALIL, D.; MADIKERI, S.; MOTLÍČEK, P.; SCHUEPBACH, C. Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11921-11925. ISBN: 979-8-3503-4485-1.
Type
conference paper
Language
English
Authors
Prasad Amrutha
CAROFILIS, A.
VANDERREYDT, G.
KHALIL, D.
Madikeri Srikanth, FIT (FIT)
Motlíček Petr, doc. Ing., Ph.D., DCGM (FIT)
SCHUEPBACH, C.
Abstract

Self-supervised models trained with high linguistic diversity, such as the XLS-R model, can be effectively fine-tuned for the language recognition task. Typically, a back-end classifier followed by a statistics pooling layer is added during training. Commonly used back-end classifiers require a large number of parameters to be trained, which is not ideal in limited-data conditions. In this work, we explore smaller-parameter back-ends using the factorized Time Delay Neural Network (TDNN-F). The TDNN-F architecture is also integrated into Emphasized Channel Attention, Propagation and Aggregation TDNN (ECAPA-TDNN) models, termed ECAPA-TDNN-F, reducing the number of parameters by 30 to 50% absolute, with competitive accuracies and no change in minimum cost. The results show that the ECAPA-TDNN-F can be extended to tasks where ECAPA-TDNN is suitable. We also test the effectiveness of a linear classifier and a variant, the orthonormal linear classifier, previously used in x-vector type systems. The models are trained with NIST LRE17 data and evaluated on the NIST LRE17, LRE22 and ATCO2 LID datasets. Both linear classifiers outperform conventional back-ends, with improvements in accuracy between 0.9% and 9.1%.
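As an illustration of the parameter reduction described in the abstract, the sketch below shows one way a TDNN-F-style factorized layer can be written: a dense transform is split into two smaller factors through a low-rank bottleneck, with the first factor pushed towards semi-orthonormality. PyTorch, the class name, the layer sizes, and the penalty formulation are assumptions made for illustration; Kaldi-style TDNN-F enforces the constraint with a periodic direct weight update rather than a loss penalty, and the paper's exact ECAPA-TDNN-F configuration is not reproduced here.

# Hypothetical sketch (PyTorch assumed) of the TDNN-F factorization idea.
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """W (out x in) approximated as A @ B, with B kept close to semi-orthonormal (B B^T ~ I)."""
    def __init__(self, in_dim=512, out_dim=512, bottleneck=128):
        super().__init__()
        self.B = nn.Linear(in_dim, bottleneck, bias=False)   # constrained factor
        self.A = nn.Linear(bottleneck, out_dim)

    def forward(self, x):
        return self.A(self.B(x))

    def semi_orthonormal_penalty(self):
        # Frobenius-norm penalty ||B B^T - I||^2, added to the training loss.
        M = self.B.weight                                   # (bottleneck, in_dim)
        gram = M @ M.t()                                    # (bottleneck, bottleneck)
        eye = torch.eye(gram.size(0), device=M.device)
        return ((gram - eye) ** 2).sum()

With these illustrative sizes, a dense 512x512 layer holds about 262k weights, while the factorized version holds about 131k (512*128 + 128*512), i.e. roughly a 50% reduction, in line with the 30 to 50% figure quoted in the abstract.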
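Similarly, a minimal sketch of the statistics pooling plus orthonormal linear classifier set-up is given below, again assuming PyTorch. The class names, feature dimension, number of target languages, and penalty weight are hypothetical; the code only illustrates a ||W W^T - I|| style constraint on a linear back-end over pooled frame-level features, not the paper's exact training recipe.

# Hypothetical sketch (PyTorch assumed): statistics pooling over frame-level
# XLS-R features followed by a linear language-ID classifier whose weights
# are pushed towards orthonormality via a Frobenius-norm penalty.
import torch
import torch.nn as nn

class StatsPooling(nn.Module):
    """Concatenate per-utterance mean and standard deviation of frame-level features."""
    def forward(self, x):                      # x: (batch, frames, feat_dim)
        mean = x.mean(dim=1)
        std = x.std(dim=1)
        return torch.cat([mean, std], dim=1)   # (batch, 2 * feat_dim)

class OrthonormalLinearLID(nn.Module):
    """Linear back-end for language identification with an orthonormal weight penalty."""
    def __init__(self, feat_dim=1024, num_langs=14):
        super().__init__()
        self.pool = StatsPooling()
        self.classifier = nn.Linear(2 * feat_dim, num_langs)

    def forward(self, frame_feats):
        return self.classifier(self.pool(frame_feats))

    def orthonormal_penalty(self):
        W = self.classifier.weight                     # (num_langs, 2 * feat_dim)
        gram = W @ W.t()                               # (num_langs, num_langs)
        eye = torch.eye(gram.size(0), device=W.device)
        return ((gram - eye) ** 2).sum()

# Usage sketch: add the penalty to the cross-entropy loss during fine-tuning.
model = OrthonormalLinearLID()
feats = torch.randn(8, 200, 1024)        # stand-in for XLS-R frame-level outputs
labels = torch.randint(0, 14, (8,))
logits = model(feats)
loss = nn.functional.cross_entropy(logits, labels) + 1e-3 * model.orthonormal_penalty()
loss.backward()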

Keywords

Language Identification, Transformers, Wav2Vec2, fine-tuning, low-resource, out-of-domain

URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751
Published
2024
Pages
11921–11925
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISBN
979-8-3503-4485-1
Publisher
IEEE Signal Processing Society
Place
Seoul
DOI
10.1109/ICASSP48485.2024.10446751
BibTeX
@inproceedings{BUT193354,
  author="PRASAD, A. and CAROFILIS, A. and VANDERREYDT, G. and KHALIL, D. and MADIKERI, S. and MOTLÍČEK, P. and SCHUEPBACH, C.",
  title="Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="11921--11925",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSP48485.2024.10446751",
  isbn="979-8-3503-4485-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751"
}
Projects
Contemporary methods for processing, analysis and visualization of multimedia and 3D data, BUT, BUT internal projects, FIT-S-23-8278, start: 2023-03-01, end: 2026-02-28, running