Publication Details

Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint

PRASAD, A.; CAROFILIS, A.; VANDERREYDT, G.; KHALIL, D.; MADIKERI, S.; MOTLÍČEK, P.; SCHUEPBACH, C. Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11921-11925. ISBN: 979-8-3503-4485-1.
Czech title
Fine-Tuning samoučicích modelů pro identifikaci jazyka pomocí ortonormálního omezení
Type
conference paper
Language
English
Authors
Prasad Amrutha (DCGM)
CAROFILIS, A.
VANDERREYDT, G.
KHALIL, D.
Madikeri Srikanth
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
SCHUEPBACH, C.
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751
Keywords

Language Identification, Transformers, Wav2Vec2, fine-tuning, low-resource, out-of-domain

Abstract

Self-supervised models trained with high linguistic diversity, such as the XLS-R model, can be effectively fine-tuned for the language recognition task. Typically, a back-end classifier followed by a statistics pooling layer is added during training. Commonly used back-end classifiers require a large number of parameters to be trained, which is not ideal in limited data conditions. In this work, we explore smaller-parameter back-ends using the factorized Time Delay Neural Network (TDNN-F). The TDNN-F architecture is also integrated into Emphasized Channel Attention, Propagation and Aggregation TDNN (ECAPA-TDNN) models, termed ECAPA-TDNN-F, reducing the number of parameters by 30 to 50% absolute, with competitive accuracies and no change in minimum cost. The results show that ECAPA-TDNN-F can be extended to tasks where ECAPA-TDNN is suitable. We also test the effectiveness of a linear classifier and a variant, the Orthonormal linear classifier, previously used in x-vector-type systems. The models are trained with NIST LRE17 data and evaluated on the NIST LRE17, LRE22, and ATCO2 LID datasets. Both linear classifiers outperform conventional back-ends, with improvements in accuracy between 0.9% and 9.1%.
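Two of the building blocks named in the abstract are easy to illustrate: statistics pooling (concatenating per-dimension mean and standard deviation over time) and the semi-orthogonal constraint that TDNN-F factorized layers apply to one of their weight matrices. The sketch below is illustrative only, not the authors' implementation; function names and shapes are assumptions for the example.

```python
# Illustrative sketch (not the paper's code): statistics pooling and the
# semi-orthogonal Frobenius penalty used when training TDNN-F-style layers.
import numpy as np

def statistics_pooling(frames):
    """Pool a (T, D) sequence of frame embeddings into a fixed-size vector
    of length 2*D by concatenating the per-dimension mean and std over time."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])

def semi_orthogonal_penalty(M):
    """Penalty ||M M^T - I||_F^2 on a factorized weight matrix M
    (rows <= cols); minimizing it pushes the rows of M toward
    orthonormality, as in TDNN-F training."""
    gram = M @ M.T
    eye = np.eye(M.shape[0])
    return float(np.sum((gram - eye) ** 2))
```

A matrix with orthonormal rows (e.g. the transpose of a QR-factor `Q`) drives the penalty to zero, which is the fixed point the constraint steers training toward.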

Published
2024
Pages
11921–11925
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, KR
ISBN
979-8-3503-4485-1
Publisher
IEEE Signal Processing Society
Place
Seoul
DOI
10.1109/ICASSP48485.2024.10446751
BibTeX
@inproceedings{BUT193354,
  author="PRASAD, A. and CAROFILIS, A. and VANDERREYDT, G. and KHALIL, D. and MADIKERI, S. and MOTLÍČEK, P. and SCHUEPBACH, C.",
  title="Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="11921--11925",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSP48485.2024.10446751",
  isbn="979-8-3503-4485-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751"
}