Result Details
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Mošner Ladislav, Ing., DCGM (FIT)
KAKOUROS, S.
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Self-supervised learning of speech representations from large
amounts of unlabeled data has enabled state-of-the-art results
in several speech processing tasks. Aggregating these speech
representations across time is typically approached by using
descriptive statistics, and in particular, using the first- and
second-order statistics of representation coefficients. In this
paper, we examine an alternative way of extracting speaker
and emotion information from self-supervised trained models,
based on the correlations between the coefficients of the
representations - correlation pooling. We show improvements
over mean pooling and further gains when the pooling
methods are combined via fusion. The code is available at
github.com/Lamomal/s3prl_correlation.
Speaker identification, speaker verification, emotion recognition, self-supervised models
@inproceedings{BUT185160,
  author="STAFYLAKIS, T. and MOŠNER, L. and KAKOUROS, S. and PLCHOT, O. and BURGET, L. and ČERNOCKÝ, J.",
  title="Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations",
  booktitle="2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",
  year="2023",
  pages="1136--1143",
  publisher="IEEE Signal Processing Society",
  address="Doha",
  doi="10.1109/SLT54892.2023.10023345",
  isbn="978-1-6654-7189-3",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345"
}Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - Podprogram INTER-ACTION, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grantové projekty exelence v základním výzkumu EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed