Result Details

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.
Type
conference paper
Language
English
Authors
STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J.
Abstract

Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached by using descriptive statistics, and in particular, using the first- and second-order statistics of representation coefficients. In this paper, we examine an alternative way of extracting speaker and emotion information from self-supervised trained models, based on the correlations between the coefficients of the representations (correlation pooling). We show improvements over mean pooling and further gains when the pooling methods are combined via fusion. The code is available at github.com/Lamomal/s3prl_correlation.
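
To make the contrast between mean pooling and correlation pooling concrete, the following is a minimal NumPy sketch of the general idea. It is not the authors' implementation (see the repository above for that). The function names, the `eps` stabilizer, the toy data, and the use of concatenation as a simple form of fusion are all assumptions made for illustration; the abstract does not specify how fusion is performed.

```python
import numpy as np


def mean_pooling(frames: np.ndarray) -> np.ndarray:
    """Average frame-level features over time: (T, C) -> (C,)."""
    return frames.mean(axis=0)


def correlation_pooling(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pool (T, C) frames into the C*(C-1)/2 pairwise channel correlations.

    Hypothetical helper: a plain Pearson correlation between channels,
    computed across the time axis.
    """
    # Remove the per-channel mean and scale to unit variance.
    centered = frames - frames.mean(axis=0, keepdims=True)
    std = centered.std(axis=0, keepdims=True)
    normalized = centered / (std + eps)  # eps guards against constant channels

    # (C, C) correlation matrix between channels.
    corr = normalized.T @ normalized / frames.shape[0]

    # The matrix is symmetric with a unit diagonal, so keep only the
    # strictly upper triangle as the pooled vector.
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


# Toy input standing in for the output of a self-supervised model:
# 200 frames, 8 channels (real models have far more channels).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 8))
frames[:, 1] = 2.0 * frames[:, 0] + 3.0  # channel 1 is a linear copy of channel 0

mean_vec = mean_pooling(frames)          # shape (8,)
corr_vec = correlation_pooling(frames)   # shape (28,)

# Assumption: fusion illustrated as simple concatenation of the two vectors.
fused = np.concatenate([mean_vec, corr_vec])
```

In this toy input, the first entry of `corr_vec` (channels 0 and 1) is close to 1 because one channel is a linear function of the other, and that relationship is invisible to mean pooling alone. Note that the pooled dimension grows quadratically with the number of channels, which matters for the large hidden sizes of self-supervised models.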

Keywords

Speaker identification, speaker verification, emotion recognition, self-supervised models

URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345
Published
2023
Pages
1136–1143
Proceedings
2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Conference
IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT
ISBN
978-1-6654-7189-3
Publisher
IEEE Signal Processing Society
Place
Doha
DOI
10.1109/SLT54892.2023.10023345
UT WoS
000968851900153
BibTeX
@inproceedings{BUT185160,
  author="STAFYLAKIS, T. and MOŠNER, L. and KAKOUROS, S. and PLCHOT, O. and BURGET, L. and ČERNOCKÝ, J.",
  title="Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations",
  booktitle="2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",
  year="2023",
  pages="1136--1143",
  publisher="IEEE Signal Processing Society",
  address="Doha",
  doi="10.1109/SLT54892.2023.10023345",
  isbn="978-1-6654-7189-3",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345"
}
Projects
Exchanges for SPEech ReseArch aNd TechnOlogies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, completed
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - Podprogram INTER-ACTION, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grantové projekty excelence v základním výzkumu EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed