Publication Details

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

STAFYLAKIS Themos, MOŠNER Ladislav, KAKOUROS Sofoklis, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023, pp. 1136-1143. ISBN 978-1-6654-7189-3. Available from: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345
Czech title
Extrakce informací o mluvčím a emocích ze self-supervised modelů řeči pomocí korelace po kanálech
Type
conference paper
Language
English
Authors
Stafylakis Themos (OMILIA)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Kakouros Sofoklis (unknown)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT)
URL
Keywords

Speaker identification, speaker verification, emotion recognition, self-supervised models

Abstract

Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached by using descriptive statistics, and in particular, using the first- and second-order statistics of representation coefficients. In this paper, we examine an alternative way of extracting speaker and emotion information from self-supervised trained models, based on the correlations between the coefficients of the representations - correlation pooling. We show improvements over mean pooling and further gains when the pooling methods are combined via fusion. The code is available at github.com/Lamomal/s3prl_correlation.
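To make the contrast in the abstract concrete, here is a minimal sketch of mean pooling next to one plausible reading of correlation pooling: Pearson correlations between every pair of channels, computed over time, with the upper triangle of the resulting matrix used as the pooled vector. This is an illustration under stated assumptions, not the authors' implementation (their code is in the repository linked above). The function names `mean_pool` and `correlation_pool`, the `eps` stabilizer, and the plain-list input layout (T frames of C coefficients each) are all hypothetical.

```python
import math


def mean_pool(frames):
    """Average each channel over time: T x C frames -> C values."""
    n_frames = len(frames)
    n_channels = len(frames[0])
    return [sum(f[c] for f in frames) / n_frames for c in range(n_channels)]


def correlation_pool(frames, eps=1e-8):
    """Pearson correlation between every pair of channels over time.

    Returns the strict upper triangle of the C x C correlation matrix,
    flattened row by row: C * (C - 1) / 2 values.
    Assumption: eps only guards against division by zero for
    channels that are constant over time.
    """
    n_channels = len(frames[0])
    means = mean_pool(frames)
    # Center each channel, then scale it to (approximately) unit norm over time.
    centered = [[f[c] - means[c] for f in frames] for c in range(n_channels)]
    norms = [math.sqrt(sum(v * v for v in ch)) + eps for ch in centered]
    normed = [[v / norms[c] for v in centered[c]] for c in range(n_channels)]
    pooled = []
    for i in range(n_channels):
        for j in range(i + 1, n_channels):
            # Dot product of two unit-norm, zero-mean channels = correlation.
            pooled.append(sum(a * b for a, b in zip(normed[i], normed[j])))
    return pooled


# Toy input: 3 frames, 3 channels. Channel 1 rises with channel 0,
# channel 2 falls as channel 0 rises.
frames = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 1.0]]
print(mean_pool(frames))
print(correlation_pool(frames))
```

Mean pooling yields C values per utterance, while this form of correlation pooling yields C(C-1)/2 values that describe how channels co-vary rather than their average level.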

Published
2023
Pages
1136-1143
Proceedings
2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Conference
IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, Doha, QA
ISBN
978-1-6654-7189-3
Publisher
IEEE Signal Processing Society
Place
Doha, QA
DOI
10.1109/SLT54892.2023.10023345
UT WoS
000968851900153
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB12985,
   author = "Themos Stafylakis and Ladislav Mo\v{s}ner and Sofoklis Kakouros and Old\v{r}ich Plchot and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations",
   pages = "1136--1143",
   booktitle = "2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",
   year = 2023,
   location = "Doha, QA",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-6654-7189-3",
   doi = "10.1109/SLT54892.2023.10023345",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12985"
}