Publication Details

Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing

KAKOUROS Sofoklis, STAFYLAKIS Themos, MOŠNER Ladislav and BURGET Lukáš. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In: Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023, pp. 1-5. ISBN 978-1-7281-6327-7. Available from: https://ieeexplore.ieee.org/document/10094673
Czech title
Rozpoznávání emocí z řeči pomocí samoučících modelů s využitím attention korelací mezi kanály a vyhlazování značek
Type
conference paper
Language
English
Authors
Kakouros Sofoklis (unknown)
Stafylakis Themos (OMILIA)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
URL
Keywords

emotion recognition, self-supervised features, iemocap, hubert, wavlm, wav2vec 2.0

Abstract

When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels. Self-supervised pre-trained representations can robustly capture information from speech enabling state-of-the-art results in many downstream tasks including emotion recognition. However, better ways of aggregating the information across time need to be considered as the relevant emotion information is likely to appear piecewise and not uniformly across the signal. For the labels, we need to take into account that there is a substantial degree of noise that comes from the subjective human annotations. In this paper, we propose a novel approach to attentive pooling based on correlations between the representations' coefficients combined with label smoothing, a method aiming to reduce the confidence of the classifier on the training labels. We evaluate our proposed approach on the benchmark dataset IEMOCAP, and demonstrate high performance surpassing that in the literature. The code to reproduce the results is available at github.com/skakouros/s3prl_attentive_correlation.
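
The abstract describes label smoothing only as a way to reduce the classifier's confidence on the training labels. The sketch below illustrates the common uniform-mixture formulation of that idea; it is not taken from the paper or its repository. The function names `smooth_labels` and `cross_entropy`, the smoothing weight `epsilon = 0.1`, the four-class setup, and the example logits are all assumptions made for illustration.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Build a smoothed target distribution (hypothetical helper).

    Assumed formulation: mix the one-hot target with a uniform
    distribution, giving weight (1 - epsilon) to the one-hot part.
    """
    off_value = epsilon / num_classes
    dist = [off_value] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist


def cross_entropy(logits, target_dist):
    """Cross-entropy between softmax(logits) and a target distribution."""
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target_dist))


# Illustrative logits for an assumed four-class emotion classifier
# that is very confident in class 0.
logits = [4.0, 0.5, 0.2, 0.1]

hard = smooth_labels(0, 4, epsilon=0.0)  # plain one-hot target
soft = smooth_labels(0, 4, epsilon=0.1)  # smoothed target

loss_hard = cross_entropy(logits, hard)
loss_soft = cross_entropy(logits, soft)
```

With a smoothed target, a highly confident prediction on the labeled class incurs a larger loss than it would against the one-hot target, which is one way of discouraging over-confidence when the annotations are noisy.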

Published
2023
Pages
1-5
Proceedings
Proceedings of ICASSP 2023
Conference
2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece, GR
ISBN
978-1-7281-6327-7
Publisher
IEEE Signal Processing Society
Place
Rhodes Island, GR
DOI
10.1109/ICASSP49357.2023.10094673
BibTeX
@INPROCEEDINGS{FITPUB13054,
   author = "Sofoklis Kakouros and Themos Stafylakis and Ladislav Mo\v{s}ner and Luk\'{a}\v{s} Burget",
   title = "Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing",
   pages = "1--5",
   booktitle = "Proceedings of ICASSP 2023",
   year = 2023,
   location = "Rhodes Island, GR",
   publisher = "IEEE Signal Processing Society",
   ISBN = "978-1-7281-6327-7",
   doi = "10.1109/ICASSP49357.2023.10094673",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13054"
}