Result detail
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
STAFYLAKIS, T.
Mošner Ladislav, Ing., UPGM (FIT)
KAKOUROS, S.
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Self-supervised learning of speech representations from large
amounts of unlabeled data has enabled state-of-the-art results
in several speech processing tasks. Aggregating these speech
representations across time is typically approached by using
descriptive statistics, and in particular, using the first- and
second-order statistics of representation coefficients. In this
paper, we examine an alternative way of extracting speaker
and emotion information from self-supervised trained models,
based on the correlations between the coefficients of the
representations - correlation pooling. We show improvements
over mean pooling and further gains when the pooling
methods are combined via fusion. The code is available at
github.com/Lamomal/s3prl_correlation.
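
The pooling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that); it only assumes that frame-level representations arrive as a `(time, channels)` array and that correlation pooling means taking the Pearson correlations between channels and keeping the upper triangle as a fixed-size vector. The function names, the `eps` constant, and the concatenation-based fusion at the end are hypothetical choices made for illustration; the fusion used in the paper may differ.

```python
import numpy as np


def mean_pooling(frames: np.ndarray) -> np.ndarray:
    """First-order statistics: average each channel over time."""
    return frames.mean(axis=0)


def correlation_pooling(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Channel-wise Pearson correlations, flattened to a vector.

    frames: array of shape (time, channels).
    Returns the strictly upper-triangular part of the correlation
    matrix, i.e. channels * (channels - 1) / 2 values.
    """
    # Standardise every channel over the time axis.
    centered = frames - frames.mean(axis=0, keepdims=True)
    std = centered.std(axis=0, keepdims=True)
    normalized = centered / (std + eps)  # eps guards against constant channels

    # (channels, channels) correlation matrix.
    corr = normalized.T @ normalized / frames.shape[0]

    # The matrix is symmetric with a unit diagonal, so only the
    # strictly upper triangle carries information.
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


# Toy input standing in for self-supervised model outputs (assumed shape).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 8))

mean_vec = mean_pooling(frames)
corr_vec = correlation_pooling(frames)

# One simple way to combine both views (an assumption, not the paper's method).
fused = np.concatenate([mean_vec, corr_vec])
```

Because the pooled vector grows quadratically with the number of channels, a sketch like this is only practical after the representation has been reduced to a modest channel count.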
Speaker identification, speaker verification, emotion recognition, self-supervised models
@inproceedings{BUT185160,
author="STAFYLAKIS, T. and MOŠNER, L. and KAKOUROS, S. and PLCHOT, O. and BURGET, L. and ČERNOCKÝ, J.",
title="Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations",
booktitle="2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings",
year="2023",
pages="1136--1143",
publisher="IEEE Signal Processing Society",
address="Doha",
doi="10.1109/SLT54892.2023.10023345",
isbn="978-1-6654-7189-3",
url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10023345"
}
Neural Representations in Multi-modal and Multi-lingual Modeling, GAČR, Grant Projects of Excellence in Basic Research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Robust Processing of Recordings for Operations and Security, MV (Ministry of the Interior), Programme Strategic Support for the Development of Security Research in the Czech Republic 2019-2025 (IMPAKT 1), Sub-programme 1 Joint Research Projects (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed
Exchanges for Speech Research and Technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, in progress