Publication Details

Application of speaker- and language identification state-of-the-art techniques for emotion recognition

KOCKMANN, M.; BURGET, L.; ČERNOCKÝ, J. Application of speaker- and language identification state-of-the-art techniques for emotion recognition. Speech Communication, 2011, vol. 53, no. 9, p. 1172-1185. ISSN: 0167-6393.

Czech title

Použití aktuálních technik pro identifikaci řečníka a jazyka v rozpoznávání emocí

Type

journal article

Language

English

Authors

Kockmann Marcel, Dipl.-Ing., Ph.D.
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)

Keywords

Emotion recognition; Gaussian mixture models; Maximum-mutual-information; Intersession variability compensation; Score-level fusion

Abstract

The authors of this article show that feature extraction and statistical modeling methods commonly used in speaker and language recognition can also be applied successfully to emotion recognition.

Annotation

This article describes our efforts to transfer feature extraction and statistical modeling techniques from the fields of speaker and language identification to the related field of emotion recognition. We give detailed insight into our acoustic and prosodic feature extraction and show how to apply Gaussian mixture modeling techniques on top of it. We focus on different flavors of Gaussian Mixture Models (GMMs), including more sophisticated approaches such as discriminative training using the Maximum-Mutual-Information (MMI) criterion and InterSession Variability (ISV) compensation. Both techniques show superior performance in language and speaker identification. Furthermore, we combine multiple system outputs by score-level fusion to exploit the complementary information in diverse systems. Our proposal is evaluated in several experiments on the FAU Aibo Emotion Corpus, which contains non-acted, spontaneous emotional speech. In the Interspeech 2009 Emotion Challenge, we achieved the best results for the 5-class task of the Open Performance Sub-Challenge, with an unweighted average recall of 41.7%. Additional experiments on the acted Berlin Database of Emotional Speech show the capability of intersession variability compensation for emotion recognition.

Published

2011

Pages

1172–1185

Journal

Speech Communication, vol. 53, no. 9, ISSN 0167-6393

Book

Speech Communication

Publisher

Elsevier Science

DOI

10.1016/j.specom.2011.01.007

UT WoS

000294104000009

EID Scopus

2-s2.0-79960848738

BibTeX

@article{BUT76396,
  author="Marcel {Kockmann} and Lukáš {Burget} and Jan {Černocký}",
  title="Application of speaker- and language identification state-of-the-art techniques for emotion recognition",
  journal="Speech Communication",
  year="2011",
  volume="53",
  number="9",
  pages="1172--1185",
  doi="10.1016/j.specom.2011.01.007",
  issn="0167-6393",
  url="http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271578&_user=640830&_pii=S0167639311000082&_check=y&_origin=search&_zone=rslt_list_item&_coverDate=2011-12-31&wchp=dGLbVlS-zSkWz&md5=2a79c3d171cd13a3689408115666e2ef/1-s2.0-S0167639311000082-main"
}