Result detail

Speaker embeddings by modeling channel-wise correlations

STAFYLAKIS, T.; ROHDIN, J.; BURGET, L. Speaker embeddings by modeling channel-wise correlations. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. no. 8, p. 501-505. ISSN: 1990-9772.
Type
conference proceedings paper
Language
English
Authors
Stafylakis Themos
Rohdin Johan Andréas, M.Sc., Ph.D., FIT (FIT), UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Abstract

Speaker embeddings extracted with deep 2D convolutional neural networks are typically modeled as projections of first and second order statistics of channel-frequency pairs onto a linear layer, using either average or attentive pooling along the time axis. In this paper we examine an alternative pooling method, where pairwise correlations between channels for given frequencies are used as statistics. The method is inspired by style-transfer methods in computer vision, where the style of an image, modeled by the matrix of channel-wise correlations, is transferred to another image, in order to produce a new image having the style of the first and the content of the second. By drawing analogies between image style and speaker characteristics, and between image content and phonetic sequence, we explore the use of such channel-wise correlations features to train a ResNet architecture in an end-to-end fashion. Our experiments on VoxCeleb demonstrate the effectiveness of the proposed pooling method in speaker recognition.
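
The pooling idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `channel_correlation_pooling`, the `(channels, freqs, frames)` input layout, the `eps` stabilizer, and the choice to keep only the upper triangle of each correlation matrix are all assumptions made for illustration. For every frequency bin, the sketch standardizes each channel along the time axis and then averages the pairwise products over time, which yields one channel-by-channel correlation matrix per frequency.

```python
import numpy as np

def channel_correlation_pooling(feats, eps=1e-8):
    """Pool a (channels, freqs, frames) feature map into a fixed-size vector.

    Hypothetical helper: for each frequency bin, compute the Pearson
    correlation between every pair of channels along the time axis.
    """
    C, F, T = feats.shape
    # Standardize each (channel, frequency) trajectory over time.
    mean = feats.mean(axis=2, keepdims=True)
    std = feats.std(axis=2, keepdims=True)
    z = (feats - mean) / (std + eps)
    # corr[f, i, j] = time average of z[i, f, t] * z[j, f, t]
    corr = np.einsum("ift,jft->fij", z, z) / T
    # Correlation matrices are symmetric with a unit diagonal, so keep
    # only the strictly upper triangle (an assumption of this sketch).
    iu = np.triu_indices(C, k=1)
    return corr[:, iu[0], iu[1]].reshape(-1)

# Toy input: 4 channels, 3 frequency bins, 50 frames of random features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 3, 50))
pooled = channel_correlation_pooling(feats)
print(pooled.shape)  # (18,) = 3 frequencies * 6 channel pairs
```

The output length depends only on the number of channels and frequency bins, not on the number of frames, so utterances of different durations map to vectors of the same size. In a setup like the one the abstract describes, such a vector would then be fed to a linear layer.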

Keywords

speaker recognition, style-transfer, deep learning

URL
https://www.isca-speech.org/archive/interspeech_2021/stafylakis21_interspeech.html
Year
2021
Pages
501–505
Journal
Proceedings of Interspeech, vol. 2021, no. 8, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Brno
DOI
10.21437/Interspeech.2021-1442
UT WoS
000841879500101
BibTeX
@inproceedings{BUT175834,
  author="Themos {Stafylakis} and Johan Andréas {Rohdin} and Lukáš {Burget}",
  title="Speaker embeddings by modeling channel-wise correlations",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2021",
  journal="Proceedings of Interspeech",
  volume="2021",
  number="8",
  pages="501--505",
  publisher="International Speech Communication Association",
  address="Brno",
  doi="10.21437/Interspeech.2021-1442",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/interspeech_2021/stafylakis21_interspeech.html"
}
Projects
Neural Representations in Multi-modal and Multi-lingual Modeling, GAČR, Grant projects of excellence in basic research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Real-time network, text, and speech analytics for combating organized crime, EU, Horizon 2020, start: 2019-09-01, end: 2022-12-31, completed
Exchanges for Speech Research and Technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, in progress