Result detail

Effective Phase Encoding for End-To-End Speaker Verification

PENG, J.; QU, X.; GU, R.; WANG, J.; XIAO, J.; BURGET, L.; ČERNOCKÝ, J. Effective Phase Encoding for End-To-End Speaker Verification. In Proceedings Interspeech 2021. Proceedings of Interspeech. Brno: International Speech Communication Association, 2021. no. 8, p. 2366-2370. ISSN: 1990-9772.
Type
conference proceedings paper
Language
English
Authors
Peng Junyi
QU, X.
GU, R.
WANG, J.
XIAO, J.
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)
Abstract

The widely used magnitude spectrum based features have shown their superiority in the field of speech processing. In contrast, the importance of phase spectrum is always ignored. This is because the patterns hidden in phase cannot be intuitively modelled and interpreted, due to phase wrapping phenomenon. In this paper, we explore novel phase spectrum based features, named Learnable Group Delay (LearnGD), to capture useful information in speech signals. Specifically, firstly, the negative of the spectral derivative of the phase spectrum, called group delay (GD), is used to unwrap the phase. Then, to suppress the spiky nature of GD, which is caused by its roots close to the unit circle in the Z domain, a carefully designed light convolutional smoothing layer is employed to reconstruct the GD. Finally, an exponential hyper-parameter is introduced to reconstruct GD features to restore the spectrum range and generate LearnGD features. For performance evaluation, speaker verification experiments are conducted on the VoxCeleb2 corpus. Compared to the traditional acoustic feature derived from the magnitude spectrum, the proposed phase-based features reach a 27.8% relative improvement in terms of EER. Furthermore, experimental results on TIMIT phoneme recognition task also demonstrate the effectiveness of our proposed phase-based features.
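
The three steps named in the abstract (group delay, smoothing, exponential range compression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the fixed moving-average kernel (standing in for the learnable convolutional layer), and the sign-preserving power with exponent `alpha` are all assumptions made for illustration. The group delay is computed with the standard identity GD = Re(Y/X), where X is the spectrum of x[n] and Y is the spectrum of n·x[n], which avoids explicit phase unwrapping.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-8):
    """Group delay (negative phase derivative) of one frame, in samples.

    Uses GD = (X_R*Y_R + X_I*Y_I) / |X|^2 with Y = FFT(n * x[n]),
    so the wrapped phase is never differentiated directly.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    # eps guards against division by zero near spectral nulls
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

def smooth(gd, kernel):
    """Smooth GD along frequency with a 1-D kernel.

    Assumption: a fixed kernel is used here; the paper describes a
    learnable convolutional layer whose weights are trained.
    """
    return np.convolve(gd, kernel, mode="same")

def compress(gd, alpha):
    """Sign-preserving power compression of the GD range.

    Assumption: this is one plausible reading of the 'exponential
    hyper-parameter'; the exact form in the paper may differ.
    """
    return np.sign(gd) * np.abs(gd) ** alpha

# Usage: a unit impulse delayed by 5 samples has a flat group delay of 5.
frame = np.zeros(64)
frame[5] = 1.0
gd = group_delay(frame, n_fft=128)
features = compress(smooth(gd, np.ones(5) / 5), alpha=0.5)
```

The delayed impulse is a convenient sanity check because its group delay is constant across frequency, so any deviation points to an error in the computation rather than in the signal.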

Keywords

end-to-end speaker verification, phase information, group delay, on-the-fly

URL
https://www.isca-speech.org/archive/interspeech_2021/peng21c_interspeech.html
Year
2021
Pages
2366–2370
Journal
Proceedings of Interspeech, vol. 2021, no. 8, ISSN 1990-9772
Proceedings
Proceedings Interspeech 2021
Conference
Interspeech Conference
Publisher
International Speech Communication Association
Place
Brno
DOI
10.21437/Interspeech.2021-2025
UT WoS
000841879502096
BibTeX
@inproceedings{BUT175842,
  author="PENG, J. and QU, X. and GU, R. and WANG, J. and XIAO, J. and BURGET, L. and ČERNOCKÝ, J.",
  title="Effective Phase Encoding for End-To-End Speaker Verification",
  booktitle="Proceedings Interspeech 2021",
  year="2021",
  journal="Proceedings of Interspeech",
  volume="2021",
  number="8",
  pages="2366--2370",
  publisher="International Speech Communication Association",
  address="Brno",
  doi="10.21437/Interspeech.2021-2025",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/interspeech_2021/peng21c_interspeech.html"
}
Projects
Multi-linguality in speech technologies, MŠMT, INTER-EXCELLENCE - Subprogram INTER-ACTION, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural representations in multimodal and multilingual modelling, GAČR, Grant projects of excellence in basic research EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed