Result detail

Learnable Sparse Filterbank for Speaker Verification

PENG, J.; GU, R.; MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Learnable Sparse Filterbank for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. no. 9, p. 5110-5114. ISSN: 1990-9772.

Type

conference proceedings paper

Language

English

Authors

Peng Junyi, UPGM (FIT)
GU, R.
Mošner Ladislav, Ing., UPGM (FIT)
Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)

Abstract

Recently, feature extraction with learnable filters has been extensively
investigated for speaker verification systems, with filters
learned in both the time and frequency domains. Most of the
learned schemes, however, end up with filters close to their initialization
(e.g. Mel filterbank) or with filters strongly limited by
their constraints. In this paper, we propose a novel learnable
sparse filterbank, named LearnSF, which exclusively optimizes
the sparsity of the filterbank and does not explicitly constrain
the filters to follow a pre-defined distribution. After standard
pre-processing (STFT and square of the magnitude spectrum),
the learnable sparse filterbank is applied, and its normalized
outputs are fed into a neural network predicting the speaker identity.
We evaluated the performance of the proposed approach
on both the VoxCeleb and CNCeleb datasets. The experimental
results demonstrate the effectiveness of the proposed LearnSF
compared to both widely used acoustic features and existing parameterized
learnable front-ends.
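
The processing chain described in the abstract (STFT, squared magnitude, filterbank, normalization) can be sketched roughly as follows. This is only an illustrative NumPy sketch under stated assumptions, not the authors' LearnSF implementation: the frame length, hop size, 16 kHz sampling rate, number of filters, log compression, per-filter mean/variance normalization, and the L1/L2 sparsity measure are all assumptions chosen for illustration, and the random `weights` matrix merely stands in for filters that would be learned.

```python
import numpy as np

def power_spectrogram(signal, n_fft=512, hop=160):
    """Frame the signal, apply a Hann window, return |STFT|^2 as (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

def filterbank_features(power, weights, eps=1e-6):
    """Apply a (bins, filters) filterbank, then log-compress and normalize.

    Log compression and mean/variance normalization are assumptions here;
    the abstract only states that the filterbank outputs are normalized.
    """
    feats = np.log(power @ weights + eps)
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)

def l1_l2_ratio(weights, eps=1e-12):
    """Illustrative sparsity measure: mean L1/L2 norm ratio per filter (lower is sparser)."""
    l1 = np.abs(weights).sum(axis=0)
    l2 = np.sqrt((weights ** 2).sum(axis=0))
    return float(np.mean(l1 / (l2 + eps)))

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)               # 1 s of noise at an assumed 16 kHz
power = power_spectrogram(signal)                 # shape: (frames, n_fft // 2 + 1)
weights = np.abs(rng.standard_normal((257, 80)))  # placeholder for learned filters
features = filterbank_features(power, weights)    # shape: (frames, 80)
```

In a trainable front-end, `weights` would be a model parameter updated jointly with the speaker network, and a sparsity term such as `l1_l2_ratio` could be added to the training loss; how the paper actually formulates this term is not stated in the abstract.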

Keywords

learnable filter, sparse filtering, sparsity, speaker verification

URL

https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf

Year

2022

Pages

5110–5114

Journal

Proceedings of Interspeech, no. 9, ISSN 1990-9772

Proceedings

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Incheon

DOI

10.21437/Interspeech.2022-11309

UT WoS

000900724505058

EID Scopus

2-s2.0-85140077879

BibTeX

@inproceedings{BUT179826,
  author="PENG, J. and GU, R. and MOŠNER, L. and PLCHOT, O. and BURGET, L. and ČERNOCKÝ, J.",
  title="Learnable Sparse Filterbank for Speaker Verification",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2022",
  journal="Proceedings of Interspeech",
  number="9",
  pages="5110--5114",
  publisher="International Speech Communication Association",
  address="Incheon",
  doi="10.21437/Interspeech.2022-11309",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2022/peng22e_interspeech.pdf"
}

Files

pdf peng22e_interspeech2022_learnable.pdf 3 MB

Projects

Multilinguality in speech technologies, MŠMT, INTER-EXCELLENCE - Subprogramme INTER-ACTION, LTAIN19087, start: 2020-01-01, end: 2023-08-31, completed
Neural representations in multimodal and multilingual modelling, GAČR, EXPRO grant projects of excellence in basic research - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Exchanges for speech research and technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, in progress

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)