Result Details

Investigation of Specaugment for Deep Speaker Embedding Learning

WANG, S.; ROHDIN, J.; PLCHOT, O.; BURGET, L.; YU, K.; ČERNOCKÝ, J. Investigation of Specaugment for Deep Speaker Embedding Learning. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Barcelona: IEEE Signal Processing Society, 2020. p. 7139-7143. ISBN: 978-1-5090-6631-5.
Type
conference paper
Language
English
Authors
WANG, S.; ROHDIN, J.; PLCHOT, O.; BURGET, L.; YU, K.; ČERNOCKÝ, J.
Abstract

SpecAugment is a newly proposed data augmentation method for speech recognition. By randomly masking bands in the log Mel spectrogram, this method leads to impressive performance improvements. In this paper, we investigate the use of SpecAugment for speaker verification tasks. Two different models, namely a 1-D convolutional TDNN and a 2-D convolutional ResNet34, trained with either Softmax or AAM-Softmax loss, are used to analyze SpecAugment's effectiveness. Experiments are carried out on the VoxCeleb and NIST SRE 2016 datasets. By applying SpecAugment to the original clean data in an on-the-fly manner, without complex off-line data augmentation methods, we obtained 3.72% and 11.49% EER for NIST SRE 2016 Cantonese and Tagalog, respectively. For the VoxCeleb1 evaluation set, we obtained 1.47% EER.
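
To illustrate the band masking the abstract describes, here is a minimal sketch of on-the-fly frequency and time masking on a log Mel feature matrix. It is not the authors' implementation: the function name `spec_augment`, the mask widths, the number of masks, and the zero fill value are all illustrative assumptions, and the time-warping step of the original SpecAugment recipe is left out. Plain Python lists are used to keep the example dependency-free; a real training pipeline would more likely operate on tensors.

```python
import random


def spec_augment(log_mel, max_freq_width=8, max_time_width=10,
                 num_freq_masks=1, num_time_masks=1,
                 mask_value=0.0, rng=None):
    """Return a masked copy of `log_mel`, a list of frames (each a list of Mel bins).

    All default values are hypothetical; they are not taken from the paper.
    """
    rng = rng or random.Random()
    out = [list(frame) for frame in log_mel]  # copy so the clean data stays intact
    num_frames = len(out)
    num_bins = len(out[0]) if num_frames else 0

    # Frequency masking: overwrite a random band of consecutive Mel bins in every frame.
    for _ in range(num_freq_masks):
        width = min(rng.randint(0, max_freq_width), num_bins)
        start = rng.randint(0, num_bins - width)
        for frame in out:
            frame[start:start + width] = [mask_value] * width

    # Time masking: overwrite a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = min(rng.randint(0, max_time_width), num_frames)
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [mask_value] * num_bins

    return out


# Usage: draw a fresh random mask each time an utterance is loaded for training.
features = [[1.0] * 40 for _ in range(100)]  # 100 frames x 40 Mel bins (dummy data)
augmented = spec_augment(features, rng=random.Random(0))
```

Because a new mask is drawn on every call, the same clean utterance yields a different augmented version in each epoch, which is what "on-the-fly" refers to here. Filling with zero assumes mean-normalized features; other fill values are possible.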

Keywords

speaker embedding, on-the-fly data augmentation, speaker verification, specaugment

URL
https://ieeexplore.ieee.org/document/9053481/authors#authors
Published
2020
Pages
7139–7143
Proceedings
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
ISBN
978-1-5090-6631-5
Publisher
IEEE Signal Processing Society
Place
Barcelona
DOI
10.1109/ICASSP40776.2020.9053481
UT WoS
000615970407081
BibTeX
@inproceedings{BUT163947,
  author="WANG, S. and ROHDIN, J. and PLCHOT, O. and BURGET, L. and YU, K. and ČERNOCKÝ, J.",
  title="Investigation of Specaugment for Deep Speaker Embedding Learning",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2020",
  pages="7139--7143",
  publisher="IEEE Signal Processing Society",
  address="Barcelona",
  doi="10.1109/ICASSP40776.2020.9053481",
  isbn="978-1-5090-6631-5",
  url="https://ieeexplore.ieee.org/document/9053481/authors#authors"
}
Projects
Information mining in speech acquired by distant microphones, MV, Bezpečnostní výzkum České republiky 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, Národní program udržitelnosti II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Moderní metody zpracování, analýzy a zobrazování multimediálních a 3D dat, BUT, Vnitřní projekty VUT, FIT-S-20-6460, start: 2020-03-01, end: 2023-02-28, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grantové projekty exelence v základním výzkumu EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed