Result Details

Analysis of X-Vectors for Low-Resource Speech Recognition

KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J.; PROFANT, J.; NYTRA, J.; HLAVÁČEK, M.; PAVLÍČEK, T. Analysis of X-Vectors for Low-Resource Speech Recognition. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021. p. 6998-7002. ISBN: 978-1-7281-7605-5.

Type

conference paper

Language

English

Authors

Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Veselý Karel, Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)
Profant Ján, Ing.
Nytra Jiří, Bc.
HLAVÁČEK, M.
Pavlíček Tomáš, Ing.

Abstract

The paper presents a study of usability of x-vectors for adaptation of automatic speech recognition (ASR) systems. X-vectors are Neural Network (NN)-based speaker embeddings recently proposed in speaker recognition (SR). They quickly replaced common i-vectors and became new state-of-the-art technique. Here, the same approach is adopted for ASR with the hope of similar outcome. All experiments were done on ASR for the latest IARPA MATERIAL evaluation running on Pashto language. Over 1% absolute improvement was observed with x-vectors over traditional i-vectors, even when the x-vector extractor was not trained on target Pashto data.

Keywords

speech recognition, adaptation, x-vectors, data augmentation, robustness

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2021/karafiat… PDF

Published

2021

Pages

6998–7002

Proceedings

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference

2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

ISBN

978-1-7281-7605-5

Publisher

IEEE Signal Processing Society

Place

Toronto, Ontario

DOI

10.1109/ICASSP39728.2021.9414725

UT WoS

000704288407055

EID Scopus

2-s2.0-85115057300

BibTeX

@inproceedings{BUT175794,
  author="KARAFIÁT, M. and VESELÝ, K. and ČERNOCKÝ, J. and PROFANT, J. and NYTRA, J. and HLAVÁČEK, M. and PAVLÍČEK, T.",
  title="Analysis of X-Vectors for Low-Resource Speech Recognition",
  booktitle="ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  year="2021",
  pages="6998--7002",
  publisher="IEEE Signal Processing Society",
  address="Toronto, Ontario",
  doi="10.1109/ICASSP39728.2021.9414725",
  isbn="978-1-7281-7605-5",
  url="https://www.fit.vut.cz/research/publication/12525/"
}

Files

karafiat_icassp2021_09414725.pdf (PDF, 2 MB)

Projects

IARPA Machine Translation for English Retrieval of Information in Any Language (MATERIAL) - Foreign Language Automated Information Retrieval (FLAIR), IARPA, start: 2017-09-21, end: 2021-10-22, completed
Neural Representations in multi-modal and multi-lingual modeling, GACR, Grantové projekty excelence v základním výzkumu EXPRO - 2019, GX19-26934X, start: 2019-01-01, end: 2023-12-31, completed
Real time network, text, and speaker analytics for combating organized crime, EU, Horizon 2020, start: 2019-09-01, end: 2022-12-31, completed
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)