Publication Details

Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation

ALAM Jahangir, BURGET Lukáš, GLEMBEK Ondřej, MATĚJKA Pavel, MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna and STAFYLAKIS Themos et al. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. In: Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022, pp. 346-353.
Czech title
Vývoj ABC systémů pro ročník 2021 NIST evaluace systémů pro rozpoznávání mluvčího
conference paper
Alam Jahangir (CRIM)
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT)
Glembek Ondřej, Ing., Ph.D. (DCGM FIT BUT)
Matějka Pavel, Ing., Ph.D. (DCGM FIT BUT)
Mošner Ladislav, Ing. (DCGM FIT BUT)
Plchot Oldřich, Ing., Ph.D. (DCGM FIT BUT)
Rohdin Johan A., Dr. (DCGM FIT BUT)
Silnova Anna, MSc., Ph.D. (DCGM FIT BUT)
and others

speaker verification, recognition, evaluation


In this contribution, we describe the ABC team's collaborative efforts toward the development of speaker verification systems for the NIST Speaker Recognition Evaluation 2021 (NIST SRE 2021). Cross-lingual and cross-dataset trials are the two main challenges introduced in NIST SRE 2021. The ABC team's submissions are the result of active collaboration among researchers from BUT, CRIM, Omilia and Innovatrics. We took part in all three closed-condition tracks: the audio-only, audio-visual and visual-only verification tasks. Our audio-only systems follow the paradigm of deep speaker embeddings (e.g., x-vectors) with subsequent PLDA scoring. As embedding extractors, we selected variants of the residual neural network (ResNet), factored time-delay neural network (FTDNN) and hybrid neural network (HNN) architectures. The HNN embedding extractor employs CNN, LSTM and TDNN networks and incorporates a multi-level global-local statistics pooling method to aggregate speaker information over both short time spans and utterance-level context. Our visual-only systems are based on pretrained embedding extractors employing variants of ResNet, and their scoring is based on cosine distance. For the audio-visual system, we simply fuse the outputs of the independent audio and visual systems. Our final submitted systems are obtained by score-level fusion of the subsystems followed by score calibration.
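The last step of the abstract, score-level fusion of subsystems followed by calibration, can be sketched as a weighted linear combination of per-subsystem scores and an affine calibration mapping. The function names, weights, and calibration parameters below are illustrative assumptions, not values from the paper; in practice such weights are typically trained (e.g., by logistic regression) on a development set.

```python
# Hypothetical sketch of score-level fusion followed by affine score
# calibration. All numeric values here are made up for illustration.
import numpy as np

def fuse_scores(score_matrix, weights):
    """Weighted linear fusion of per-subsystem verification scores.

    score_matrix: (n_trials, n_subsystems) array of raw scores.
    weights:      (n_subsystems,) fusion weights.
    """
    return np.asarray(score_matrix) @ np.asarray(weights)

def calibrate(scores, a, b):
    """Affine calibration: maps fused scores toward calibrated
    log-likelihood ratios (scale a, offset b)."""
    return a * np.asarray(scores) + b

# Toy usage: three trials scored by two subsystems
raw = np.array([[1.2, 0.8], [-0.5, -0.3], [2.0, 1.5]])
fused = fuse_scores(raw, weights=np.array([0.6, 0.4]))
llr = calibrate(fused, a=1.1, b=-0.2)
```

A usage note: keeping fusion linear and calibration affine means the whole post-processing chain stays a single affine map of the subsystem scores, which is easy to train and robust on small development sets.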

Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022)
Odyssey 2022: The Speaker and Language Recognition Workshop, Beijing, CN
International Speech Communication Association
Beijing, CN
@inproceedings{Alam2022ABC,
   author = "Jahangir Alam and Luk\'{a}\v{s} Burget and Ond\v{r}ej Glembek and Pavel Mat\v{e}jka and Ladislav Mo\v{s}ner and Old\v{r}ich Plchot and A. Johan Rohdin and Anna Silnova and Themos Stafylakis and et al.",
   title = "Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation",
   pages = "346--353",
   booktitle = "Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022)",
   year = 2022,
   location = "Beijing, CN",
   publisher = "International Speech Communication Association",
   doi = "10.21437/Odyssey.2022-48",
   language = "english",
   url = ""
}