Result Details

ABC SYSTEM DESCRIPTION FOR NIST SRE 2024

ALAM, J.; BARAHONA QUIRÓS, S.; BOBOŠ, D.; BURGET, L.; CUMANI, S.; DAHMANE, M.; HAN, J.; HLAVÁČEK, M.; KODOVSKÝ, M.; LANDINI, F.; MOŠNER, L.; PÁLKA, P.; PAVLÍČEK, T.; PENG, J.; PLCHOT, O.; RAJASEKHAR, P.; ROHDIN, J.; SILNOVA, A.; STAFYLAKIS, T.; ZHANG, L. ABC SYSTEM DESCRIPTION FOR NIST SRE 2024. Proceedings of NIST SRE 2024. San Juan: National Institute of Standards and Technology, 2024. p. 1-9.
Type
conference paper
Language
English
Authors
Alam Jahangir
BARAHONA QUIRÓS, S.
Boboš Dominik, Ing.
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Cumani Sandro, Ph.D.
DAHMANE, M.
Han Jiangyu, DCGM (FIT)
HLAVÁČEK, M.
KODOVSKÝ, M.
Landini Federico Nicolás, Ph.D.
Mošner Ladislav, Ing., DCGM (FIT)
Pálka Petr, Ing., DCGM (FIT)
Pavlíček Tomáš, Ing.
Peng Junyi, DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
RAJASEKHAR, P.
Rohdin Johan Andréas, M.Sc., Ph.D., FIT (FIT), DCGM (FIT)
Silnova Anna, M.Sc., Ph.D., DCGM (FIT)
Stafylakis Themos
Zhang Lin, Ph.D., DCGM (FIT)
Abstract

This paper presents the ABC team's submission to the NIST SRE 2024 evaluation, a collaboration among BUT, Polito, Phonexia, Omilia, UAM, and CRIM. Our team participated in all evaluation tracks (audio-only, visual-only, and audio-visual) under both fixed and open conditions. We developed a variety of frontends, backends, and strategies for calibration and fusion to optimize system performance.
The fixed and open conditions share some solutions. In the audio-only systems, we employed ResNet variants and the newly introduced ReDimNet model as frontends for embedding extraction. We then explored various backends, including cosine scoring, Probabilistic Linear Discriminant Analysis, and Pairwise Support Vector Machine. For the visual-only systems, we adopted the Insightface framework, utilizing ResNet100 and MagFace pre-trained on the MS1MV2 dataset. Cosine scoring was applied under various strategies, with logistic regression used for both calibration and fusion. Finally, scores from the audio-only and visual-only systems were fused using logistic regression for submission to the audio-visual track. Building on the fixed condition, the open condition included enhancements such as larger ResNet models, additional training data from the VoxBlink2 dataset, and the pre-trained XLS-R foundation model.
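
As a rough illustration of two techniques named in the abstract, the sketch below shows cosine scoring of a pair of embeddings and a plain logistic-regression fusion of per-system scores. It is not the ABC team's implementation: the function names (`cosine_score`, `train_fusion`, `fuse`), the toy scores, the learning rate, and the unweighted gradient-descent training are all assumptions made for the example, and evaluation systems typically use prior-weighted objectives and dedicated toolkits instead.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit fusion weights w and offset b by gradient descent on the
    logistic (cross-entropy) loss.

    scores: one list per trial, holding one score per subsystem
    labels: 1 for target trials, 0 for non-target trials
    (hypothetical helper; lr and epochs are arbitrary illustrative values)
    """
    n_sys = len(scores[0])
    n = len(scores)
    w = [0.0] * n_sys
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_sys
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(sum(wi * si for wi, si in zip(w, s)) + b)
            err = p - y
            for i in range(n_sys):
                grad_w[i] += err * s[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(w, b, s):
    """Fused score: an affine combination of the subsystem scores."""
    return sum(wi * si for wi, si in zip(w, s)) + b


# Toy development trials: [audio_score, visual_score] (made-up numbers).
dev_scores = [[0.8, 0.7], [0.6, 0.9], [0.7, 0.6],
              [0.1, 0.2], [-0.1, 0.3], [0.2, 0.0]]
dev_labels = [1, 1, 1, 0, 0, 0]
w, b = train_fusion(dev_scores, dev_labels)
```

With a single subsystem the same affine mapping acts as score calibration; with several it performs fusion, which is why one logistic-regression model can serve both roles.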

Keywords

NIST, speaker, recognition, evaluation

Published
2024
Pages
1–9
Proceedings
Proceedings of NIST SRE 2024
Conference
2024 NIST Speaker Recognition Evaluation (SRE) Workshop
Publisher
National Institute of Standards and Technology
Place
San Juan
BibTeX
@inproceedings{BUT193961,
  author="ALAM, J. and BARAHONA QUIRÓS, S. and BOBOŠ, D. and BURGET, L. and CUMANI, S. and DAHMANE, M. and HAN, J. and HLAVÁČEK, M. and KODOVSKÝ, M. and LANDINI, F. and MOŠNER, L. and PÁLKA, P. and PAVLÍČEK, T. and PENG, J. and PLCHOT, O. and RAJASEKHAR, P. and ROHDIN, J. and SILNOVA, A. and STAFYLAKIS, T. and ZHANG, L.",
  title="ABC SYSTEM DESCRIPTION FOR NIST SRE 2024",
  booktitle="Proceedings of NIST SRE 2024",
  year="2024",
  pages="1--9",
  publisher="National Institute of Standards and Technology",
  address="San Juan",
  url="https://www.fit.vut.cz/research/publication/13341/"
}
Projects
Robust processing of recordings for operations and security, MV, PROGRAM STRATEGICKÁ PODPORA ROZVOJE BEZPEČNOSTNÍHO VÝZKUMU ČR 2019-2025 (IMPAKT 1) PODPROGRAMU 1 SPOLEČNÉ VÝZKUMNÉ PROJEKTY (BV IMP1/1VS), VJ01010108, start: 2020-10-01, end: 2025-09-30, completed
Tools To Combat Voice DeepFakes, MV, Programu bezpečnostního výzkumu ČR 2021-2026: vývoj, testování a evaluace nových bezpečnostních technologií (SECTECH) - II. veřejná soutěž, VB02000060, start: 2024-01-01, end: 2026-12-31, running