Result detail

Advancing speaker embedding learning: Wespeaker toolkit for research and production

WANG, S.; CHEN, Z.; HAN, B.; WANG, H.; XIANG, X.; ROHDIN, J.; SILNOVA, A.; QIAN, Y.; LI, H. Advancing speaker embedding learning: Wespeaker toolkit for research and production. SPEECH COMMUNICATION, 2024, vol. 162, no. 103104, p. 1-12. ISSN: 0167-6393.

Type

journal article

Language

English

Authors

Wang Shuai
CHEN, Z.
HAN, B.
WANG, H.
XIANG, X.
Rohdin Johan Andréas, M.Sc., Ph.D., FIT (FIT), UPGM (FIT)
Silnova Anna, M.Sc., Ph.D., UPGM (FIT)
Qian Yanmin
Li Haizhou
and others

Abstract

Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad scope of related tasks including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit designed for both research and production purposes, dedicated to the learning of speaker embeddings. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised learning training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit incorporates structured recipes that have been successfully adopted in winning systems across various speaker verification challenges, ensuring highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime code, supporting mainstream platforms such as Windows, Linux, and Mac, as well as on-device chips such as the Horizon X3 PI. Wespeaker also offers off-the-shelf, high-quality speaker embeddings through various pretrained models, which can be readily applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
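
To illustrate how fixed-dimensional speaker embeddings are commonly compared in speaker verification, the following is a minimal sketch of cosine scoring in plain Python. It does not use the Wespeaker API: the function names `cosine_score` and `same_speaker`, the toy 4-dimensional vectors, and the threshold of 0.5 are all assumptions made for illustration only. Real embeddings have far more dimensions, and a threshold would be tuned on held-out trials.

```python
import math


def cosine_score(a, b):
    """Return the cosine similarity between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Accept the trial if the score reaches the (hypothetical) threshold."""
    return cosine_score(a, b) >= threshold


# Toy embeddings, invented for this example (not produced by any model).
enroll = [0.2, -0.1, 0.7, 0.4]
test_same = [0.25, -0.05, 0.65, 0.35]
test_other = [-0.6, 0.8, 0.1, -0.2]

print(same_speaker(enroll, test_same))
print(same_speaker(enroll, test_other))
```

In this sketch a verification trial reduces to one score and one comparison, which is why embedding quality is usually reported through verification metrics even when the embeddings are later reused for diarization or other tasks.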

Keywords

Wespeaker; Speaker embedding learning; SSL; Open-source

Year

2024

Pages

1–12

Journal

SPEECH COMMUNICATION, vol. 162, no. 103104, ISSN 0167-6393

DOI

10.1016/j.specom.2024.103104

UT WoS

001279201500001

EID Scopus

2-s2.0-85199203394

BibTeX

@article{BUT193986,
  author="WANG, S. and CHEN, Z. and HAN, B. and WANG, H. and XIANG, X. and ROHDIN, J. and SILNOVA, A. and QIAN, Y. and LI, H.",
  title="Advancing speaker embedding learning: Wespeaker toolkit for research and production",
  journal="SPEECH COMMUNICATION",
  year="2024",
  volume="162",
  number="103104",
  pages="1--12",
  doi="10.1016/j.specom.2024.103104",
  issn="0167-6393",
  url="https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%"
}

Files

pdf wang_speech communication_2024.pdf 2 MB

Projects

Tools for Fighting Voice DeepFakes, Ministry of the Interior of the Czech Republic (MV), Security Research Programme of the Czech Republic 2021-2026: development, testing and evaluation of new security technologies (SECTECH) - 2nd public tender, VB02000060, start: 2024-01-01, end: 2026-12-31, in progress
Exchanges for Speech Research and Technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, in progress

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)