Result details

Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization

HAN, J.; LANDINI, F.; ROHDIN, J.; SILNOVA, A.; DIEZ, M.; ČERNOCKÝ, J.; BURGET, L. Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech). Rotterdam, The Netherlands: International Speech Communication Association, 2025, pp. 1583–1587.
Type
conference proceedings paper
Language
English
Authors
Jiangyu Han, Federico Nicolás Landini, Johan Andréas Rohdin, Anna Silnova, Mireia Diez Sánchez, Jan Černocký, Lukáš Burget
Abstract

Self-supervised learning (SSL) models like WavLM can be effectively utilized when building speaker diarization systems, but they are often large and slow, limiting their use in resource-constrained scenarios. Previous studies have explored compression techniques, usually at the price of degraded performance at high pruning ratios. In this work, we propose to compress SSL models through structured pruning by introducing knowledge distillation. Unlike existing works, we emphasize the importance of fine-tuning SSL models before pruning. Experiments on the far-field single-channel AMI, AISHELL-4, and AliMeeting datasets show that our method can remove up to 80% of the parameters of WavLM Base+ and WavLM Large without any performance degradation. After pruning, inference on a single GPU is 4.0 and 2.6 times faster for the Base+ and Large models, respectively. Our source code is publicly available.
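
This record does not spell out the training objective, so the following is a minimal PyTorch sketch of a common recipe for distillation-guided structured pruning of the kind the abstract describes: hard-concrete gates (Louizos et al., 2018) attached to prunable units of the student, which is trained to mimic the fine-tuned dense teacher under a frame-level L1-plus-cosine distillation loss plus a sparsity penalty. All names here (HardConcreteGate, GatedFFN, pruning_step) and the choice of FFN channels as the pruning unit are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HardConcreteGate(nn.Module):
    # Hard-concrete gate (Louizos et al., 2018): a differentiable relaxation
    # of an L0 penalty; gates that converge to 0 mark units to prune away.
    def __init__(self, n_units, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_units))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (-u).log1p() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_keep_ratio(self):
        # P(gate != 0) per unit, averaged: the expected fraction of units kept.
        shift = self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        return torch.sigmoid(self.log_alpha - shift).mean()

class GatedFFN(nn.Module):
    # Stand-in for one prunable feed-forward block of a transformer layer;
    # the gate multiplies the hidden channels, the structured unit pruned here.
    def __init__(self, d_model=64, d_ff=256):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(d_model, d_ff), nn.Linear(d_ff, d_model)
        self.gate = HardConcreteGate(d_ff)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)) * self.gate())

def pruning_step(student, teacher, x, target_keep=0.2, lam=10.0):
    # Distillation loss (L1 + cosine) pulls the gated student toward the
    # fine-tuned dense teacher; the sparsity term pushes the expected kept
    # ratio toward target_keep (0.2 means removing roughly 80% of the units).
    with torch.no_grad():
        t = teacher(x)
    s = student(x)
    distill = F.l1_loss(s, t) + (1.0 - F.cosine_similarity(s, t, dim=-1).mean())
    sparsity = (student.gate.expected_keep_ratio() - target_keep) ** 2
    return distill + lam * sparsity

teacher = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64)).eval()
student = GatedFFN()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(4, 100, 64)          # (batch, frames, features)
loss = pruning_step(student, teacher, x)
loss.backward()
opt.step()

After such training, channels whose gates settle at zero can be physically removed from fc1/fc2, shrinking the weight matrices and yielding real speed-ups of the kind the abstract reports; the paper's actual pruning granularity (heads, channels, or layers) is not specified in this record.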

Keywords

fine-tuning | knowledge distillation | model compression | speaker diarization | structured pruning | WavLM

URL
https://www.isca-archive.org/interspeech_2025/han25_interspeech.pdf
Year
2025
Pages
1583–1587
Journal
Interspeech
ISSN
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
Conference
Interspeech
Publisher
International Speech Communication Association
Place
Rotterdam, The Netherlands
DOI
10.21437/Interspeech.2025-484
EID Scopus
BibTeX
@inproceedings{BUT199389,
  author="Jiangyu {Han} and Federico Nicolás {Landini} and Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Jan {Černocký} and Lukáš {Burget}",
  title="Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association Interspeech",
  year="2025",
  series="Interspeech",
  pages="1583--1587",
  publisher="International Speech Communication Association",
  address="Rotterdam, The Netherlands",
  doi="10.21437/Interspeech.2025-484",
  url="https://www.isca-archive.org/interspeech_2025/han25_interspeech.pdf"
}
Projects
Linguistics, artificial intelligence and language and speech technologies: from research to applications, EU, INTERSECTORAL COOPERATION, EH23_020/0008518, start: 2025-01-01, end: 2028-12-31, ongoing
Improving robust and creative human language technologies through CHallenge actions and research, EU, European Defence Fund, start: 2024-12-01, end: 2029-11-30, ongoing
Exchanges for speech research and technologies, EU, Horizon 2020, start: 2021-01-01, end: 2025-12-31, ongoing
Department