Result Details

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation

ČEGIŇ, J.; PECHER, B.; ŠIMKO, J.; SRBA, I.; BIELIKOVÁ, M.; BRUSILOVSKY, P. Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation. Suzhou, China: Association for Computational Linguistics, 2025. p. 5533-5550. ISBN: 979-8-89176-335-7.

Type

conference paper

Language

English

Authors

Čegiň Ján, Ing., DCGM (FIT)
Pecher Branislav, Ing., Ph.D.
Šimko Jakub, doc. Ing., PhD., DCGM (FIT)
Srba Ivan
Bieliková Mária, prof. Ing., Ph.D., DCGM (FIT)
Brusilovsky Peter

Abstract

Generative large language models (LLMs) are increasingly used for data
augmentation tasks, where text samples are paraphrased (or generated anew) and
then used for classifier fine-tuning. Existing works on augmentation leverage
few-shot scenarios, where samples are given to LLMs as part of prompts, leading
to better augmentations. Yet, the samples are mostly selected randomly and
a comprehensive overview of the effects of other (more 'informed') sample
selection strategies is lacking. In this work, we compare sample selection
strategies existing in few-shot learning literature and investigate their effects
in LLM-based textual augmentation. We evaluate this on in-distribution and
out-of-distribution classifier performance. Results indicate that while some
'informed' selection strategies increase the performance of models, especially
for out-of-distribution data, this happens only seldom and with marginal
performance increases. Unless further advances are made, a default of random
sample selection remains a good option for augmentation practitioners.

Keywords

data augmentation, analysis

URL

https://aclanthology.org/2025.findings-emnlp.296/

Published

2025

Pages

5533–5550

Conference

Conference on Empirical Methods in Natural Language Processing

ISBN

979-8-89176-335-7

Publisher

Association for Computational Linguistics

Place

Suzhou, China

DOI

10.18653/v1/2025.findings-emnlp.296

BibTeX

@inproceedings{BUT193746,
  author="Ján {Čegiň} and Branislav {Pecher} and Jakub {Šimko} and Ivan {Srba} and Mária {Bieliková} and Peter {Brusilovsky}",
  title="Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation",
  year="2025",
  pages="5533--5550",
  publisher="Association for Computational Linguistics",
  address="Suzhou, China",
  doi="10.18653/v1/2025.findings-emnlp.296",
  isbn="979-8-89176-335-7",
  url="https://aclanthology.org/2025.findings-emnlp.296/"
}

Departments

Department of Computer Graphics and Multimedia (DCGM)