Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation
Pecher Branislav, Ing., Ph.D.
Šimko Jakub, doc. Ing., PhD., DCGM (FIT)
Srba Ivan
Bieliková Mária, prof. Ing., Ph.D., DCGM (FIT)
Brusilovsky Peter
Generative large language models (LLMs) are increasingly used for data
augmentation tasks, where text samples are paraphrased (or generated anew) and
then used for classifier fine-tuning. Existing works on augmentation leverage
few-shot scenarios, where samples are given to LLMs as part of prompts, leading
to better augmentations. Yet, the samples are mostly selected randomly, and
a comprehensive overview of the effects of other (more 'informed') sample
selection strategies is lacking. In this work, we compare sample selection
strategies from the few-shot learning literature and investigate their effects
in LLM-based textual augmentation. We evaluate these effects on in-distribution
and out-of-distribution classifier performance. Results indicate that while some
'informed' selection strategies increase model performance, especially for
out-of-distribution data, this happens only rarely and with marginal gains.
Unless further advances are made, random sample selection remains a good
default for augmentation practitioners.
Keywords: data augmentation, analysis
@inproceedings{BUT193746,
author="Ján {Čegiň} and Branislav {Pecher} and Jakub {Šimko} and Ivan {Srba} and Mária {Bieliková} and Peter {Brusilovsky}",
title="Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation",
booktitle="Findings of the Association for Computational Linguistics: EMNLP 2025",
year="2025",
pages="5533--5550",
publisher="Association for Computational Linguistics",
address="Suzhou, China",
doi="10.18653/v1/2025.findings-emnlp.296",
isbn="979-8-89176-335-7",
url="https://aclanthology.org/2025.findings-emnlp.296/"
}