Result Details

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation

ČEGIŇ, J.; PECHER, B.; ŠIMKO, J.; SRBA, I.; BIELIKOVÁ, M.; BRUSILOVSKY, P. Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation. Suzhou, China: Association for Computational Linguistics, 2025. p. 5533-5550. ISBN: 979-8-89176-335-7.
Type
conference paper
Language
English
Authors
Čegiň Ján, Ing., DCGM (FIT)
Pecher Branislav, Ing., Ph.D.
Šimko Jakub, doc. Ing., PhD., DCGM (FIT)
Srba Ivan
Bieliková Mária, prof. Ing., Ph.D., DCGM (FIT)
Brusilovsky Peter
Abstract

Generative large language models (LLMs) are increasingly used for data
augmentation tasks, where text samples are paraphrased (or generated anew) and
then used for classifier fine-tuning. Existing works on augmentation leverage
few-shot scenarios, where samples are given to LLMs as part of prompts, leading
to better augmentations. Yet, the samples are mostly selected randomly and
a comprehensive overview of the effects of other (more 'informed') sample
selection strategies is lacking. In this work, we compare sample selection
strategies existing in few-shot learning literature and investigate their effects
in LLM-based textual augmentation. We evaluate this on in-distribution and
out-of-distribution classifier performance. Results indicate that while some
'informed' selection strategies increase the performance of models, especially
for out-of-distribution data, this happens only seldom and with marginal
performance increases. Unless further advances are made, a default of random
sample selection remains a good option for augmentation practitioners.

Keywords

data augmentation, analysis

URL
https://aclanthology.org/2025.findings-emnlp.296/
Published
2025
Pages
5533–5550
Conference
Conference on Empirical Methods in Natural Language Processing
ISBN
979-8-89176-335-7
Publisher
Association for Computational Linguistics
Place
Suzhou, China
DOI
10.18653/v1/2025.findings-emnlp.296
BibTeX
@inproceedings{BUT193746,
  author="Ján {Čegiň} and Branislav {Pecher} and Jakub {Šimko} and Ivan {Srba} and Mária {Bieliková} and Peter {Brusilovsky}",
  title="Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation",
  year="2025",
  pages="5533--5550",
  publisher="Association for Computational Linguistics",
  address="Suzhou, China",
  doi="10.18653/v1/2025.findings-emnlp.296",
  isbn="979-8-89176-335-7",
  url="https://aclanthology.org/2025.findings-emnlp.296/"
}