Result details

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation

ČEGIŇ, J.; PECHER, B.; ŠIMKO, J.; SRBA, I.; BIELIKOVÁ, M.; BRUSILOVSKY, P. Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation. Suzhou, China: Association for Computational Linguistics, 2025. p. 5533-5550. ISBN: 979-8-89176-335-7.
Type
conference proceedings paper
Language
English
Authors
Čegiň Ján, Ing., UPGM (FIT)
Pecher Branislav, Ing., Ph.D.
Šimko Jakub, doc. Ing., PhD., UPGM (FIT)
Srba Ivan
Bieliková Mária, prof. Ing., Ph.D., UPGM (FIT)
Brusilovsky Peter
Abstract

Generative large language models (LLMs) are increasingly used for data
augmentation tasks, where text samples are paraphrased (or generated anew) and
then used for classifier fine-tuning. Existing works on augmentation leverage
few-shot scenarios, where samples are given to LLMs as part of prompts, leading
to better augmentations. Yet the samples are mostly selected randomly, and
a comprehensive overview of the effects of other (more 'informed') sample
selection strategies is lacking. In this work, we compare sample selection
strategies from the few-shot learning literature and investigate their effects
in LLM-based textual augmentation. We evaluate this on in-distribution and
out-of-distribution classifier performance. Results indicate that while some
'informed' selection strategies increase the performance of models, especially
for out-of-distribution data, this happens only seldom and with marginal
performance increases. Unless further advances are made, a default of random
sample selection remains a good option for augmentation practitioners.

Keywords

data augmentation, analysis

URL
https://aclanthology.org/2025.findings-emnlp.296/
Year
2025
Pages
5533–5550
Conference
Conference on Empirical Methods in Natural Language Processing
ISBN
979-8-89176-335-7
Publisher
Association for Computational Linguistics
Place
Suzhou, China
DOI
10.18653/v1/2025.findings-emnlp.296
BibTeX
@inproceedings{BUT193746,
  author="Ján {Čegiň} and Branislav {Pecher} and Jakub {Šimko} and Ivan {Srba} and Mária {Bieliková} and Peter {Brusilovsky}",
  title="Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation",
  year="2025",
  pages="5533--5550",
  publisher="Association for Computational Linguistics",
  address="Suzhou, China",
  doi="10.18653/v1/2025.findings-emnlp.296",
  isbn="979-8-89176-335-7",
  url="https://aclanthology.org/2025.findings-emnlp.296/"
}