Publication Details

Pattern Matching in YARA: Improved Aho-Corasick Algorithm

REGÉCIOVÁ Dominika, KOLÁŘ Dušan and MILKOVIČ Marek. Pattern Matching in YARA: Improved Aho-Corasick Algorithm. IEEE Access, vol. 9, no. 1, 2021, pp. 62857-62866. ISSN 2169-3536. Available from: https://ieeexplore.ieee.org/document/9410267
Type
journal article
Language
English
Authors
Dominika Regéciová, Dušan Kolář, Marek Milkovič
URL
https://ieeexplore.ieee.org/document/9410267
Keywords

Aho-Corasick algorithm, pattern matching, regular expressions, YARA

Abstract

YARA is a pattern matching tool used by malware analysts all over the world. It can scan files as well as process memory, and it allows users to define sequences of symbols as text strings, hexadecimal strings, and regular expressions. However, the use of regular expressions is limited because of the concern that they can slow down the scanning process.
In this paper, we analyze the true nature of regular expressions in YARA and their implementation.
We discovered several reasons why regular expressions can, in fact, slow down scanning, based on the nature of the algorithm used, Aho-Corasick. We proposed a new version of this algorithm and implemented it in the original version of the tool.
We present experiments showing that the speed of pattern matching with regular expressions can indeed be improved.

Published
2021
Pages
62857-62866
Journal
IEEE Access, vol. 9, no. 1, ISSN 2169-3536
Publisher
Institute of Electrical and Electronics Engineers
DOI
10.1109/ACCESS.2021.3074801
UT WoS
000645857100001
EID Scopus
BibTeX
@ARTICLE{FITPUB12412,
   author = "Dominika Reg\'{e}ciov\'{a} and Du\v{s}an Kol\'{a}\v{r} and Marek Milkovi\v{c}",
   title = "Pattern Matching in YARA: Improved Aho-Corasick Algorithm",
   pages = "62857--62866",
   journal = "IEEE Access",
   volume = 9,
   number = 1,
   year = 2021,
   ISSN = "2169-3536",
   doi = "10.1109/ACCESS.2021.3074801",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12412"
}