Publication Details
Network Intrusion Datasets: A Survey, Limitations, and Recommendations
Network intrusion detection, NIDS, Data, Systematic Literature Review (SLR), Machine learning for intrusion detection, Cybersecurity, Best practices, Recommendations, Dataset popularity analysis, Domain limitations
Data-driven cyberthreat detection has become a crucial defense technique
in modern cybersecurity. Network defense, supported by Network
Intrusion Detection Systems (NIDSs), has also increasingly adopted
data-driven approaches, leading to greater reliance on data. Despite the
importance of data, its scarcity has long been recognized as a major
obstacle in NIDS research. In response, the community has published many
new datasets recently. However, many of them remain largely unknown and
unanalyzed, leaving researchers uncertain about their suitability for
specific use cases.
In this paper, we aim to address this knowledge gap by performing a
systematic literature review (SLR) of 89 public datasets for NIDS
research. Each dataset is comparatively analyzed across 13 key
properties, and its potential applications are outlined. Beyond the
review, we also discuss domain-specific challenges and common data
limitations to facilitate a critical view on data quality. To aid in
data selection, we conduct a dataset popularity analysis in contemporary
state-of-the-art NIDS research. Furthermore, the paper presents best
practices for dataset selection, generation, and usage. By providing a
comprehensive overview of the domain and its data, this work aims to
guide future research toward improving data quality and the robustness
of NIDS solutions.
@article{BUT194021,
author="Patrik {Goldschmidt} and Daniela {Chudá}",
title="Network Intrusion Datasets: A Survey, Limitations, and Recommendations",
journal="COMPUTERS \& SECURITY",
year="2025",
volume="156",
pages="104510--104542",
doi="10.1016/j.cose.2025.104510",
issn="0167-4048",
url="https://www.sciencedirect.com/science/article/pii/S0167404825001993"
}