Publication Details
Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence
Horák Adam, Ing. (FIT)
Polišenský Jan, Ing. (DIFS)
Jeřábek Kamil, Ing., Ph.D. (DIFS)
Ryšavý Ondřej, doc. Ing., Ph.D. (DIFS)
Phishing, Domain, Detection, Machine learning, XGBoost, Features, DNS, RDAP, TLS, GeoIP
In the digital landscape, phishing attacks have rapidly evolved into a major cybersecurity challenge, posing significant risks to individuals and organizations. This short paper presents our preliminary research on detecting phishing domains. Our approach combines intelligence from multiple sources: DNS servers, WHOIS/RDAP, TLS certificates, and GeoIP data. We created a rich 15.8 GB dataset of information about benign and phishing domains, from which we derived a comprehensive 80-feature vector for training and testing machine learning classifiers. We report preliminary results with a fine-tuned XGBoost model, achieving a precision of 0.9716, an F1 score of 0.9540, and a false positive rate of 0.23%.
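As a reading aid for the reported metrics, the following minimal sketch shows how precision, F1 score, and false positive rate are conventionally computed from confusion-matrix counts, and how a recall value can be recovered from a precision/F1 pair. The function names and the example counts are illustrative assumptions, not taken from the paper; the recall value is not stated in the abstract and is only what the standard F1 formula implies for the rounded figures above.

```python
# Standard binary-classification metrics. All names here are illustrative
# and the confusion-matrix counts below are hypothetical, not from the paper.

def precision(tp: int, fp: int) -> float:
    """Fraction of domains flagged as phishing that really are phishing."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual phishing domains that were flagged."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign domains that were wrongly flagged."""
    return fp / (fp + tn)


def recall_from_precision_and_f1(p: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * p / (2 * p - f1)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, fp, fn, tn = 90, 10, 30, 870
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"precision={p:.4f} recall={r:.4f} "
          f"f1={f1_score(p, r):.4f} fpr={false_positive_rate(fp, tn):.4f}")

    # Recall implied by the abstract's rounded precision and F1 figures.
    implied = recall_from_precision_and_f1(0.9716, 0.9540)
    print(f"implied recall ~ {implied:.4f}")
```

Under these formulas, a precision of 0.9716 and an F1 score of 0.9540 correspond to a recall of roughly 0.937, subject to rounding in the reported values.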
@inproceedings{BUT186776,
author="Radek {Hranický} and Adam {Horák} and Jan {Polišenský} and Kamil {Jeřábek} and Ondřej {Ryšavý}",
title="Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence",
booktitle="Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024",
year="2024",
pages="1--5",
publisher="Institute of Electrical and Electronics Engineers",
address="Seoul",
doi="10.1109/NOMS59830.2024.10575573",
isbn="979-8-3503-2794-6",
url="https://ieeexplore.ieee.org/document/10575573"
}