Malicious URL Detection with Advanced Machine Learning and Optimization-Supported Deep Learning Models

Türk, Fuat; Kılıçaslan, Mahmut

doi:10.3390/app151810090

Applied Sciences (Switzerland), vol. 15, no. 18, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 15, Issue: 18
Publication Date: 2025
Journal: Applied Sciences (Switzerland)
Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
Keywords: ELECTRA model, feature selection, malware detection, optimization algorithms
Ankara University Affiliation: Yes

Abstract

This study presents a comprehensive comparative analysis of machine learning, deep learning, and optimization-based hybrid methods for malicious URL detection on the Malicious Phish dataset. The Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and the Harris Hawks Optimizer (HHO) were employed for feature selection and hyperparameter tuning. Both multiclass and binary classification tasks were addressed using classical machine learning algorithms (LightGBM, XGBoost, and Random Forest) and deep learning models (LSTM, CNN, and a hybrid CNN+LSTM architecture), with optimization support integrated into these models as well. The experimental results show that the model based on ELECTRA, a pretrained Transformer language model, achieved accuracy and F1-scores of up to 99% in both the multiclass and binary scenarios. Although the optimization-supported hybrid models also improved performance, the ELECTRA architecture clearly outperformed both the classical and the optimized approaches. The findings indicate that optimization algorithms are effective for feature selection and for improving model performance, but that next-generation language models set a new benchmark in malicious URL detection.
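The abstract states that a Genetic Algorithm was used for feature selection but does not describe its operators. The following is a minimal, hypothetical sketch of a generic GA wrapper for feature-subset selection, not the authors' implementation. It assumes each candidate is a binary mask over the feature columns and uses tournament selection, single-point crossover, bit-flip mutation, and elitism. The function name `ga_feature_selection`, its parameters, and the toy fitness (`USEFUL`, `toy_fitness`) are illustrative assumptions. In a real pipeline the fitness would more plausibly be something like the cross-validated F1-score of a classifier such as LightGBM trained only on the selected columns.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 40,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Hypothetical binary GA that searches for a feature mask maximizing `fitness`."""
    if n_features < 2 or pop_size < 2:
        raise ValueError("need >= 2 features (crossover) and >= 2 individuals (tournament)")
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def evaluate(pop: List[Mask]) -> List[Tuple[float, Mask]]:
        return [(fitness(ind), ind) for ind in pop]

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        # Tournament of size 2: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    scored = evaluate(population)
    best_score, best = max(scored, key=lambda s: s[0])
    for _ in range(generations):
        # Elitism: the best mask found so far survives unchanged,
        # so the best score never decreases between generations.
        children = [best[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips independently with mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        scored = evaluate(children)
        best_score, best = max(scored, key=lambda s: s[0])
    return best[:], best_score


# Hypothetical toy fitness: pretend only features 0, 3 and 5 carry signal,
# and penalize every other selected feature to favor small subsets.
USEFUL = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in USEFUL)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


if __name__ == "__main__":
    mask, score = ga_feature_selection(8, toy_fitness)
    print("selected features:", [i for i, g in enumerate(mask) if g], "fitness:", score)
```

Because the fitness function is passed in as a callable, the same loop could wrap any model-evaluation routine. This flexibility comes at a cost: every fitness call would then mean a full training run, which is why wrapper-style feature selection is usually much more expensive than filter-style ranking.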