A comprehensive review of malicious URLs: Detection techniques, features and datasets


OSMANOĞLU M., Gupta D., ÖZKAN M., AR Y., Aslan Ö.

Computers and Electrical Engineering, vol. 136, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Review
  • Volume: 136
  • Publication Date: 2026
  • DOI: 10.1016/j.compeleceng.2026.111186
  • Journal Name: Computers and Electrical Engineering
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, zbMATH
  • Keywords: Attack detection techniques, Discriminatory features, Malicious URL, Phishing, URL dataset
  • Affiliated with Ankara Üniversitesi: Yes

Abstract

The rapid spread of internet technologies has led to radical changes in many areas, from communication methods to business practices. However, this digital transformation has introduced various security flaws and an increase in cyber threats. Among these threats, malicious URLs, created to deceive users, exploit system vulnerabilities, steal sensitive information, or distribute malware, stand out as one of the most common and dangerous. As these threats grew more complex, researchers developed various methods to identify malicious URLs. Early solutions such as the blacklist-based approach failed to detect unseen or unknown threats. In recent years, studies in this field have focused on machine learning and deep learning-based methods that enable the development of dynamic, adaptive, and more generalizable detection mechanisms. The use of diverse datasets, feature extraction methods, and modeling strategies across these studies creates a need for a comprehensive synthesis of the existing literature. In response to this need, several surveys analyzing malicious URL detection mechanisms have been introduced. However, these studies either focus only on detection systems for phishing URLs or lack an integrated approach that systematically analyzes different aspects such as datasets, feature types, and modern AI-based methodologies. To address this issue, we conduct a comprehensive survey of recent studies on detecting malicious URLs with a holistic view. Within this context, we systematically analyze recently proposed methods, datasets, and feature engineering approaches. In addition, we identify open challenges and research gaps observed in the current literature, and offer potential directions for future studies, such as enhancing robustness against adversarial attacks and improving transparency through explainable AI, together with several research avenues for each direction.
Thus, we present a systematic evaluation of recent developments in this field and provide an extensive reference for researchers and practitioners.
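
As a rough illustration of the contrast the abstract draws between blacklist lookups and feature-based detection, the sketch below may help. It is not taken from the surveyed paper: the blacklist contents, the function names `is_blacklisted` and `lexical_features`, and the particular features chosen are all assumptions made for the example. It shows why an exact-match blacklist misses a slightly altered URL, while lexical features can still be computed for any URL and passed to a learned classifier.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical blacklist for illustration only; a real system would use a maintained feed.
BLACKLIST = {"http://evil.example/login"}


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs already present in the list are flagged."""
    return url in BLACKLIST


def _is_ipv4(host: str) -> bool:
    """Return True if the host is a literal IPv4 address."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features (an assumed, illustrative feature set)."""
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ipv4": _is_ipv4(host),
    }


if __name__ == "__main__":
    seen = "http://evil.example/login"
    unseen = "http://evil.example/login?id=1"  # trivially modified variant
    print(is_blacklisted(seen), is_blacklisted(unseen))
    print(lexical_features(unseen))
```

The feature dictionary alone does not classify anything; in a machine learning pipeline it would be one input row for a trained model, which is the step that allows generalization to URLs absent from any list.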