Development of BiLSTM deep learning model to detect URL-based phishing attacks


Akçam Ö. Ş., Tekerek A., Tekerek M.

Computers and Electrical Engineering, vol.123, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 123
  • Publication Date: 2025
  • DOI: 10.1016/j.compeleceng.2025.110212
  • Journal Name: Computers and Electrical Engineering
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Keywords: BiLSTM, character-based extraction, deep learning, phishing detection, skip-gram, word-based extraction
  • Ankara University Affiliation: Yes

Abstract

Phishing attacks steal critical information by exploiting security vulnerabilities in information systems. This study aims to detect URL-based phishing attacks and develops a deep learning model, built on character-based and word-based feature extraction, that classifies URLs as legitimate or phishing. The model uses the Bidirectional Long Short-Term Memory (BiLSTM) algorithm and was developed with the GramBeddings, Malicious and Benign URLs, and Ebbu2017 Phishing datasets; the Mendeley Data Web Page Phishing Detection dataset was additionally used to test it. The model achieved test results of 98.24% accuracy and 0.9977 area under the curve (AUC) on the GramBeddings dataset, 99.32% accuracy and 0.9986 AUC on the Malicious and Benign URLs dataset, 98.34% accuracy and 0.9981 AUC on the Ebbu2017 dataset, and 90.33% accuracy and 0.9694 AUC on the Mendeley Data Web Page Phishing Detection dataset. These results demonstrate the model's effectiveness in detecting phishing attacks. Its novelty lies in analysing the structural patterns of URLs through character-based inference while evaluating their contextual meaning through word-based inference, which enables phishing URLs to be detected effectively at both the character and word levels.
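The abstract does not say how a URL is turned into the character-level and word-level inputs that the model consumes. The following is a minimal sketch that assumes simple integer-index encodings. It shows how the two views of one URL could be produced before they reach a BiLSTM. The delimiter rule, the maximum lengths (`MAX_CHARS`, `MAX_WORDS`), the reserved `PAD`/`UNK` indices, and all function names are hypothetical illustrations, not the authors' preprocessing.

```python
import re

# Hypothetical settings: the abstract does not give the authors' vocabulary,
# sequence lengths, or word delimiters.
MAX_CHARS = 64
MAX_WORDS = 12
PAD = 0  # index reserved for padding
UNK = 1  # index reserved for characters/words not seen when building the vocabulary


def build_vocab(items):
    """Assign indices from 2 upward, in order of first appearance."""
    vocab = {}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab) + 2
    return vocab


def char_sequence(url, char_vocab, max_len=MAX_CHARS):
    """Character view: map each character to an index, truncate, right-pad with PAD."""
    ids = [char_vocab.get(c, UNK) for c in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def word_tokens(url):
    """Split a lowercased URL on runs of non-alphanumeric characters (assumed rule)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def word_sequence(url, word_vocab, max_len=MAX_WORDS):
    """Word view: map each token to an index, truncate, right-pad with PAD."""
    ids = [word_vocab.get(w, UNK) for w in word_tokens(url)[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    train_urls = ["https://www.example.com/login",
                  "http://paypa1-secure.verify-account.xyz/signin"]
    char_vocab = build_vocab(c for u in train_urls for c in u)
    word_vocab = build_vocab(w for u in train_urls for w in word_tokens(u))

    url = "http://paypa1-secure.example.com/login"
    print(word_tokens(url))  # ['http', 'paypa1', 'secure', 'example', 'com', 'login']
    print(char_sequence(url, char_vocab)[:10])
    print(word_sequence(url, word_vocab))
```

In a full model, each padded sequence would typically pass through its own embedding layer before a BiLSTM branch. The listed skip-gram keyword suggests pretrained word vectors, but this is an assumption. The character branch sees spelling tricks such as `paypa1`, and the word branch sees tokens such as `secure` or `login`, which mirrors the character- and word-level analysis described in the abstract.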