Image Based Web Page Classification by Using Deep Learning


YAPICI M. M.

Gazi Mühendislik Bilimleri Dergisi, vol.10, no.1, pp.72-83, 2024 (Peer-Reviewed Journal)

  • Publication Type: Article / Full Article
  • Volume: 10 Issue: 1
  • Publication Date: 2024
  • Journal Name: Gazi Mühendislik Bilimleri Dergisi
  • Journal Indexes: TR DİZİN (ULAKBİM)
  • Page Numbers: pp.72-83
  • Ankara University Affiliated: Yes

Abstract

The internet plays a significant role in every aspect of our lives, and its importance grows each day, so its usability matters greatly. Low data quality and disinformation severely degrade that usability, making it difficult for people to obtain accurate and clear information. Today, websites predominantly feature image-based content such as pictures and videos rather than text-based content, and classifying such content is of great importance for search engines. Web page classification is therefore a crucial research area. This study focuses on the classification of image-based web pages. A deep learning-based approach is proposed to categorize web pages into four main groups: tourism, machinery, music, and sports. The proposed method gave its best results with the Stochastic Gradient Descent (SGD) optimization method, achieving an accuracy of 0.9737, a recall of 0.9474, an F1 score of 0.9474, and an Area Under the ROC Curve (AUC) value of 0.9649. Furthermore, the use of Deep Learning (DL) achieved state-of-the-art results for web page classification in the existing literature, particularly on the WebScreenshots dataset.
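For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how accuracy, recall, and F1 can be derived from a confusion matrix in a four-class problem. The abstract does not state which averaging scheme was used, so the micro-averaged one-vs-rest scheme here is an assumption made purely for illustration, and the function name `one_vs_rest_micro_metrics` and the example counts are hypothetical rather than taken from the paper.

```python
def one_vs_rest_micro_metrics(confusion):
    """Compute micro-averaged one-vs-rest metrics.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. (Illustrative helper; not from the paper.)
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    tp = fp = fn = tn = 0
    for k in range(n_classes):
        # Treat class k as "positive" and all other classes as "negative".
        tp_k = confusion[k][k]
        fn_k = sum(confusion[k]) - tp_k
        fp_k = sum(confusion[i][k] for i in range(n_classes)) - tp_k
        tn_k = total - tp_k - fn_k - fp_k
        tp += tp_k
        fn += fn_k
        fp += fp_k
        tn += tn_k

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for four classes (tourism, machinery, music, sports),
# 10 samples each; these are NOT the paper's data.
example = [
    [9, 1, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 1, 0, 9],
]
metrics = one_vs_rest_micro_metrics(example)
print(metrics)
```

Under micro-averaging in a single-label setting, every misclassified sample counts once as a false negative and once as a false positive, so precision, recall, and F1 coincide, while the one-vs-rest accuracy is higher because each error is outweighed by the true negatives of the unaffected classes. AUC is omitted because it requires per-class prediction scores rather than a confusion matrix.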