Comparative Performance of YOLOv5, YOLOv8, and YOLOv11 for Person Detection in Thermal Imagery


Yavuz R., FIÇICI C., ÇATALBAŞ M. C.

5th International Conference on Informatics and Software Engineering, IISEC 2026, Ankara, Türkiye, 5-6 February 2026, pp.451-455, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/iisec69317.2026.11418454
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Pages: pp.451-455
  • Keywords: Deep Learning, Human Detection, Infrared Imaging, Low-Visibility Environments, Object Detection, Search and Rescue, Thermal Imaging, UAV Applications, YOLO
  • Affiliated with Ankara University: Yes

Abstract

Human detection in thermal images has become an important topic in many critical areas such as search and rescue operations, security, and autonomous systems. The deep learning-based object detection model You Only Look Once (YOLO) is preferred for its real-time detection capability, high accuracy and precision, and resilience against challenges encountered in thermal images, such as adverse weather conditions, night vision complexity, and insufficient light. In this study, the human detection performance of the YOLOv5s, YOLOv8s, YOLOv8l, and YOLOv11s models was comparatively evaluated using a suitable thermal image dataset. The models were compared using precision, recall, mAP at 0.50 IoU, and processing time. YOLOv11s and YOLOv8s achieved the highest precision values (88-89%), while YOLOv5s and YOLOv8l lagged behind at 87% each. In terms of mAP at 0.50 IoU, YOLOv8s and YOLOv11s reached about 89%, while YOLOv5s and YOLOv8l performed lower, at around 87%. In terms of Relative Computation Complexity (RCC), YOLOv8l, selected as the baseline model, has the lowest computational load, while YOLOv5s and YOLOv8s show similar levels of complexity, and YOLOv11s was found to have the highest computational complexity. Hardware and computational costs were also considered using the number of parameters, inference speed in frames per second (FPS), latency, and video random-access memory (VRAM) usage. The results show that YOLOv11s and YOLOv8s are particularly suitable for systems requiring real-time performance.
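
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how precision and recall at an IoU threshold of 0.50 can be computed for a single image, and how FPS relates to per-frame latency. It is not the authors' evaluation code: the function names, the (x1, y1, x2, y2) box format, and the greedy highest-score-first matching are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thr=0.5):
    """Precision and recall for one image (hypothetical helper).

    preds:  list of (score, box) detections
    truths: list of ground-truth boxes
    Each ground-truth box can be matched by at most one detection.
    """
    matched = set()
    tp = 0
    # Assumption: match greedily, highest-confidence detections first.
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            value = iou(box, truth)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # detections with no matching person
    fn = len(truths) - tp  # persons that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


def fps_from_latency(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms
```

mAP at 0.50 IoU extends this idea by sweeping the confidence threshold, tracing the resulting precision-recall curve, and averaging the area under it over classes; with a single "person" class it reduces to the average precision of that class.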