15th International Conference on Advanced Computer Information Technologies, ACIT 2025, Hybrid, Sibenik, Croatia, 17-19 September 2025, pp. 859-862 (Full Paper)
This study introduces an explainable anomaly detection framework built on the YOLOv11 architecture, targeting object-level irregularities such as bikers, carts, and skaters in crowded scenes. The primary aim is to improve the interpretability of deep learning models used in visual surveillance by integrating gradient- and attention-based explainability techniques. To this end, we adopt TransCAM [16], a fusion strategy that combines Gradient-weighted Class Activation Mapping (GradCAM) [10] with Transformer-derived attention maps. This fusion yields more precise and semantically coherent visual explanations by highlighting the spatial regions that drive the model's predictions in dense visual contexts. The proposed model is trained on the grayscale UCSD Ped2 dataset, with data augmentation strategies, specifically brightness variation and horizontal flipping, employed to improve generalizability. Experimental evaluation demonstrates the method's effectiveness in multi-class anomaly detection, achieving a mean Average Precision (mAP@50) of 98.5%, a precision of 94.7%, and a recall of 97.1%.
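The core idea of fusing a GradCAM heatmap with a Transformer attention map can be sketched as an element-wise combination of the two normalized saliency signals. This is a minimal illustrative sketch only; the function name `fuse_maps` and the multiplicative fusion with min-max normalization are assumptions for illustration, not the paper's exact TransCAM formulation.

```python
import numpy as np

def fuse_maps(gradcam_map, attention_map, eps=1e-8):
    """Illustrative fusion of a GradCAM heatmap with a Transformer
    attention map (a sketch; the exact TransCAM fusion may differ)."""
    # Min-max normalize each map to [0, 1] so neither signal dominates.
    g = (gradcam_map - gradcam_map.min()) / (gradcam_map.max() - gradcam_map.min() + eps)
    a = (attention_map - attention_map.min()) / (attention_map.max() - attention_map.min() + eps)
    fused = g * a                       # keep regions salient to BOTH signals
    return fused / (fused.max() + eps)  # rescale to [0, 1] for visualization

# Toy 4x4 maps: GradCAM fires on the top-left cell, attention on the top row;
# only their overlap survives the fusion.
grad = np.zeros((4, 4)); grad[0, 0] = 1.0
attn = np.zeros((4, 4)); attn[0, :] = 1.0
fused = fuse_maps(grad, attn)
```

Multiplicative fusion suppresses regions highlighted by only one of the two signals, which is one simple way to obtain the tighter, semantically coherent explanations described above.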
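The two augmentations mentioned above, brightness variation and horizontal flipping, can be sketched for a grayscale frame as follows. The function name `augment` and the scalar-gain brightness model are illustrative assumptions; they are not the paper's training pipeline.

```python
import numpy as np

def augment(gray_img, brightness=1.2, flip=True):
    """Sketch of brightness variation and horizontal flipping for a
    grayscale HxW uint8 frame (e.g. from UCSD Ped2). Illustrative only."""
    # Brightness variation: scale intensities, then clip back to uint8 range.
    out = gray_img.astype(np.float32) * brightness
    out = np.clip(out, 0, 255).astype(np.uint8)
    # Horizontal flip: reverse the column axis.
    if flip:
        out = out[:, ::-1]
    return out

# Tiny 2x2 example frame.
frame = np.array([[0, 1], [2, 3]], dtype=np.uint8)
aug = augment(frame, brightness=2.0, flip=True)
```

Both transforms preserve object semantics, so each augmented frame keeps its original label while exposing the detector to varied illumination and orientation.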