Instance segmentation of different spatial resolution satellite images with Mask R-CNN integration


Lacinoglu B. S., GÜNGÖR O., Saralioglu E.

Acta Geodaetica et Geophysica, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Publication Date: 2026
  • DOI Number: 10.1007/s40328-026-00498-1
  • Journal Name: Acta Geodaetica et Geophysica
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Geobase
  • Keywords: Deep learning, Mask R-CNN, Remote sensing, Satellite images
  • Ankara University Affiliated: Yes

Abstract

Recent advancements in remote sensing technologies have enabled the acquisition of large volumes of high-resolution spatial data. However, classifying such complex datasets with traditional methods remains challenging. This study investigates the potential of Mask R-CNN and its variants for classifying satellite imagery with diverse resolutions and classes. The key advantages of the model, including transferability across regions, high accuracy, and rapid processing when trained on sufficiently large datasets, offer a novel solution that goes beyond traditional methodologies. Extensive experiments were performed on novel datasets containing 1515 images and 18,811 annotated labels generated from Sentinel-2 and WorldView-3 satellite images. In instance segmentation for urban area detection on Sentinel-2 imagery, Mask R-CNN achieved F1 scores of 0.81 and 0.79 with ResNet-101 and ResNet-50 backbones, respectively. For multi-class land cover instance segmentation (five classes) of Sentinel-2 data, the model attained an F1 score of 0.75. Finally, building segmentation on WorldView-3 imagery yielded a score of 0.84. The results demonstrate that the proposed methodology can effectively classify satellite images with diverse characteristics and resolutions. This study provides a methodological baseline for potential geographical adaptability, enabling the model to process images across diverse contexts, which is a critical step toward scalable remote sensing applications. The findings highlight the model’s transfer learning capacity and multi-scale data processing proficiency, positioning it as a versatile tool for tasks ranging from land cover mapping to urban monitoring.