Acta Geodaetica et Geophysica, 2026 (SCI-Expanded, Scopus)
Recent advances in remote sensing technologies have enabled the acquisition of large volumes of high-resolution spatial data. However, classifying such complex datasets with traditional methods remains challenging. This study investigates the potential of Mask R-CNN and its variants for classifying satellite imagery with diverse resolutions and classes. The key advantages of the model, including transferability across regions, high accuracy, and rapid processing when trained on sufficiently large datasets, offer an alternative to traditional methodologies. Extensive experiments were performed on novel datasets containing 1,515 images and 18,811 annotated labels generated from Sentinel-2 and WorldView-3 satellite images. In instance segmentation for urban area detection on Sentinel-2 imagery, Mask R-CNN achieved F1 scores of 0.81 and 0.79 with ResNet-101 and ResNet-50 backbones, respectively. For multi-class land cover instance segmentation (five classes) of Sentinel-2 data, the model attained an F1 score of 0.75. Finally, building segmentation on WorldView-3 imagery yielded a score of 0.84. The results demonstrate that the proposed methodology can effectively classify satellite images with diverse characteristics and resolutions. This study provides a methodological baseline for geographical adaptability, that is, for applying the model to images from diverse contexts, which is a critical step toward scalable remote sensing applications. The findings highlight the model’s transfer learning capacity and its ability to process multi-scale data, positioning it as a versatile tool for tasks ranging from land cover mapping to urban monitoring.
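For readers unfamiliar with how an F1 score such as those reported above can be computed for instance segmentation, the following is a minimal sketch, not the authors' evaluation code. It assumes masks are represented as sets of (row, col) pixels, that a prediction counts as a true positive when its intersection over union (IoU) with an unmatched reference mask is at least 0.5, and that matching is greedy. The abstract does not state the IoU threshold or matching protocol, so these choices, along with the function names `iou`, `match_instances`, and `f1_score`, are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_instances(predicted, reference, iou_threshold=0.5):
    """Greedily match predicted masks to reference masks.

    Returns (tp, fp, fn). The 0.5 threshold is an assumed, commonly used
    default, not a value taken from the study.
    """
    unmatched = list(range(len(reference)))
    tp = 0
    for pred in predicted:
        best_idx, best_iou = None, iou_threshold
        for idx in unmatched:
            score = iou(pred, reference[idx])
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            unmatched.remove(best_idx)  # each reference mask is matched at most once
            tp += 1
    fp = len(predicted) - tp  # predictions with no matching reference
    fn = len(unmatched)       # reference instances that were missed
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: two reference instances, one good prediction, one spurious one.
reference = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
predicted = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]

tp, fp, fn = match_instances(predicted, reference)
print(tp, fp, fn, f1_score(tp, fp, fn))
```

In this toy case the first prediction overlaps the first reference mask with an IoU of 0.75 and is accepted, the second prediction matches nothing, and the second reference instance is missed, so precision and recall are both 0.5.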