IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, vol. 22, 2025 (SCI-Expanded, Scopus)
Multispectral homogeneous bands capture distinct and complementary spectral characteristics; therefore, fusing multiple bands has the potential to improve semantic segmentation performance. However, the fusion of highly correlated homogeneous bands [i.e., RGB, near-infrared (NIR), and short-wave infrared (SWIR)] remains underexplored. We hypothesized that exploiting correlation representations between highly correlated homogeneous spectral bands at higher-level feature stages may improve segmentation accuracy. We therefore propose a novel semantic segmentation architecture that combines homogeneous modalities through a shared latent representation that exploits their intrinsic correlations. We also introduce interactive feature (IF) fusion blocks at early encoder stages to extract more informative cross-band correlations (CBCs). Experiments on two remote sensing image sets, covering both UAV-based and satellite-based imagery, show that our correlation-driven fusion of homogeneous bands can improve segmentation accuracy over state-of-the-art unimodal and multimodal models.
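The abstract does not specify how the IF blocks compute or use cross-band correlations. Purely as an illustration of the general idea, the sketch below assumes two band-specific encoder feature maps (e.g., one from RGB and one from NIR) with the same channel count, computes a per-channel Pearson correlation over spatial positions, and uses it as a gate: strongly correlated channels contribute mostly a shared (mean) component, weakly or anti-correlated channels mostly a band-specific (difference) component. The function names `cross_band_correlation` and `correlation_gated_fusion` and the gating rule are hypothetical and are not the authors' IF fusion block or shared latent representation.

```python
import math
from typing import List, Tuple

# Hypothetical layout: feature map indexed as [channel][flattened spatial position].
FeatureMap = List[List[float]]


def pearson(x: List[float], y: List[float], eps: float = 1e-8) -> float:
    """Pearson correlation of two equal-length vectors; eps avoids division by zero."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy + eps)


def cross_band_correlation(fa: FeatureMap, fb: FeatureMap) -> List[float]:
    """Per-channel correlation between two band-specific feature maps (hypothetical CBC)."""
    if len(fa) != len(fb):
        raise ValueError("feature maps must have the same number of channels")
    return [pearson(a, b) for a, b in zip(fa, fb)]


def correlation_gated_fusion(fa: FeatureMap, fb: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Illustrative (not the paper's) fusion: split each channel pair into a
    correlation-weighted shared part and a complementary part."""
    shared: FeatureMap = []
    complementary: FeatureMap = []
    for a, b, r in zip(fa, fb, cross_band_correlation(fa, fb)):
        g = (1.0 + r) / 2.0  # map r in [-1, 1] to a gate in [0, 1]
        shared.append([g * (x + y) / 2.0 for x, y in zip(a, b)])
        complementary.append([(1.0 - g) * (x - y) / 2.0 for x, y in zip(a, b)])
    return shared, complementary


if __name__ == "__main__":
    rgb_feat = [[1.0, 2.0, 3.0, 4.0]]
    nir_feat = [[2.0, 4.0, 6.0, 8.0]]  # perfectly correlated with rgb_feat
    s, c = correlation_gated_fusion(rgb_feat, nir_feat)
    print(cross_band_correlation(rgb_feat, nir_feat), s, c)
```

With perfectly correlated inputs the gate is close to 1, so almost everything goes into the shared component; with anti-correlated inputs the gate is close to 0 and the difference component dominates. A learned fusion block would normally replace this fixed rule with trainable weights.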