Dual-Branch Network for Building Extraction in Remote Sensing Images with Fusion of Local-Global Features
刘二虎,李浩文,胡煜,徐胜军,李小晗,史亚
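Abstract:
Efficient, automatic extraction of building information from remote sensing images is an important task in intelligent remote sensing interpretation. However, buildings in high-resolution remote sensing images vary widely in size and shape and are subject to severe background interference, so existing algorithms extract them poorly. To address this problem, a dual-branch network that fuses local and global features is proposed for accurate and efficient building extraction from remote sensing images. An encoder with a dual-branch CNN and Transformer structure is designed to capture both the local texture information and the global contextual dependencies of buildings. To overcome the discrepancy between the features extracted by the CNN branch and the Transformer branch, a cross-feature attention fusion module (CFAFM) is designed to aggregate the two sets of features effectively and weight them by importance. To strengthen the decoder's ability to recover fine-grained features, a feature refinement enhancement module (FREM) is designed and inserted at the end of the decoder to reduce information loss during upsampling and to refine building edges and local details. On the WHU, Massachusetts, and Inria building datasets, the proposed network achieves IoU of 90.84%, 74.94%, and 81.24% and F1-scores of 95.20%, 85.53%, and 89.69%, respectively. Experimental results show that the proposed network effectively improves building extraction accuracy in remote sensing images and has a clear advantage over existing methods in complex scenes.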
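For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how IoU and F1-score are commonly computed for the building class from a pair of binary masks. It is not taken from the paper: the function names `confusion_counts` and `iou_and_f1`, the toy masks, and the convention for two empty masks are assumptions made for illustration, and the abstract does not state how the authors aggregate scores across images.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask: 1 = building, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for label 1."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    return tp, fp, fn


def iou_and_f1(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU, F1) of the building class for one prediction/label pair."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    if union == 0:
        # Assumed convention: both masks contain no building pixels.
        return 1.0, 1.0
    iou = tp / union                   # |pred AND truth| / |pred OR truth|
    f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    return iou, f1


# Hypothetical 4x4 masks used only to exercise the functions.
pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
truth = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Here tp=4, fp=1, fn=1, so IoU = 4/6 and F1 = 8/10.
iou, f1 = iou_and_f1(pred, truth)
print(f"IoU = {iou:.4f}, F1 = {f1:.4f}")
```

When both metrics are computed from the same confusion counts they are tied by F1 = 2·IoU / (1 + IoU), so F1 is always at least as large as IoU. Scores that are averaged per image or per tile need not satisfy this identity exactly.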
Keywords: remote sensing images; building extraction; dual-branch network; feature fusion; feature refinement enhancement
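Foundation: National Natural Science Foundation of China (52278125, 62276207); Shaanxi Provincial Social Development Key Research Project (2021SF-429); Natural Science Basic Research Program of Shaanxi Province (2023-JC-YB-532); Natural Science Basic Research Program of Shaanxi Province, General Project (2024JC-YBMS-483); Scientific Research Start-up Project of Xi'an University of Architecture and Technology (1960324027, 1960324009)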