Efficient Building Extraction for High Spatial Resolution Images Based on Dual Attention Network


  • Dandong Zhao
  • Haishi Zhao
  • Renchu Guan
  • Chen Yang


high spatial resolution images, building extraction, self-attention mechanism, dual attention network, deep learning


Building extraction from high spatial resolution images has become an important research topic in computer vision for urban-related applications. Because high spatial resolution images present rich detail and complex texture features, buildings are unevenly distributed and vary markedly in scale, and general methods often confuse buildings with other ground objects. In this paper, a building extraction framework based on a deep residual neural network with a self-attention mechanism is proposed. The mechanism contains two parts: a spatial attention module, which aggregates and relates the local and global features at each position (short- and long-distance context information) of buildings; and a channel attention module, which improves the representation of comprehensive features (including color, texture, geometric, and high-level semantic features). The combination of the two attention modules enables buildings to be extracted from complex backgrounds. The effectiveness of the method is validated by experiments conducted on wide-range high spatial resolution imagery, namely Jilin-1 Gaofen 02A imagery. Compared with state-of-the-art segmentation methods, namely DeepLab-v3+, PSPNet, and PSANet, the proposed dual attention network-based method achieves high accuracy and intersection-over-union in extraction and shows the best recognition integrity of buildings.
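The two attention modules described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the general dual attention formulation in which spatial attention weights every position against every other position and channel attention weights every channel against every other channel, with each result added back to the input features. The function names, the projection matrices `w_q`, `w_k`, `w_v` (standing in for 1x1 convolutions), and the fixed scalars `alpha` and `beta` (standing in for learned scale parameters) are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(feat, w_q, w_k, w_v, alpha=1.0):
    """Relate every spatial position to every other position.

    feat: (C, H, W) feature map.
    w_q, w_k: (C_r, C) projections; w_v: (C, C) projection.
    These matrices are hypothetical stand-ins for 1x1 convolutions.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N) with N = H * W
    q = w_q @ x                           # (C_r, N)
    k = w_k @ x                           # (C_r, N)
    v = w_v @ x                           # (C, N)
    attn = softmax(q.T @ k, axis=-1)      # (N, N); row i weights all positions j
    out = v @ attn.T                      # out[:, i] = sum_j attn[i, j] * v[:, j]
    return (alpha * out + x).reshape(c, h, w)


def channel_attention(feat, beta=1.0):
    """Relate every channel to every other channel (no projections)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)            # (C, N)
    attn = softmax(x @ x.T, axis=-1)      # (C, C); row i weights all channels j
    out = attn @ x                        # (C, N)
    return (beta * out + x).reshape(c, h, w)


def dual_attention(feat, w_q, w_k, w_v, alpha=1.0, beta=1.0):
    """Fuse the two branches by element-wise summation (an assumption)."""
    return (spatial_attention(feat, w_q, w_k, w_v, alpha)
            + channel_attention(feat, beta))


# Toy usage on random features; shapes are arbitrary illustration values.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))     # C=8, H=4, W=4
w_q = rng.standard_normal((2, 8))
w_k = rng.standard_normal((2, 8))
w_v = rng.standard_normal((8, 8))
fused = dual_attention(feat, w_q, w_k, w_v)
print(fused.shape)
```

In a trained network the projections and scale parameters would be learned, and the fused features would feed a segmentation head on top of the residual backbone; the sketch only shows how each module mixes context across positions and across channels.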


