Efficient Building Extraction for High Spatial Resolution Images Based on Dual Attention Network
Keywords: high spatial resolution images, building extraction, self-attention mechanism, dual attention network, deep learning
Building extraction from high spatial resolution images has become an important research topic in computer vision for urban applications. Because such images contain rich detail and complex texture, buildings appear unevenly distributed and vary markedly in scale, and general-purpose methods often confuse them with other ground objects. This paper proposes a building extraction framework based on a deep residual neural network with a self-attention mechanism. The mechanism has two parts. The spatial attention module aggregates and relates the local and global features at each position, that is, the short- and long-distance context information of buildings. The channel attention module improves the representation of comprehensive features, including color, texture, geometric, and high-level semantic features. Combining the two attention modules allows buildings to be extracted from complex backgrounds. The effectiveness of the method is validated by experiments conducted on wide-coverage high spatial resolution imagery, namely Jilin-1 Gaofen 02A imagery. Compared with state-of-the-art segmentation methods, namely DeepLab-v3+, PSPNet, and PSANet, the proposed dual attention network-based method achieves high accuracy and intersection-over-union and shows the best recognition integrity of buildings.
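The two attention modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the projection matrices `w_q`, `w_k`, `w_v`, the scaling factors `gamma` and `beta`, and the element-wise sum used to fuse the two branches are all assumptions made for illustration, following the general pattern of position and channel self-attention on a feature map of shape (C, H, W).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat, w_q, w_k, w_v, gamma=1.0):
    # feat: (C, H, W). Every position attends to every other position,
    # so each output pixel mixes short- and long-distance context.
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)          # (C, N) with N = H * W
    q = w_q @ x                         # (C_r, N) queries (hypothetical projection)
    k = w_k @ x                         # (C_r, N) keys (hypothetical projection)
    v = w_v @ x                         # (C, N) values (hypothetical projection)
    attn = softmax(q.T @ k, axis=-1)    # (N, N); row i holds weights over positions j
    out = v @ attn.T                    # (C, N); out[:, i] = sum_j attn[i, j] * v[:, j]
    return gamma * out.reshape(c, h, w) + feat   # residual connection

def channel_attention(feat, beta=1.0):
    # Channels attend to each other, re-weighting feature maps
    # according to their pairwise similarity.
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)          # (C, N)
    attn = softmax(x @ x.T, axis=-1)    # (C, C) channel affinity
    out = attn @ x                      # (C, N)
    return beta * out.reshape(c, h, w) + feat    # residual connection

def dual_attention(feat, w_q, w_k, w_v, gamma=1.0, beta=1.0):
    # Assumed fusion: element-wise sum of the two branches.
    return (spatial_attention(feat, w_q, w_k, w_v, gamma)
            + channel_attention(feat, beta))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 4, 5))       # toy feature map: C=8, H=4, W=5
    w_q = rng.standard_normal((2, 8))
    w_k = rng.standard_normal((2, 8))
    w_v = rng.standard_normal((8, 8))
    print(dual_attention(feat, w_q, w_k, w_v).shape)
```

In a trained network the projections would be learned 1x1 convolutions and `gamma` and `beta` would be learned scalars; here they are fixed so that the data flow of each branch is easy to follow.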
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.