A Lightweight Attentional Shift Graph Convolutional Network for Skeleton-Based Action Recognition
Keywords: action recognition, lightweight network, shift graph convolution, attention module
Abstract: Graph convolutional neural networks have achieved remarkable results in skeleton-based human action recognition. However, high-accuracy networks usually come with large parameter counts and high computational cost, which considerably limits their use on mobile devices. To address the excessive spatiotemporal complexity of high-accuracy methods, this paper further analyzes lightweight human action recognition models and proposes a lightweight architecture, the attentional shift graph convolutional network. The model makes three main improvements. First, shift convolution, a lightweight convolution method, is combined with graph convolution to reduce its complexity. Second, a shallow architecture with multi-stream early fusion is designed, which reduces the network scale by merging the multi-stream networks and reducing the number of layers. Third, an efficient channel attention module is introduced to capture the underlying feature information in the channel domain. Experiments are conducted on three existing skeleton datasets: NTU RGB+D, NTU-120 RGB+D, and Northwestern-UCLA. The results show that the proposed model is competitive in accuracy while requiring fewer parameters and less computation than current mainstream methods, and that it can run on devices with limited computing and storage resources.
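As an illustration of why shift convolution is lightweight, the following minimal sketch pairs a parameter-free shift across joints with a pointwise (1x1) convolution, so the only learnable weights are the channel-mixing matrix. The abstract does not specify the shift pattern; the cyclic pattern used here (channel c reads from joint (v + c) mod N) is an assumption, and the names nonlocal_shift, pointwise_conv, and shift_graph_conv are hypothetical, not the authors' implementation.

```python
from typing import List

# Features for one frame: features[v][c] is channel c of joint v.
Matrix = List[List[float]]


def nonlocal_shift(features: Matrix) -> Matrix:
    """Parameter-free shift (assumed pattern): channel c of joint v
    is copied from joint (v + c) % num_joints."""
    num_joints = len(features)
    num_channels = len(features[0])
    return [
        [features[(v + c) % num_joints][c] for c in range(num_channels)]
        for v in range(num_joints)
    ]


def pointwise_conv(features: Matrix, weight: Matrix) -> Matrix:
    """1x1 convolution: mixes channels independently at every joint.
    weight[i][o] maps input channel i to output channel o."""
    out_channels = len(weight[0])
    return [
        [sum(x * weight[i][o] for i, x in enumerate(row)) for o in range(out_channels)]
        for row in features
    ]


def shift_graph_conv(features: Matrix, weight: Matrix) -> Matrix:
    """Shift first (no parameters), then mix channels (the only weights)."""
    return pointwise_conv(nonlocal_shift(features), weight)


# Toy input: 3 joints, 2 channels.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
shifted = nonlocal_shift(x)

# With an identity weight, the output equals the shifted features.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = shift_graph_conv(x, identity)
```

In this toy case channel 0 stays in place and channel 1 moves cyclically by one joint, so each joint ends up holding information from a neighbor without any additional parameters; the channel-mixing weight is the only learnable part.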
Copyright (c) 2023 Xianshan Li, Jingwen Kang, Yang Yang, Fengda Zhao
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.