A Lightweight Attentional Shift Graph Convolutional Network for Skeleton-Based Action Recognition

Xianshan Li; Jingwen Kang; Yang Yang; Fengda Zhao

doi:10.15837/ijccc.2023.3.5061

Authors

Xianshan Li School of Information Science and Engineering, Yanshan University Qinhuangdao 066004, China
Jingwen Kang School of Information Science and Engineering, Yanshan University Qinhuangdao 066004, China
Yang Yang School of Information Science and Engineering, Yanshan University Qinhuangdao 066004, China
Fengda Zhao School of Information Science and Engineering, Yanshan University Qinhuangdao 066004, China

Keywords:

action recognition, lightweight network, shift graph convolution, attention module

Abstract

In the field of skeleton-based human action recognition, graph convolutional networks have achieved remarkable results. However, high-accuracy networks usually come with many parameters and a high computational cost. This considerably limits their deployment on mobile devices. To address the excessive spatiotemporal complexity of high-accuracy methods, this paper further analyzes lightweight human action recognition models and proposes a lightweight architecture, the attentional shift graph convolutional network. The model makes three main improvements. First, shift convolution, a lightweight convolution method, is combined with graph convolution to effectively reduce its complexity. Second, a shallow architecture with multi-stream early fusion is designed. It reduces the network scale by merging the multi-stream networks into one and using fewer network layers. Third, the efficient channel attention module is introduced into the model to capture the underlying feature information in the channel domain. Experiments are conducted on three public skeleton datasets: NTU RGB+D, NTU-120 RGB+D, and Northwestern-UCLA. The results show that the proposed model is competitive in accuracy. It also outperforms current mainstream methods in parameter count and computational cost, and it can run on some devices with limited computing and storage resources.
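The three ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several choices in it are assumptions:

- the (channels, frames, joints) layout;
- joint, bone, and velocity as the fused streams;
- the hypothetical five-joint skeleton;
- the random weights;
- a single spatial shift followed by a 1x1 convolution.

The shift pattern follows the non-local spatial shift of Shift-GCN (Cheng et al., 2020). The attention step follows ECA-Net's pool, 1-D convolution, and sigmoid recipe, including its adaptive kernel size.

```python
# Hypothetical NumPy sketch of the abstract's building blocks; not the authors' code.
import numpy as np


def early_fusion(joint: np.ndarray, parents: np.ndarray) -> np.ndarray:
    """Stack joint, bone and velocity streams on the channel axis (assumed streams).

    joint:   (C, T, V) joint coordinates (channels, frames, joints)
    parents: (V,) index of each joint's parent; the root points to itself,
             so its bone vector is zero. The skeleton is hypothetical.
    """
    bone = joint - joint[:, :, parents]               # parent -> child vectors
    velocity = np.zeros_like(joint)
    velocity[:, 1:] = joint[:, 1:] - joint[:, :-1]    # frame-to-frame motion
    return np.concatenate([joint, bone, velocity], axis=0)


def nonlocal_spatial_shift(x: np.ndarray) -> np.ndarray:
    """Channel c of joint v reads channel c of joint (v + c) mod V.

    Parameter-free: it only moves features between joints, so every joint
    sees information from other joints before the pointwise mixing step.
    """
    C, _, V = x.shape
    out = np.empty_like(x)
    for c in range(C):
        # A negative roll gives out[c, t, v] = x[c, t, (v + c) % V]
        out[c] = np.roll(x[c], -(c % V), axis=-1)
    return out


def pointwise_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: mixes channels independently at every (t, v)."""
    return np.einsum("oc,ctv->otv", weight, x)


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive odd kernel size used by ECA-Net."""
    k = int(abs((np.log2(channels) + b) / gamma))
    return k if k % 2 else k + 1


def eca(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Efficient channel attention: pool, 1-D conv across channels, sigmoid."""
    C = x.shape[0]
    k = kernel.shape[0]
    pooled = x.mean(axis=(1, 2))                      # (C,) channel descriptor
    padded = np.pad(pooled, k // 2)                   # zero padding keeps length C
    logits = np.array([padded[i:i + k] @ kernel for i in range(C)])
    weights = 1.0 / (1.0 + np.exp(-logits))           # per-channel gate in (0, 1)
    return x * weights[:, None, None]


rng = np.random.default_rng(0)
parents = np.array([0, 0, 1, 2, 3])                   # hypothetical 5-joint chain
joint = rng.standard_normal((3, 8, 5))                # 3 coords, 8 frames, 5 joints
x = early_fusion(joint, parents)                      # (9, 8, 5): one fused input
x = pointwise_conv(nonlocal_spatial_shift(x), 0.1 * rng.standard_normal((16, 9)))
x = eca(x, rng.standard_normal(eca_kernel_size(16)))
print(x.shape)                                        # (16, 8, 5)
```

In this sketch the shift has no parameters. The spatial step's only learned weights are therefore the C_out × C_in entries of the 1x1 convolution, and ECA adds just k weights. That is the intuition behind the parameter savings claimed above, although the paper's actual layer layout may differ.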
