Multi-scale Self-Attention Convolutional Networks for Skeleton-Based Action Recognition
DOI: 10.23977/acss.2025.090212
Author(s)
Yuwen Fang 1, Zonghui Wang 1
Affiliation(s)
1 School of Computer and Information Sciences, Chongqing Normal University, Chongqing, China
Corresponding Author
Yuwen Fang
ABSTRACT
Skeleton-based action recognition is a core task in video understanding, with applications in human-computer interaction, intelligent monitoring, and sports analysis. Existing graph convolutional networks (GCNs) model the spatial dependencies among joints effectively by constructing a skeletal connection graph, but their temporal modeling usually relies on fixed-window temporal convolution. This makes it difficult to capture global dynamic associations between distant frames, so key temporal features of complex actions are lost. To address this, this paper proposes a feature extraction framework based on temporal context enhancement. First, the framework uses a GCN to explicitly encode the spatial dependencies of skeletal joints and extract spatial features that carry physical connection priors. Second, a multi-scale temporal convolution module captures the local temporal dynamics between adjacent frames. Third, a self-attention mechanism along the temporal dimension models cross-frame associations in the feature sequence output by the temporal convolution, adaptively capturing key dependencies between distant action frames through dynamic weight allocation. Together, these stages realize temporal modeling from local to global. Experimental results on the NTU RGB+D dataset show that the proposed method significantly outperforms existing advanced models in skeleton-based action recognition, verifying the effectiveness of the temporal self-attention mechanism in modeling complex action dynamics.
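The three-stage pipeline described in the abstract (graph convolution, multi-scale temporal convolution, temporal self-attention) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: all function names are hypothetical, there are no learned weights, the graph convolution is a plain neighbour average, the temporal convolutions are moving averages with assumed kernel sizes 3 and 5, joints are mean-pooled before temporal modeling, and the attention uses identity query/key/value projections. It only shows how local temporal features can be re-weighted across all frames.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def graph_conv(frame, neighbors):
    # frame: V joint feature vectors; neighbors[i]: joints linked to joint i.
    # Assumption: unweighted average over a joint and its skeletal neighbours.
    dim = len(frame[0])
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + list(nbrs)  # self-loop plus neighbours
        out.append([sum(frame[j][d] for j in group) / len(group) for d in range(dim)])
    return out

def pool_joints(frame):
    # Assumption: mean-pool joints so each frame becomes one feature vector.
    dim = len(frame[0])
    return [sum(joint[d] for joint in frame) / len(frame) for d in range(dim)]

def temporal_conv(seq, kernel):
    # Moving average over `kernel` frames, clamping indices at the sequence edges.
    T, dim, half = len(seq), len(seq[0]), kernel // 2
    out = []
    for t in range(T):
        window = [seq[min(max(t + k, 0), T - 1)] for k in range(-half, half + 1)]
        out.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return out

def multi_scale_temporal_conv(seq, kernels=(3, 5)):
    # Average the outputs of branches with different temporal window sizes.
    branches = [temporal_conv(seq, k) for k in kernels]
    T, dim = len(seq), len(seq[0])
    return [[sum(b[t][d] for b in branches) / len(branches) for d in range(dim)]
            for t in range(T)]

def temporal_self_attention(seq):
    # Scaled dot-product attention across frames (identity projections).
    dim = len(seq[0])
    scale = math.sqrt(dim)
    weights = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        weights.append(softmax(scores))
    out = [[sum(w[s] * seq[s][d] for s in range(len(seq))) for d in range(dim)]
           for w in weights]
    return out, weights

def forward(clip, neighbors):
    # clip[t][v] is the feature vector of joint v in frame t.
    spatial = [pool_joints(graph_conv(frame, neighbors)) for frame in clip]
    local = multi_scale_temporal_conv(spatial)
    return temporal_self_attention(local)

# Toy clip: 6 frames, 3 joints in a chain (0-1-2), 2 features per joint.
neighbors = [[1], [0, 2], [1]]
clip = [[[float(t + v), float(t * v)] for v in range(3)] for t in range(6)]
features, attn = forward(clip, neighbors)
```

Each row of `attn` holds one frame's weights over all frames in the clip, so a frame can draw on distant frames rather than only on its convolution window.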
KEYWORDS
Skeletal action recognition; graph convolutional network; temporal self-attention mechanism; multi-scale temporal convolution; spatiotemporal modeling
CITE THIS PAPER
Yuwen Fang, Zonghui Wang, Multi-scale Self-Attention Convolutional Networks for Skeleton-Based Action Recognition. Advances in Computer, Signals and Systems (2025) Vol. 9: 99-107. DOI: http://dx.doi.org/10.23977/acss.2025.090212.