Optimization of the DeepLabv3+ Segmentation Network by Integrating Multi-Scale Spatial Information
DOI: 10.23977/jipta.2025.080115
Author(s)
Kui Tang 1, Lingfei Cheng 1, Huan Zhang 1
Affiliation(s)
1 School of Physics and Electronic Information, Henan Polytechnic University, Jiaozuo, Henan, China
Corresponding Author
Lingfei Cheng
ABSTRACT
Existing algorithms do not fully exploit the rich semantic and spatial information in images, which leads to inaccurate pixel classification across categories and severe loss of detail. We propose FMSI-DeepLab (Focusing on Multi-Scale Spatial Information for Semantic Segmentation Networks), a semantic segmentation network that focuses on multi-scale spatial information. Built on the DeepLabv3+ framework, the network is improved in two places. In the encoder, deformable convolutions are combined with the Global Grouped Coordinate Attention (GGCA) mechanism to reconstruct the Atrous Spatial Pyramid Pooling (ASPP) module, which strengthens the model's ability to capture global information along both the height and width dimensions and enables efficient multi-scale feature extraction. In the decoder, "interest flow" processing is applied to the low-level features, giving them global connectivity at the low-level stage. The Multi-Scale Channel Spatial Enhanced Attention (MSEA) module is then introduced to sharpen the model's focus on edge information in the low-level features extracted by the backbone network, strengthening its attention to detail. Compared with the original DeepLabv3+ semantic segmentation model, the proposed model improves mean intersection-over-union (mIoU) on the VOC2012 dataset by 2.62%, mitigating inaccurate segmentation and severe loss of detail.
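For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names, the toy labels, and the choice to skip classes that appear in neither the ground truth nor the prediction are assumptions made for illustration, and ignore/void labels are not handled.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN)."""
    matrix = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                 # class c pixels predicted as something else
        fp = sum(row[c] for row in matrix) - tp  # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both masks
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened masks with three classes (illustrative values only).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(truth, pred, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Because the metric averages over classes rather than pixels, small or thin classes weigh as much as large ones, which is why better edge and detail handling can raise the score.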
KEYWORDS
Deep Learning; Semantic Segmentation; Multi-Scale Features; Global Attention Mechanism
CITE THIS PAPER
Kui Tang, Lingfei Cheng, Huan Zhang, Optimization of the DeepLabv3+ Segmentation Network by Integrating Multi-Scale Spatial Information. Journal of Image Processing Theory and Applications (2025) Vol. 8: 122-131. DOI: http://dx.doi.org/10.23977/jipta.2025.080115.