Optimization of the DeepLabv3+ Segmentation Network by Integrating Multi-Scale Spatial Information

Kui Tang; Lingfei Cheng; Huan Zhang

doi:10.23977/jipta.2025.080115

Author(s)

Kui Tang ¹, Lingfei Cheng ¹, Huan Zhang ¹

Affiliation(s)

¹ School of Physics and Electronic Information, Henan Polytechnic University, Jiaozuo, Henan, China

Corresponding Author

Lingfei Cheng

ABSTRACT

Existing semantic segmentation algorithms do not fully exploit the rich semantic and spatial information in images, which leads to inaccurate pixel classification across categories and severe loss of detail. We propose FMSI-DeepLab (Focusing on Multi-Scale Spatial Information for Semantic Segmentation Networks), a semantic segmentation network that focuses on multi-scale spatial information. Built on the DeepLabv3+ framework, the network is improved in two places. In the encoder, deformable convolutions are combined with the Global Grouped Coordinate Attention (GGCA) mechanism to reconstruct the Atrous Spatial Pyramid Pooling (ASPP) module, which strengthens the model's ability to capture global information along both the height and width dimensions and enables efficient multi-scale feature extraction. In the decoder, "interest flow" processing is applied to the low-level features, giving them global connectivity at the low-level stage. The Multi-Scale Channel Spatial Enhanced Attention (MSEA) module is then introduced to sharpen the model's focus on the edge information in the low-level features extracted by the backbone network, strengthening its attention to detail. Compared with the original DeepLabv3+ semantic segmentation model, FMSI-DeepLab improves mean intersection-over-union (mIoU) on the VOC2012 dataset by 2.62%, addressing the problems of inaccurate segmentation and severe loss of detail.
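
The headline result is reported as mIoU, so a small illustration of how that metric is commonly computed may be useful. The sketch below is not taken from the paper: the function names `confusion_matrix` and `mean_iou`, the toy labels, and the choice to skip pixels labelled 255 (the usual VOC "void" label) are all assumptions made for illustration. It accumulates a confusion matrix over flattened label maps and averages the per-class IoU = TP / (TP + FP + FN).

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flattened label maps.

    ignore_index=255 is an assumption that follows the common VOC convention
    of marking void/boundary pixels with 255; such pixels are skipped.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1  # row = ground truth, column = prediction
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes; the last pixel is void and is ignored.
target = [0, 0, 1, 1, 2, 2, 255]
pred = [0, 1, 1, 1, 2, 0, 2]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2. Evaluation toolkits differ in how they treat classes that never occur, so the skip rule used here is one reasonable choice and not necessarily the one behind the reported numbers.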

KEYWORDS

Deep Learning; Semantic Segmentation; Multi-Scale Features; Global Attention Mechanism

CITE THIS PAPER

Kui Tang, Lingfei Cheng, Huan Zhang, Optimization of the DeepLabv3+ Segmentation Network by Integrating Multi-Scale Spatial Information. Journal of Image Processing Theory and Applications (2025) Vol. 8: 122-131. DOI: http://dx.doi.org/10.23977/jipta.2025.080115.
