Open Access

Efficient multi-scale traffic object detection method based on RT-DETR

DOI: 10.23977/acss.2024.080702

Author(s)

Songnan Zhang 1, Xiang Peng 1

Affiliation(s)

1 School of Information and Electronic Technology, Key Laboratory of Autonomous Intelligence and Information Processing in Heilongjiang Province, Jiamusi University, Jiamusi, China

Corresponding Author

Songnan Zhang

ABSTRACT

Traffic object detection is a crucial technology with significant development potential. To address the limitations of current methods in multi-scale object detection, this paper introduces an Efficient Multi-scale Traffic Object Detection method based on RT-DETR. Specifically, we design an Efficient Multi-scale Network that incorporates Multi-head Mixed Convolution (MMC), Multi-scale Aggregation (MA), and an Efficient Multi-scale Module (EMM). The method combines convolutions with transformers to reduce computational overhead while improving multi-scale detection. Experimental results show that, compared with the original RT-DETR, our method improves Average Precision (AP) by 1.2% and small-object Average Precision (APs) by 1.1%, a notable advantage over comparable approaches.
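The abstract only names the MMC and MA components without implementation details. As a hypothetical sketch of the general idea (the function names, head count, and kernel sizes below are illustrative assumptions, not taken from the paper), a multi-head mixed convolution can be pictured as splitting the channel dimension into heads, giving each head a depthwise convolution with a different kernel size, and aggregating the multi-scale outputs by concatenation:

```python
import numpy as np

def depthwise_conv(x, k):
    """Depthwise 2D convolution with 'same' padding.
    x: (C, H, W); k: odd kernel size; each channel has its own kernel."""
    C, H, W = x.shape
    rng = np.random.default_rng(0)
    kernels = rng.standard_normal((C, k, k)) / k  # per-channel weights (random here)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

def multi_head_mixed_conv(x, kernel_sizes=(3, 5, 7)):
    """Split channels into heads; each head sees a different receptive field.
    The per-head outputs are aggregated by concatenation along channels."""
    heads = np.array_split(x, len(kernel_sizes), axis=0)
    outs = [depthwise_conv(h, k) for h, k in zip(heads, kernel_sizes)]
    return np.concatenate(outs, axis=0)

# A toy feature map: 6 channels, 8x8 spatial resolution.
x = np.ones((6, 8, 8), dtype=np.float32)
y = multi_head_mixed_conv(x)
print(y.shape)  # shape is preserved: (6, 8, 8)
```

Because each head convolves only its own channel slice with a small depthwise kernel, the cost grows with the number of channels rather than their square, which is the usual motivation for mixing convolutions into a transformer-based detector.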

KEYWORDS

Object detection, Transformer, Multi-scale network, Attention mechanism

CITE THIS PAPER

Songnan Zhang, Xiang Peng. Efficient multi-scale traffic object detection method based on RT-DETR. Advances in Computer, Signals and Systems (2024) Vol. 8: 12-18. DOI: http://dx.doi.org/10.23977/acss.2024.080702.


All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.