Efficient multi-scale traffic object detection method based on RT-DETR
DOI: 10.23977/acss.2024.080702
Author(s)
Songnan Zhang 1, Xiang Peng 1
Affiliation(s)
1 School of Information and Electronic Technology, Key Laboratory of Autonomous Intelligence and Information Processing in Heilongjiang Province, Jiamusi University, Jiamusi, China
Corresponding Author
Songnan Zhang
ABSTRACT
Traffic object detection is a key application with significant development potential. To address the limitations of current methods in multi-scale object detection, this paper introduces an efficient multi-scale traffic object detection method based on RT-DETR. Specifically, we design an Efficient Multi-scale Network that incorporates Multi-head Mixed Convolution (MMC), Multi-scale Aggregation (MA), and an Efficient Multi-scale Module (EMM). The method combines convolution with transformers to reduce the model's computational overhead while improving multi-scale detection. Experimental results show that, compared to the original method, the Average Precision (AP) and the small-object Average Precision (APs) improve by 1.2% and 1.1%, respectively, indicating a notable advantage over similar approaches.
KEYWORDS
Object detection, Transformer, Multi-scale network, Attention mechanism
CITE THIS PAPER
Songnan Zhang, Xiang Peng, Efficient multi-scale traffic object detection method based on RT-DETR. Advances in Computer, Signals and Systems (2024) Vol. 8: 12-18. DOI: http://dx.doi.org/10.23977/acss.2024.080702.