Open Access

DETR 3D Object Detection Method Based on Fusion of Depth and Salient Information


DOI: 10.23977/jeis.2023.080102

Author(s)

Yonggui Wang 1, Jian Li 1, Zaicheng Zhang 1, Bin He 2

Affiliation(s)

1 School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, Shaanxi, 710016, China
2 School of Electronics and Information Engineering, Tongji University, Shanghai, 200000, China

Corresponding Author

Yonggui Wang

ABSTRACT

Most existing monocular 3D object detection algorithms combine geometric relationships with convolutional neural networks to predict the 3D attributes of an object, but they lack depth feature information and the global relationships among features. To address these problems, a DETR-based monocular 3D object detection algorithm that fuses depth and salient information is proposed. A lightweight unsupervised depth module is constructed to extract depth feature information of objects, and a Transformer model is introduced to capture the global relationships among features. In addition, to address the high computational cost of the Transformer model in the algorithm, a saliency network is designed to reduce the computational load of the Transformer encoder. Experimental results on the official KITTI dataset show that the proposed algorithm achieves the best detection accuracy on multiple metrics compared with other current state-of-the-art detection algorithms, and ablation experiments demonstrate the effectiveness of each module.
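As a rough illustration of the pipeline the abstract describes (not the paper's actual implementation), the following PyTorch sketch fuses image tokens with tokens from a depth branch, scores the fused tokens with a saliency head, passes only the top-scoring tokens through a Transformer encoder, and decodes 3D box parameters with DETR-style object queries. All module choices, dimensions, the keep ratio, and the 7-parameter box head are illustrative assumptions.

```python
# Minimal sketch, assuming: a conv stem stands in for the backbone, a second
# conv stem stands in for the lightweight unsupervised depth module, and
# saliency-based top-k token selection stands in for the saliency network
# that reduces the Transformer encoder's computation. All names and sizes
# are hypothetical, not taken from the paper.

import torch
import torch.nn as nn


class SalientDepthDETRSketch(nn.Module):
    def __init__(self, dim=256, num_queries=50, keep_ratio=0.3):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=16, stride=16)      # stand-in image feature extractor
        self.depth_branch = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # stand-in depth feature module
        self.fuse = nn.Linear(2 * dim, dim)                               # fuse image and depth tokens
        self.saliency_head = nn.Linear(dim, 1)                            # scores each token's importance
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.queries = nn.Embedding(num_queries, dim)                     # DETR-style object queries
        self.box3d_head = nn.Linear(dim, 7)                               # (x, y, z, w, h, l, yaw), illustrative
        self.keep_ratio = keep_ratio

    def forward(self, image):
        b = image.shape[0]
        img_tok = self.backbone(image).flatten(2).transpose(1, 2)         # (B, N, C)
        dep_tok = self.depth_branch(image).flatten(2).transpose(1, 2)     # (B, N, C)
        tokens = self.fuse(torch.cat([img_tok, dep_tok], dim=-1))         # depth-aware tokens

        # Saliency-based sparsification: encode only the top-k salient tokens,
        # which is the role the abstract assigns to the saliency network.
        scores = self.saliency_head(tokens).squeeze(-1)                   # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        top_idx = scores.topk(k, dim=1).indices                           # (B, k)
        salient = torch.gather(
            tokens, 1, top_idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        )
        memory = self.encoder(salient)

        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)            # (B, Q, C)
        hs = self.decoder(q, memory)
        return self.box3d_head(hs)                                        # per-query 3D box parameters


if __name__ == "__main__":
    model = SalientDepthDETRSketch()
    boxes = model(torch.randn(2, 3, 224, 224))
    print(boxes.shape)  # torch.Size([2, 50, 7])
```

The top-k selection reflects why such sparsification helps: encoder self-attention cost grows quadratically with the number of tokens, so pruning the token set to a fixed fraction before the encoder directly cuts its computational load.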

KEYWORDS

Monocular 3D Object Detection, Depth Module, Global Relationship of Features, Transformer, Saliency Network

CITE THIS PAPER

Yonggui Wang, Jian Li, Zaicheng Zhang, Bin He. DETR 3D Object Detection Method Based on Fusion of Depth and Salient Information. Journal of Electronics and Information Science (2023) Vol. 8: 9-19. DOI: http://dx.doi.org/10.23977/jeis.2023.080102.


All published work is licensed under a Creative Commons Attribution 4.0 International License.
