
Underwater Monocular-continuous Stereo Network Based on Cascade Structure for Underwater Image Depth Estimation


DOI: 10.23977/jeis.2025.100101

Author(s)

Yao Haiyang 1, Zeng Yiwen 1, Zang Yuzhang 2, Lei Tao 1, Zhao Xiaobo 3, Chen Xiao 1, Wang Haiyan 1,4

Affiliation(s)

1 School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, 710016, China
2 Engineering and Design Department, Western Washington University, Bellingham, WA, USA
3 Department of Electrical and Computer Engineering, Aarhus University, Aarhus, 8200, Denmark
4 School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, 710072, China

Corresponding Author

Yao Haiyang

ABSTRACT

Underwater monocular image depth estimation (UMIDE) is crucial for accurately representing and understanding underwater spatial variation, and can significantly enhance applications such as ocean engineering construction and seabed resource exploration. However, UMIDE frequently suffers from isolated, discontinuous, irregular "spots", inaccurate or indistinguishable edges, and limited model generalization, caused by color distortion, image blurring, and spatial information loss. This paper proposes an underwater monocular-continuous stereo network based on a cascade structure (UMCS-CS). First, we design a pinhole-model-based Structure from Motion method for camera pose estimation. UMCS-CS then employs a two-stage cascade for feature extraction: the first stage extracts global information, and the second stage captures fine detail using a squeeze-and-excitation block with spatial and channel attention. To suppress isolated, discontinuous, and irregular "spots", we use the variance of the current depth estimate to adaptively expand the depth estimation range. Finally, we design a composite loss function that combines smooth L1, edge, structural similarity, and smoothness terms with different weights. Experiments on public underwater datasets show that the relative error of the estimated depth map is reduced by 60.83%, the root mean square error by 54.87%, and the logarithmic error by 39.61%.
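The abstract states that the variance of the current depth estimate is used to expand the per-pixel depth search range, so that outlier "spots" can be re-estimated over a wider interval. The paper's exact procedure is not given here; the following is a minimal NumPy sketch under assumed choices (a local window, a scale factor `k`, and hypothetical depth bounds `d_min`/`d_max`):

```python
import numpy as np

def expand_depth_range(depth, window=3, k=2.0, d_min=0.1, d_max=10.0):
    """For each pixel, widen the candidate depth interval in proportion to
    the local standard deviation of the current estimate, so that isolated
    outlier 'spots' can be refined over a broader range."""
    h, w = depth.shape
    pad = window // 2
    padded = np.pad(depth, pad, mode="edge")
    lo = np.empty_like(depth)
    hi = np.empty_like(depth)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + window, j:j + window]
            spread = k * patch.std()          # wider range where depth is noisy
            lo[i, j] = max(d_min, depth[i, j] - spread)
            hi[i, j] = min(d_max, depth[i, j] + spread)
    return lo, hi
```

In smooth regions the local variance is near zero, so the interval stays tight; around an isolated spot the variance grows and the interval widens, which is the behavior the abstract describes.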
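The composite loss is described as a weighted sum of smooth L1, edge, structural similarity, and smoothness terms, but the weights and exact formulations are not given in the abstract. A rough NumPy sketch of a loss with this shape, using hypothetical weights `w` and a simplified single-window SSIM, might look like:

```python
import numpy as np

def smooth_l1(pred, gt, beta=1.0):
    # Huber-style loss: quadratic near zero, linear for large residuals.
    d = np.abs(pred - gt)
    return np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

def edge_loss(pred, gt):
    # Penalise mismatch between the spatial gradients of the two depth maps.
    gx = np.abs(np.diff(pred, axis=1) - np.diff(gt, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(gt, axis=0)).mean()
    return gx + gy

def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM computed over the whole map (one global window).
    mu_p, mu_g = pred.mean(), gt.mean()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (pred.var() + gt.var() + c2))
    return 1.0 - ssim

def smoothness_loss(pred):
    # Encourage locally smooth depth predictions.
    return np.abs(np.diff(pred, axis=1)).mean() + np.abs(np.diff(pred, axis=0)).mean()

def composite_loss(pred, gt, w=(1.0, 0.5, 0.5, 0.1)):
    return (w[0] * smooth_l1(pred, gt) + w[1] * edge_loss(pred, gt)
            + w[2] * ssim_loss(pred, gt) + w[3] * smoothness_loss(pred))
```

All four terms are non-negative, and the weights trade off pixel accuracy (smooth L1), edge sharpness (edge term), perceptual structure (SSIM), and spatial regularity (smoothness).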
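The reported improvements use three standard monocular-depth metrics: absolute relative error, root mean square error, and logarithmic (log-RMSE) error. The definitions below are the conventional ones, not taken from the paper itself:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Standard depth-estimation metrics: absolute relative error,
    RMSE, and RMSE of log depth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    abs_rel = np.mean(np.abs(pred - gt) / (gt + eps))
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred + eps) - np.log(gt + eps)) ** 2))
    return abs_rel, rmse, rmse_log
```

For example, predicting 2 m where the ground truth is 1 m gives an absolute relative error of 1.0, an RMSE of 1.0 m, and a log error of ln 2 ≈ 0.693.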

KEYWORDS

Underwater monocular images, underwater depth estimation, ocean engineering, deep learning

CITE THIS PAPER

Yao Haiyang, Zeng Yiwen, Zang Yuzhang, Lei Tao, Zhao Xiaobo, Chen Xiao, Wang Haiyan, Underwater Monocular-continuous Stereo Network Based on Cascade Structure for Underwater Image Depth Estimation. Journal of Electronics and Information Science (2025) Vol. 10: 1-14. DOI: http://dx.doi.org/10.23977/jeis.2025.100101.


All published work is licensed under a Creative Commons Attribution 4.0 International License.
