High-precision Cone Center Point Extraction Method Based on Stereo Vision Feature Matching

Abstract: To achieve high-precision docking in the close-range phase of aerial refueling, this paper proposes a high-precision cone center point extraction method based on stereo vision feature matching. Starting from the detection and tracking results for the target cone, the method applies extended compensation to the cone's tracking frame to form a region of interest (ROI). Using the camera calibration results and feature matching performed on the ROI, a depth map of the cone is obtained. Following the idea of a double-Gaussian function, the center offset is solved from the distortion ratio of the cone ring width to locate the cone center point. The method requires no cooperative marker on the cone, making it easier to implement in engineering applications; it solves the center point directly, without fitting the cone circle; and compared with other methods, its precision is greatly improved. The method has been implemented in an ABB robot simulation system, and the experimental results verify its feasibility and effectiveness.


Introduction
Autonomous aerial refueling (AAR) technology has become one of the research hotspots in the UAV field. The use of drones not only reduces the risk to pilots, but also greatly improves the feasibility of military strikes. UAV autonomous aerial refueling systems are divided into two types: the hose-and-drogue type and the rigid-boom type [1]. This paper takes the hose-and-drogue refueling pipe as its research object. The key challenge of autonomous aerial refueling technology is the accurate measurement of the position and attitude of the drogue cone. In the middle- and long-distance part of the docking phase, measurement of the cone position is mainly accomplished by multi-sensor navigation modes such as binocular stereo vision navigation. In the close-range docking phase, however, most visual methods ensure the spatial positioning accuracy of the target by attaching optical cooperative markers to the fuel pipe or the cone. Such methods are not only complicated to install, but are also seriously affected by optical conditions: once a marker is occluded or blurred, the cone may be lost or even misdetected. Marker-free visual measurement systems are therefore particularly important. Most methods without optical cooperative markers substitute a point of the cone tracking frame or a fitted cone center for the true center point, but the theoretical center point in the actual docking process should be the fuel nozzle inside the cone. The point accuracy of these methods is much lower than that of methods with optical cooperative markers [2,3], so docking is restricted to the fuel pipe, which must slide into the cone at exactly the right angle to complete the connection. It is therefore necessary to propose a method that does not rely on optical cooperative markers yet improves the accuracy of the cone center point, laying a firmer foundation for the subsequent development of autonomous aerial refueling technology.
This paper discusses the coordinate solution method for the center point of the target cone. The main contributions are as follows:
1) We propose a compensation mechanism based on cone detection and tracking, and improve the processing speed through image segmentation.
2) We propose an ROI-based feature matching technique, and extract the coordinates of the center point following the double-Gaussian idea, by reading the depth values of the matching points along the horizontal and vertical midlines of the depth map generated from the dense 3D point cloud.
3) To address the problem that matching points become too sparse in places, owing to the limitations of the feature matching algorithm, we re-model the region and optimize the center point coordinates using neighborhood matching points and prior knowledge of the ring, meeting the high-precision requirement.
The remainder of this paper is organized as follows. First, the proposed method is described in detail. Then, the measurement results of the whole method are given through experiments. Finally, a summary and future work are presented.

Methodology
The main purpose of this paper is to accurately measure the position of the cone center point during close-range docking in autonomous aerial refueling. Our approach is based on a two-camera stereo vision system and proceeds in three stages: the first stage is target detection, tracking, and segmentation; the second stage solves the cone center point from the depth map of the ROI; the third stage is model optimization.

The First Stage
Adaboost target detection is an iterative algorithm that combines multiple weak classifiers into a strong classifier [4]. Although it is not as accurate as deep learning, it is fast enough to guarantee the real-time performance of the KCF tracking algorithm, and its framework is easier to port to hardware [5]. To improve image processing speed, we preprocess images to different degrees according to their size, compressing large images by an adaptive coefficient; the compression ratio is defined in Eq. (1). Owing to this compression, some feature information is degraded to varying extents, which makes cone recognition difficult enough that the tracking frame may not cover the complete cone. This would seriously affect the subsequent feature matching results. To solve the problem, we compensate and expand the tracking frame according to two values, the compression coefficient and the grid size, to form an ROI that covers the complete cone. According to the gray-level distribution of the actual ROI, the approximate region for image segmentation is determined from the distinct gray values of pixels on the step edge, and the depth values of the useless regions outside the ring cone are all set to zero; a suitable threshold t separates them out.
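The two first-stage operations above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the margin formula `resize * grid` stands in for the (unstated) compensation rule of Eq. (1), and the thresholding simply zeroes non-ring pixels.

```python
import numpy as np

def expand_roi(box, resize, grid, img_w, img_h):
    """Expand a tracking box (x, y, w, h) by a margin derived from the
    compression coefficient `resize` and the matching grid size `grid`,
    clamped to the image bounds. The margin formula is an assumption
    for illustration, not the paper's exact Eq. (1)."""
    x, y, w, h = box
    margin = int(round(resize * grid))          # assumed compensation margin
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(img_w, x + w + margin)
    y1 = min(img_h, y + h + margin)
    return x0, y0, x1 - x0, y1 - y0

def segment_roi(roi_gray, t):
    """Zero out pixels darker than threshold t so that only the bright
    ring contributes to the later depth computation."""
    return roi_gray * (roi_gray >= t)

# toy example: expand a 100x80 tracking box inside a 640x480 image
print(expand_roi((50, 60, 100, 80), resize=2.0, grid=5, img_w=640, img_h=480))

# toy example: a small gray patch with a bright "ring" and dark background
patch = np.array([[10, 200, 200, 10],
                  [200, 10, 10, 200],
                  [200, 10, 10, 200],
                  [10, 200, 200, 10]], dtype=np.uint8)
print(segment_roi(patch, t=128))
```

In practice the threshold t would be chosen from the gray-level histogram of the ROI; a fixed value is used here only to keep the sketch self-contained.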

The Second Stage
Feature matching computes the similarity between image features to establish correspondences [6]. The algorithm builds a three-by-three grid whose center cell holds the candidate matching point; the grid pattern is rotated through eight directions at an angle of θ = 45°, and the best match is selected at the maximum of the resulting score distribution [7].
Multiplying the pixel coordinates of the matching points by the matrices from the calibration results yields a dense three-dimensional point cloud [8]. The schematic diagram is shown in Fig. 1.

Fig. 1. Binocular stereo vision measurement model
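The conversion of matched pixel pairs into 3D points follows the standard rectified-stereo relations: disparity d = uL − uR and depth Z = fB/d. The sketch below illustrates this; the focal length, baseline, and principal point are made-up values, not the paper's calibration results.

```python
import numpy as np

def triangulate(uv_left, uv_right, f, B, cx, cy):
    """Recover 3D points from matched pixel pairs in a rectified stereo
    setup. f is the focal length in pixels, B the baseline in meters,
    (cx, cy) the principal point; all are illustrative values here."""
    uv_left = np.asarray(uv_left, dtype=float)
    uv_right = np.asarray(uv_right, dtype=float)
    d = uv_left[:, 0] - uv_right[:, 0]           # disparity per match
    Z = f * B / d                                # depth from disparity
    X = (uv_left[:, 0] - cx) * Z / f
    Y = (uv_left[:, 1] - cy) * Z / f
    return np.stack([X, Y, Z], axis=1)           # dense 3D point cloud

# one matched pair: disparity 20 px, f = 800 px, baseline B = 0.1 m
cloud = triangulate([[400, 240]], [[380, 240]], f=800.0, B=0.1, cx=320.0, cy=240.0)
print(cloud)   # Z = 800 * 0.1 / 20 = 4.0 m
```

The Z column of this point cloud, sampled over the ROI grid, is what forms the depth map used in the next step.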
We extract the depth values of the dense three-dimensional point cloud to form a depth map, then collect all depth values of the matching points on the midlines in the horizontal and vertical directions; the resulting point sets are fitted with a double-Gaussian function [9].
Because the cone is distorted during image acquisition — for example, when the cone is offset to the right relative to the camera pair, the ring on the left side of the cone appears wider than that on the right, and the center point also shifts toward the left — the ratio of the left and right peaks of the double-Gaussian function is inversely proportional to the distances from the center point to the left and right rings. From this span ratio, the ring's pixel diameter, and the pixel coordinates of the upper-left corner of the ROI, the coordinates of the center point can be solved by the following formula.
where X_n and Y_n are the scatter coordinate values of the inner and outer rings at both ends of the horizontal and vertical midlines, X_center and Y_center are the coordinates of the cone center point, and X_box and Y_box are the coordinates of the upper-left corner of the ROI box.
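The double-Gaussian idea along one midline can be sketched as below. The exact center formula is not stated in the text, so this is an assumption for illustration: the center is placed between the two ring edges so that its distances to them are inversely proportional to the two Gaussian peak amplitudes; `x_left`, `x_right`, and the 5-pixel peak-search window are hypothetical parameters.

```python
import numpy as np

def gauss(x, a, mu, sigma):
    """Single Gaussian lobe; two of these model the two ring crossings."""
    return a * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def center_from_profile(x, depth, x_left, x_right):
    """Estimate the cone center along one midline depth profile.
    x_left / x_right are the inner-ring edge coordinates on either side.
    The peak amplitudes of the two lobes are read from the profile, and
    the center is weighted inversely by the peak ratio (an assumed
    realization of the paper's double-Gaussian offset idea)."""
    a_left = depth[(x >= x_left - 5) & (x <= x_left)].max()
    a_right = depth[(x >= x_right) & (x <= x_right + 5)].max()
    # distance to left ring : distance to right ring = a_right : a_left
    w = a_right / (a_left + a_right)
    return x_left + w * (x_right - x_left)

# synthetic symmetric midline profile: two equal Gaussian lobes,
# so the recovered center should sit midway between the rings
x = np.arange(0, 101, dtype=float)
depth = gauss(x, 1.0, 20, 3) + gauss(x, 1.0, 80, 3)
print(center_from_profile(x, depth, x_left=25, x_right=75))
```

With unequal lobes (the offset-cone case described above), the weight w shifts the center toward the thinner ring, which is the behavior the span-ratio formula encodes.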

The Third Stage
Feature matching is easily affected by background and texture, so matching points are sparse in some places. This reduces the accuracy of the center point position, or even causes errors. We can compensate by exploiting the dense matching points of the cone near the midline.
We know that the edge gradient of the elliptical ring changes regularly: although the gradient of the concave function varies, the derivative of the absolute value of the gradient is constant. From this we can find the gradient values near the midline and perform region fitting to compensate the sparse point set. Based on the image sequence of the cone in the simulation environment, we select a horizontal neighborhood value X_range = 3 and a vertical neighborhood value Y_range = 9. At the same time, we select the matching point closest to the outer side of the inner and outer ring edges as the junction between the Gaussian peak and valley, and continue to solve the center point coordinates with the second-stage algorithm. Because the third stage reconstructs the region matching points using a combination of gradient feature constraints and geometric feature constraints, non-uniform sub-pixel points are inserted between adjacent pixels, and the new model raises the accuracy of the second-stage algorithm to the sub-pixel level. The accuracy verification results are shown in the experiments below.
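A simple stand-in for this neighborhood compensation is sketched below: missing depth values on the midline are filled from valid matches in an X_range-by-Y_range neighborhood. Plain averaging is an assumption here; the paper's method uses gradient and geometric constraints for the region fitting.

```python
import numpy as np

def compensate_sparse(depth, x_range=3, y_range=9):
    """Fill zero (unmatched) depth values on the vertical midline by
    averaging valid matched depths in an x_range-by-y_range neighborhood,
    using the X_range = 3, Y_range = 9 values from the text. Averaging
    stands in for the gradient/geometry-constrained fitting."""
    h, w = depth.shape
    mid = w // 2
    out = depth.copy()
    for row in range(h):
        if out[row, mid] == 0:                   # sparse / unmatched point
            r0, r1 = max(0, row - y_range), min(h, row + y_range + 1)
            c0, c1 = max(0, mid - x_range), min(w, mid + x_range + 1)
            patch = depth[r0:r1, c0:c1]
            valid = patch[patch > 0]
            if valid.size:
                out[row, mid] = valid.mean()     # neighborhood-fitted value
    return out

# 5x5 depth map with one hole on the midline, surrounded by 4.0 m depths
d = np.full((5, 5), 4.0)
d[2, 2] = 0.0
print(compensate_sparse(d)[2, 2])   # filled from neighbors -> 4.0
```

After this fill, the midline profiles are dense enough for the second-stage double-Gaussian solve to run unchanged.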

Physical Experiment
The experiments are based on an ABB IRB1410 dual-arm robot simulation environment, in which the repositioning accuracy of the robot arm reaches 0.02 mm. This chapter covers three aspects: the experimental environment, the per-stage experimental results, and the comparative accuracy evaluation.

Experimental Environmental Conditions
The range of motion of the IRB1410 robot arm in the simulation system is ±20 cm in the X direction, ±40 cm in the Y direction, and ±10 cm in the Z direction.

Per-stage Experimental Results
The first stage of the algorithm detects and tracks each frame of the video stream collected by the dual-camera module, displaying the position of the tracking frame on the image in real time. With the original tracking frame, part of the cone ring may fall outside the frame; the location and extent of this overflow vary with the displacement direction and magnitude of the cone. After the compensation mechanism is applied, the extended pixel range is approximately range ≈ resize × grid_scale = 10 pixels, and the speed can be increased by fixing this value for a specific measured object. The locally expanded tracking frame forms the ROI.
The second stage matches the ROI regions of the left and right camera images. During matching, we can set the number of matching points to 1000 according to characteristic information such as the central symmetry and alternating black-and-white pattern of the cone. The 2D matching points are converted into a dense 3D point cloud through the calibration results, and the depth information is extracted to generate the depth map, as shown below. The red dot is the marked point, representing the true value (1108, 246); the yellow dot is the measured point, representing the measured value (1108.9, 246.4).

Accuracy Evaluation
To evaluate the accuracy improvement of the proposed method, we use the robotic arm to perform verification experiments in the horizontal and vertical directions. Each group contains 60-150 experimental images, sampled at fixed frame intervals to give about 10 sampled images per group, for a total of 10 groups and 100 images. Using the control-variable method, the other factors of the cone are held fixed: for example, the cone does not rotate and its plane (i.e., its attitude) remains consistent. The experimental data show that the larger the displacement, the greater the accuracy improvement, and the more frontal the cone is to the camera, the smaller the improvement. Over the 100 images, the vertical-direction error is RMS = 0.47 pixels and the horizontal-direction error is RMS = 0.78 pixels, orders of magnitude better than the accuracy of the tracking-frame-center method and the fitted-center method.

Conclusion
In this paper, we propose a high-precision cone center point extraction method based on stereo vision feature matching. First, based on the traditional target detection and tracking results, a compensation algorithm is used to obtain the region of interest and perform image segmentation. Then, feature matching is performed on the region of interest, and the dense three-dimensional point cloud is converted into a depth map. Next, two-way midline neighborhood point fitting is carried out to address the defects of the depth map, using prior knowledge of the geometric characteristics of the cone ring. Finally, on the new fitting model, a double-Gaussian function is applied to the depth-value distribution of the two-way midline matching points to solve the center point offset and compute the coordinates of the center point. The experimental results confirm the accuracy improvement of the method.
A system built on this method can be installed on the receiver aircraft for autonomous aerial refueling, accurately locating the fuel nozzle inside the cone during the close-range docking stage rather than merely locating the cone ring, which makes the docking process more reliable and safe. In addition, it can be applied to locating the receptacle in boom-type autonomous aerial refueling. In the future, we can further improve the software speed of the detection and matching process, and realize parallelization on the hardware side to improve efficiency and broaden the application of the algorithm.