Body Pose Estimation Based on Half-body Mixed Model

To improve the accuracy and speed of human pose estimation from static images, this paper proposes a half-body mixed model for human pose estimation that is built on HOG features and prior knowledge from face detection. The half-body model is assumed to contain K components. The static image is divided into M × N cells, each of which may belong to one of the components; the root component score is computed for each cell from the score formula and is used to determine the position of the human body. The mixed model can then be used to calculate the position and direction of the human limbs accurately.

and other complex environments, this paper chooses the HOG feature to describe the human target. After the human body features are extracted, a latent SVM (LSVM) [9,10,11] is used to classify them. When the deformable part model is built, each part is assigned a weight according to how much the corresponding body region contributes to detection; parts with larger response scores matter more to the detection process. Building on previous research, this design enriches the annotation information in the training samples and refines the detection model to improve detection performance.
Detecting the human body in an image requires matching the component model against the image. A simple approach is to segment the image using color information and then match the segments with the model. This approach is ineffective, however, because clothing, lighting, and background changes make it difficult to choose a segmentation threshold. This paper therefore uses the HOG feature, which describes the static image by its gradient directions, to better handle the problem.
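To compute the HOG feature, first calculate the gradient at each pixel. Let $H(x, y)$ denote the gray value of the image at pixel $(x, y)$. In the standard HOG formulation the gradients are central differences:

$$G_x(x, y) = H(x+1, y) - H(x-1, y), \qquad G_y(x, y) = H(x, y+1) - H(x, y-1),$$

where $G_x(x, y)$ and $G_y(x, y)$ represent the horizontal and vertical gradients at $(x, y)$, respectively. The gradient magnitude and gradient direction at pixel $(x, y)$ are:

$$M(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \theta(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}.$$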
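As a minimal sketch of the gradient step, the following NumPy code computes horizontal and vertical gradients by central differences and derives the magnitude and direction from them. The central-difference kernel, the zero-valued border, the use of `arctan2` (to avoid division by zero), and the folding of the angle to an unsigned orientation in degrees are assumptions taken from the common HOG formulation, not details stated in this paper; the function name `image_gradients` is hypothetical.

```python
import numpy as np

def image_gradients(H):
    """Central-difference gradients of a 2-D grayscale image H, indexed H[y, x]."""
    H = np.asarray(H, dtype=float)
    Gx = np.zeros_like(H)
    Gy = np.zeros_like(H)
    # Horizontal gradient H(x+1, y) - H(x-1, y); border columns stay 0 (assumption).
    Gx[:, 1:-1] = H[:, 2:] - H[:, :-2]
    # Vertical gradient H(x, y+1) - H(x, y-1); border rows stay 0 (assumption).
    Gy[1:-1, :] = H[2:, :] - H[:-2, :]
    magnitude = np.hypot(Gx, Gy)
    # arctan2 handles Gx == 0; the modulo folds the angle to an unsigned orientation.
    direction = np.degrees(np.arctan2(Gy, Gx)) % 180.0
    return Gx, Gy, magnitude, direction

# Usage: a horizontal ramp has a constant horizontal gradient and no vertical one.
ramp = np.tile(2.0 * np.arange(5), (4, 1))
Gx, Gy, magnitude, direction = image_gradients(ramp)
```

The magnitude and direction arrays are what a HOG implementation would then accumulate into per-cell orientation histograms.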

Half-body hybrid part model
Representing body parts simply as rectangles or ellipses usually requires position, orientation, scale, and other information to describe the state of a single component accurately, so pose estimation must cope with many parameters and a large amount of computation. The pictorial structure model, on the other hand, lacks prior knowledge of the appearance of the human body. A hybrid model is therefore used to combine the two and exploit the advantages of both.
Component representation. The hybrid model still builds the human body from components, but it represents each part with a square component rather than a rectangle or ellipse. A rectangular or elliptical representation requires the orientation of the component to be estimated along with the component itself, whereas the square representation replaces orientation with only four possible component states, as shown in Figure 1. The component state also needs no scale information, so only a few parameters are needed to describe the human pose. This paper proposes a hybrid model of the human body for target detection. The method divides the human body into several interrelated parts, models the body with a graph model, and uses graph inference to optimize the estimated pose. Experimental results show that the model also achieves good results in human body detection. The following focuses on the half-body hybrid model.

Experimental procedure
The half-body hybrid model consists of two layers of filters: a root filter and part filters. The root filter captures the overall contour of the object, while the part filters capture details within the target, such as the eyes, nose, and mouth. Each filter is a rectangular template, that is, a matrix in which every element is a d-dimensional weight vector.
A model for detecting the half body of a human with n parts can be described as an (n+2)-tuple $(F_0, P_1, \ldots, P_n, b)$, where $F_0$ is the root filter, $P_i$ is the model of the i-th part, and $b$ is a bias term. Each part model is a triple $P_i = (F_i, v_i, d_i)$, where $F_i$ is the filter of the i-th part, $v_i$ is a two-dimensional vector that records the position of $P_i$ relative to the root part, and $d_i$ is a four-dimensional vector of deformation cost coefficients for each possible position of $P_i$. The HOG features extracted from the image can be represented as a feature map G. A feature map is a two-dimensional array in which each element is a d-dimensional vector corresponding to a cell in the image. When the upper-left corner of a filter F is placed at coordinate (x, y) of the feature map G, the score is the dot product of the filter and the sub-window of G that it covers:

$$\sum_{x', y'} F[x', y'] \cdot G[x + x', y + y'].$$

The higher the score, the more likely it is that the cell belongs to the component represented by the filter.
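The filter score described above can be sketched as follows, assuming the filter and the feature map are stored as NumPy arrays of shape (height, width, d). The function names `filter_score` and `response_map` and the array layout are illustrative assumptions, not part of the original method; a practical implementation would typically use a correlation routine instead of explicit loops.

```python
import numpy as np

def filter_score(F, G, x, y):
    """Dot product of filter F with the sub-window of feature map G at cell (x, y).

    F has shape (h, w, d); G has shape (rows, cols, d); (x, y) is the
    top-left corner of the filter, with x the column and y the row.
    """
    h, w, _ = F.shape
    if x < 0 or y < 0:
        raise ValueError("filter position must be non-negative")
    window = G[y:y + h, x:x + w, :]
    if window.shape != F.shape:
        raise ValueError("filter does not fit inside the feature map at (x, y)")
    return float(np.sum(F * window))

def response_map(F, G):
    """Score of F at every position where it fits entirely inside G."""
    h, w, _ = F.shape
    rows, cols, _ = G.shape
    out = np.empty((rows - h + 1, cols - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = filter_score(F, G, x, y)
    return out

# Usage with toy data: a 2x2 filter over a 4x5 map of 3-dimensional cells.
G = np.ones((4, 5, 3))
F = np.ones((2, 2, 3))
scores = response_map(F, G)
```

Cells with a high value in `scores` are the candidate positions for the component that the filter represents.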
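When forming a hypothesis about an object, we need to specify the position of each filter of the model in the feature pyramid: $z = (p_0, \ldots, p_n)$, where $p_i = (x_i, y_i, l_i)$ indicates the location and pyramid level of the i-th filter. The score of a hypothesis is composed of three parts: 1. the scores of all filters at their respective positions; 2. the deformation cost of the position of each part relative to the root; and 3. the bias (prior) term. Following the standard deformable part model formulation, this is written as:

$$\mathrm{score}(p_0, \ldots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(G, p_i) - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b,$$

where $\phi(G, p_i)$ is the sub-window of the feature pyramid covered by filter $F_i$ at $p_i$, $(dx_i, dy_i) = (x_i, y_i) - (2(x_0, y_0) + v_i)$ gives the offset of the i-th part from its anchor position, and $\phi_d(dx, dy) = (dx, dy, dx^2, dy^2)$ is the deformation feature. To detect an object in the image, a global score is calculated for each root position by choosing the best placement of the parts, that is:

$$\mathrm{score}(p_0) = \max_{p_1, \ldots, p_n} \mathrm{score}(p_0, \ldots, p_n).$$

The larger $\mathrm{score}(p_0)$ is, the higher the probability that an object is present at that position in the image.

To train the deformable part model, a latent support vector machine (LSVM), designed for weakly supervised learning, is introduced. For a classifier with model parameters β, the score of each sample x in LSVM can be expressed as:

$$f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z),$$

where β is the vector of model parameters, z is the latent variable, and the set Z(x) defines the possible latent values for sample x, that is, the positions of the parts. As in the classical SVM algorithm, β is trained on a training set $D = (\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle)$, where $y_i \in \{-1, 1\}$. The training procedure minimizes the objective function:

$$L_D(\beta) = \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^{m} \max(0, 1 - y_i f_\beta(x_i)),$$

where $\max(0, 1 - y_i f_\beta(x_i))$ is the standard hinge loss and C controls the relative weight of the regularization term. If there is only one possible latent value per sample $x_i$, that is, $|Z(x_i)| = 1$, then $f_\beta$ is linear in β. A linear SVM can therefore be regarded as a special case of the latent SVM.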
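The two scoring rules in this passage can be illustrated with a short sketch. It assumes the standard deformable part model form: a hypothesis score equal to the sum of filter scores, minus a deformation cost d_i · (dx, dy, dx², dy²) per part, plus a bias; and a latent SVM score equal to the maximum of β · Φ(x, z) over the latent values z, trained with a hinge loss. All function names and the data layout are hypothetical, and the filter scores and feature vectors are assumed to be precomputed.

```python
import numpy as np

def deformation_feature(dx, dy):
    """Deformation feature (dx, dy, dx^2, dy^2) of a part displacement."""
    return np.array([dx, dy, dx * dx, dy * dy], dtype=float)

def hypothesis_score(filter_scores, deform_coeffs, displacements, bias):
    """Filter scores (root first) minus deformation costs of the parts, plus bias."""
    cost = sum(float(np.dot(d, deformation_feature(dx, dy)))
               for d, (dx, dy) in zip(deform_coeffs, displacements))
    return float(sum(filter_scores)) - cost + bias

def latent_score(beta, candidate_features):
    """f_beta(x): best score over latent values; one feature row Phi(x, z) per z."""
    scores = np.asarray(candidate_features, dtype=float) @ beta
    best = int(np.argmax(scores))
    return float(scores[best]), best

def lsvm_objective(beta, samples, labels, C):
    """0.5 * ||beta||^2 + C * sum of hinge losses max(0, 1 - y * f_beta(x))."""
    hinge = 0.0
    for feats, y in zip(samples, labels):
        f, _ = latent_score(beta, feats)
        hinge += max(0.0, 1.0 - y * f)
    return 0.5 * float(beta @ beta) + C * hinge

# Usage with toy numbers: one root and two parts, then two labeled samples.
s = hypothesis_score([3.0, 1.0, 2.0],
                     [[0, 0, 1, 1], [0, 0, 1, 1]],
                     [(1, 0), (0, 2)], bias=0.5)
beta = np.array([1.0, 0.0])
samples = [[[2.0, 0.0], [0.5, 0.0]], [[-3.0, 0.0], [-0.5, 0.0]]]
objective = lsvm_objective(beta, samples, labels=[1, -1], C=2.0)
```

When every sample has a single candidate row, `latent_score` reduces to a plain dot product and `lsvm_objective` becomes the ordinary linear SVM objective.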
Training an object detection model usually requires a large number of negative samples, but it is impractical to use all of them at once. The training data is therefore generally taken to consist of the positive samples and the "hard" negative samples. Bootstrapping is used: an initial model is trained on an initial subset of negative samples, and the negative samples that this model fails to classify correctly are then collected to form a set of hard negatives. A new model is trained using the hard negatives, and the process is repeated several times until training is complete.
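The bootstrapping loop can be sketched as below. This is an illustration under stated assumptions rather than the training code used in the paper: a plain linear SVM trained by subgradient descent stands in for the latent SVM, samples are fixed-length feature vectors with a constant last feature acting as the bias, and a negative counts as "hard" when its score exceeds -1. All names and parameter values are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + C * mean hinge loss (stand-in trainer)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        violated = y * (X @ w) < 1
        hinge_grad = (y[violated][:, None] * X[violated]).sum(axis=0) / len(y)
        w -= lr * (w - C * hinge_grad)
    return w

def mine_hard_negatives(w, negatives, margin=-1.0):
    """Indices of negatives that the model does not reject with enough margin."""
    return np.flatnonzero(negatives @ w > margin)

def bootstrap_train(positives, negatives, initial_size=10, rounds=5, seed=0):
    """Train on a negative subset, then repeatedly add the hard negatives found."""
    rng = np.random.default_rng(seed)
    first = rng.choice(len(negatives), size=min(initial_size, len(negatives)),
                       replace=False)
    cache = set(first.tolist())
    w = None
    for _ in range(rounds):
        idx = np.array(sorted(cache))
        X = np.vstack([positives, negatives[idx]])
        y = np.concatenate([np.ones(len(positives)), -np.ones(len(idx))])
        w = train_linear_svm(X, y)
        new = set(mine_hard_negatives(w, negatives).tolist()) - cache
        if not new:          # no unseen hard negatives remain
            break
        cache |= new
    return w, sorted(cache)

# Usage with toy 1-D data plus a constant bias feature.
rng = np.random.default_rng(1)
positives = np.column_stack([rng.uniform(1.5, 2.5, 20), np.ones(20)])
negatives = np.column_stack([rng.uniform(-2.5, -1.5, 60), np.ones(60)])
w, cache = bootstrap_train(positives, negatives)
```

Only the negatives in `cache` are ever passed to the trainer, which is what keeps the working set small when the full negative pool is too large to use at once.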
After inference by distance transform and message passing, a backtracking step is needed: once the pose inference algorithm determines that a cell is the root of the human model, it traces back to find the other parts of the body. From the row, column, and type of the cell in which the root part resides, one can recover the cell location and part type of the child that passed the message to it, which is the child part of the root. Looking up the row, column, and type stored for that child part yields the next child part, and the process is repeated until the leaf nodes of the model are reached.
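Message passing followed by backtracking on a tree of parts can be sketched as follows. Each part state is assumed to be an integer that encodes a (row, column, type) triple, the pairwise compatibility scores are given as dense tables, and the maximization is done by brute force rather than by the distance transform named in the text; these are simplifying assumptions, and all names are hypothetical.

```python
import numpy as np

def pass_messages(parent, unary, pairwise):
    """Leaf-to-root max-sum message passing on a tree of parts.

    parent[i]   : index of the parent of part i (-1 for the root, part 0);
                  parts are ordered so that parent[i] < i.
    unary[i]    : local score of part i for each of its states.
    pairwise[i] : table (parent states x child states) of compatibility
                  scores between part i and its parent (unused for the root).
    """
    n = len(parent)
    total = [np.array(u, dtype=float) for u in unary]
    best_child_state = [None] * n
    for i in range(n - 1, 0, -1):                 # children before parents
        cand = np.asarray(pairwise[i], dtype=float) + total[i][None, :]
        best_child_state[i] = cand.argmax(axis=1)  # remembered for backtracking
        total[parent[i]] += cand.max(axis=1)       # message to the parent
    return total, best_child_state

def backtrack(parent, total, best_child_state):
    """Pick the best root state, then follow stored argmax pointers to the leaves."""
    n = len(parent)
    states = [0] * n
    states[0] = int(np.argmax(total[0]))
    for i in range(1, n):                          # parents before children
        states[i] = int(best_child_state[i][states[parent[i]]])
    return states, float(total[0][states[0]])

# Usage: a chain root -> part 1 -> part 2, two states per part.
parent = [-1, 0, 1]
unary = [[0, 1], [2, 0], [0, 3]]
agree = [[0, -5], [-5, 0]]        # penalize a child state that differs from its parent
total, pointers = pass_messages(parent, unary, [None, agree, agree])
states, score = backtrack(parent, total, pointers)
```

The pointer tables play the role of the stored row, column, and type information: starting from the root state, each lookup gives the state of the child that sent the winning message.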
All of the above components together form a complete human body, described by the information, location, direction, and scale of each body part. In this paper, the hybrid model is used to estimate the pose of a driver. The driver's body is first modeled, with the body parts represented by a simple part model. Spring connections are used between the components, and the human body is located based on the root component scores. The steps of pose estimation with the hybrid model are described in detail: computing HOG features, distance transform, message passing, backtracking, and non-maximum suppression. Experiments show that the hybrid model can accurately calculate the position and direction of the driver's upper limbs.