An improved MIMLRBF natural scene image classification based on spectral clustering

Natural scene image classification can be formulated as a multi-instance multi-label (MIML) learning problem, for which the MIMLRBF algorithm achieves good results. MIMLRBF combines clustering with a neural network for classification. Related experiments show that the distance measure between packages and the selection of cluster centers have an important impact on image classification results. To obtain better clustering accuracy, this article first introduces spectral clustering into the training process, which makes the sample package centers more reasonable. Second, we redefine the distance between sample packages to effectively overcome the influence of isolated instances on that distance. The experimental results show that the proposed approach effectively improves classification accuracy and outperforms the MIMLRBF algorithm on the various performance measures.


Introduction
Early methods for solving the MIML (Multi-Instance Multi-Label) problem decompose it into a multi-instance problem or a multi-label problem. This degradation strategy ignores the correlations between instances and labels and suffers from low classification accuracy and long running time. Neural networks are a practical technique in machine learning and play a role in the MIML problem. Zhang M-L [7] proposed MIMLRBF (Multi-Instance Multi-Label Radial Basis Function), a neural-network-based algorithm that makes full use of the relationships between instances and labels and achieves better results. One key to the MIMLRBF algorithm is obtaining the clustering centers. Another key is the measure of the distance between two packages: the performance of the algorithm improved when this distance measure was improved in [6] and [7].
In this paper, to obtain better clustering accuracy, we make two improvements within the MIMLRBF framework. The first is to use spectral clustering instead of k-medoids in the training process to obtain the clustering centers. Not every instance in a package is an effective expression of the target characteristics. The k-medoids clustering algorithm cannot effectively eliminate the effects of these ineffective instances, so the clustering centers of the image packages are not accurate, whereas spectral clustering can effectively exploit the similarity between samples and thus improve the accuracy of the clustering centers. The second improvement is to redefine the distance between two packages.

Improved distance measure between two packages
The Hausdorff distance is a measure of the distance between two point sets, and it has been applied successfully in a number of algorithms. It comes in three variants: the maximum Hausdorff distance [6], the minimum Hausdorff distance [6] and the average Hausdorff distance [7]. Experiments show that the average Hausdorff distance gives the best performance on the MIML problem. The average Hausdorff distance averages the minimum distances between each sample of one package and all samples of the other package. In this average, a few instances that lie far away can inflate the distance between two packages and reduce the contribution of the other minimum distances, while a few instances that lie very close can likewise distort the real distance between the two packages. To address this issue, the improved algorithm revises the average Hausdorff distance further by linearly combining it with the maximum and minimum Hausdorff distances, which gives the weighted distance formula (1), where $\mathrm{aveH}(X_1, X_2)$, $\mathrm{maxH}(X_1, X_2)$ and $\mathrm{minH}(X_1, X_2)$ denote the average, maximum and minimum Hausdorff distances between the two point sets $X_1$ and $X_2$.
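The three Hausdorff variants and a weighted combination of them can be sketched in Python as follows. This is only an illustration under stated assumptions: the definitions are the commonly used ones for point sets, the function names are illustrative, and the weights `w_ave`, `w_max` and `w_min` are hypothetical placeholders rather than the coefficients of formula (1).

```python
import math

def nearest(x, package):
    """Distance from instance x to its nearest instance in the other package."""
    return min(math.dist(x, y) for y in package)

def max_hausdorff(a, b):
    """Largest nearest-neighbour distance, taken in both directions."""
    return max(max(nearest(x, b) for x in a), max(nearest(y, a) for y in b))

def min_hausdorff(a, b):
    """Distance between the closest pair of instances of the two packages."""
    return min(nearest(x, b) for x in a)

def ave_hausdorff(a, b):
    """Mean nearest-neighbour distance over all instances of both packages."""
    total = sum(nearest(x, b) for x in a) + sum(nearest(y, a) for y in b)
    return total / (len(a) + len(b))

def weighted_distance(a, b, w_ave=0.5, w_max=0.25, w_min=0.25):
    """Linear combination of the three variants (weights are assumptions)."""
    return (w_ave * ave_hausdorff(a, b)
            + w_max * max_hausdorff(a, b)
            + w_min * min_hausdorff(a, b))

# Two toy packages; the instance (10, 0) plays the role of an isolated instance.
package_a = [(0.0, 0.0), (1.0, 0.0)]
package_b = [(0.0, 0.0), (10.0, 0.0)]
print(ave_hausdorff(package_a, package_b), weighted_distance(package_a, package_b))
```

In the toy packages, the single far-away instance dominates the maximum Hausdorff distance, while the minimum Hausdorff distance ignores it entirely, which is why a combination of the three can be tuned to be less sensitive to isolated instances.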

Clustering center based on spectral clustering
This paper adopts the k-way Ncut algorithm. Let $\{(X_1, Y_1), (X_2, Y_2), \dots, (X_N, Y_N)\}$ be the training sample set, where $X_i$ is an image package expressed as a 9 × 15 dimensional feature vector, $Y_i$ is the pre-determined set of labels, $N$ is the number of samples, and $m$ is the number of labels. First, compute the distance between every two image packages of the $N$ training samples according to formula (1) and construct the $N \times N$ distance matrix of the training sample set. Then form the standard Laplacian matrix, where $D$ is the degree matrix and $S$ is the similarity matrix. Use it to reduce the dimension of the sample distance matrix and construct a new distance matrix of the samples with higher similarity. Finally, cluster with the k-medoids algorithm and compute the clustering center of each cluster.
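The pipeline of distance matrix, Laplacian, dimension reduction and k-medoids can be sketched with NumPy as below. Several choices here are assumptions, not details taken from the paper: the Gaussian similarity with width `sigma`, the symmetric normalized Laplacian, the row normalization of the embedding, and the farthest-first initialization of the medoids. The toy distance matrix is built from scalar positions instead of formula (1).

```python
import numpy as np

def spectral_embedding(dist, k, sigma=1.0):
    """Embed N samples into k dimensions from an N x N distance matrix."""
    # Assumption: Gaussian similarity built from the pairwise distances.
    s = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(s, 0.0)
    deg = np.maximum(s.sum(axis=1), 1e-12)     # diagonal of the degree matrix D
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Assumption: symmetric normalized Laplacian I - D^{-1/2} S D^{-1/2}.
    lap = np.eye(len(s)) - d_inv_sqrt[:, None] * s * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)              # eigenvalues in ascending order
    emb = vecs[:, :k]                          # eigenvectors of the k smallest
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)      # row-normalize the embedding

def k_medoids(points, k, n_iter=100):
    """Cluster embedded points; return medoid indices and cluster labels."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    medoids = [0]                              # farthest-first initialization
    while len(medoids) < k:
        medoids.append(int(np.argmax(d[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(d[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members) > 0:
                # The medoid minimizes the total distance to its cluster members.
                within = d[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels

# Toy data: two well-separated groups of three samples each.
pos = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
dist = np.abs(pos[:, None] - pos[None, :])
medoids, labels = k_medoids(spectral_embedding(dist, k=2), k=2)
print(medoids, labels)
```

Because the medoids are indices into the original sample list, each retained center is an actual training package, so its distance to a new package can still be computed with the package-level distance.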
In the basis function of the RBF network, $X_i$ is the image package of an input sample, and $c_i$ and $\sigma_i$ are respectively the center parameter and the width parameter of the $i$-th hidden-layer node.
In the training process, the weights are trained by gradient descent according to the distances between the input samples and the centers retained from spectral clustering. In the test process, the output value of an input sample is computed from its distances to the retained centers and from the trained weights.
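A minimal sketch of such an RBF network is given below. It assumes a Gaussian basis function of the form exp(-d² / (2σ²)) applied to the distance d between a package and a center, targets of +1 and -1 per label, and a squared-error loss minimized by per-sample gradient descent. These specifics, the function names, and the learning rate and epoch count are illustrative assumptions, not details confirmed by the paper.

```python
import math

def hidden_outputs(dists, widths):
    """Gaussian activations from the distances to each retained center."""
    return [math.exp(-d * d / (2.0 * s * s)) for d, s in zip(dists, widths)]

def predict(dists, widths, weights, bias):
    """One output value per label: weighted sum of the hidden activations."""
    phi = hidden_outputs(dists, widths)
    return [b + sum(w * p for w, p in zip(row, phi))
            for row, b in zip(weights, bias)]

def train_weights(dist_rows, targets, widths, lr=0.1, epochs=200):
    """Per-sample gradient descent on the squared error (assumed loss)."""
    n_hidden, n_labels = len(widths), len(targets[0])
    weights = [[0.0] * n_hidden for _ in range(n_labels)]
    bias = [0.0] * n_labels
    for _ in range(epochs):
        for dists, target in zip(dist_rows, targets):
            phi = hidden_outputs(dists, widths)
            out = predict(dists, widths, weights, bias)
            for l in range(n_labels):
                err = out[l] - target[l]
                bias[l] -= lr * err
                for j in range(n_hidden):
                    weights[l][j] -= lr * err * phi[j]
    return weights, bias

# Toy setup: two centers, two labels; each row holds one sample's distances
# to the two centers, and each target marks a label as present (+1) or not (-1).
dist_rows = [[0.0, 5.0], [5.0, 0.0]]
targets = [[1.0, -1.0], [-1.0, 1.0]]
widths = [1.0, 1.0]
weights, bias = train_weights(dist_rows, targets, widths)
print(predict(dist_rows[0], widths, weights, bias))
```

Under this convention a label would be assigned to a test sample when its output value is positive, which is one common thresholding choice rather than a rule stated in the text.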

Experimental design
The data set consists of 2,000 natural scene images belonging to the classes desert, mountains, sea, sunset and trees. Each image has already been split into nine subdomains by the SBN method, and every subdomain is represented by a 15-dimensional feature vector. Of the samples, 22.85% belong to more than one class.
The basic procedure of the MIMLRBF algorithm is as follows:
1) Input the sample set $\{(X_1, Y_1), (X_2, Y_2), \dots, (X_N, Y_N)\}$ with the label set $Y = \{y_1, y_2, \dots, y_m\}$. Use ten-fold cross-validation to divide the samples into 10 portions, selecting nine of them as the training set $T$ and the remaining one as the test set $R$.
2) For each label $l \in Y$, take $U_l$ as the set of packages with label $l$. Divide $U_l$ into disjoint groups through the spectral clustering algorithm and the improved distance formula (1), then compute and retain the center of each cluster.
3) Train the network model according to the distances between the input samples and the clustering centers retained from spectral clustering.
4) For each test sample, compute the output value of the sample on each label according to formula (3).
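The ten-fold split in step 1 can be sketched as follows. The shuffling seed and the function name `ten_fold_splits` are illustrative assumptions; the paper does not say how the ten portions are drawn.

```python
import random

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists, using each portion once as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # assumed: random assignment
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# With 2,000 images, every round trains on 1,800 samples and tests on 200.
for train, test in ten_fold_splits(2000):
    print(len(train), len(test))
```

Steps 2 to 4 would then be repeated for each of the ten rounds, and the reported measures averaged over the rounds, which is the usual way cross-validated results are summarized.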

Experimental results
In this experiment, two improvements are put forward on top of the original MIMLRBF algorithm. The first improves the distance measure between two sample packages, and the second uses spectral clustering instead of k-medoids in the training process. The following experiments evaluate the two improvements separately.

The effect of the improved distance formula on the algorithm
In this experiment, we use only the distance formula (1) to measure the distance between two instance packages; everything else is the same as in MIMLRBF. We call this variant MIMLRBF-DISTANCE, and the experimental result is shown in Table 1. As Table 1 shows, the algorithm performance is improved.

The effect of the spectral clustering on the algorithm
In this experiment, we use spectral clustering instead of k-medoids to cluster the samples in the training process; everything else is the same as in MIMLRBF. We call this variant MIMLRBF-SC, and the experimental result is shown in Table 1. As Table 1 shows, the algorithm performance is improved. Note: "↑" indicates that larger is better; "↓" indicates that smaller is better.

Experimental analysis
Table 1 shows the experimental results of these algorithms on the same sample set, with the better-performing data displayed in bold type. Because accurate measurement of the distance between two packages plays a key role in MIML classification results, MIMLRBF-DISTANCE performs better than MIMLRBF. Because spectral clustering is based on correlation characteristics and can mine the similarity between samples, which effectively improves the accuracy of the clustering centers, MIMLRBF-SC performs better than MIMLRBF. The imMIMLRBF algorithm combines both improvements and performs best.

Conclusions
In this paper, we study the MIML problem based on the RBF neural network and introduce spectral clustering into it to improve the RBF neural network. Many instances in an image package carry no effective information about the target characteristics. Although the k-medoids algorithm can eliminate the impact of these uninformative instances to a certain extent, the effect is not very good. Spectral clustering, in contrast, clusters on the basis of correlation characteristics and can obtain more accurate clustering centers. The experiments in this paper also show that the improved algorithm with spectral clustering achieves better classification results.

Fig 1
Fig 1 shows the framework of the algorithm in this paper. The whole process is divided into a training process and a test process. The training process has two steps. The first step builds the distance matrix from the distances between every two SBN (Single Blob with Neighbors) packages and then obtains the clustering center of every group by spectral clustering. The second step trains the neural network according to the distances between the samples and the clustering centers computed in the first step. In the test process, the output values are obtained from the distances between the sample and the clustering centers retained from training, together with the network model trained in the training process.

Train and test based on RBF neural network
The MIMLRBF neural network trains samples with a two-layer network structure, as shown in Fig 2. This paper uses the mathematical model of an RBF network with a single hidden layer and a Gaussian kernel function.

Table 1. Performance comparisons of the improved algorithms