A Correlational Neural Network for Gender Classification

A convolutional neural network (CNN) can perform well in a variety of applications such as human face gender classification, but its implementation requires flipping the convolutional kernels. By replacing convolution with correlation, we propose a correlational neural network (CorNN) as an alternative to the CNN. A CorNN has the advantage over a CNN that its implementation requires no flips of the correlational kernels, saving a considerable amount of training and testing time. Experimental results show that an 8-layer CorNN for gender classification not only performs as well as the corresponding CNN, but also runs surprisingly faster, with a relative reduction of 11.29%~18.83% in training time and 10.16%~16.57% in testing time.


Introduction
Human face gender classification is the task of automatically identifying the gender (male or female) of a face image. It is easy for a human but challenging for a computer [1], and it has wide applications in customer advertisement, visual surveillance, intelligent interfaces, population statistics, and so on.
As a traditional method, fully-connected neural networks were applied to the identification of human face gender many years ago [2][3][4][5]. However, fully-connected neural networks do not take into account the two-dimensional structure of an image when extracting features from it. As a consequence, they are not satisfactory in terms of gender classification accuracy. This drawback of ignoring 2D structural information can be partly overcome by convolutional neural networks (CNNs). Recently, Verma et al. implemented a 6-layer CNN to identify face gender. However, this CNN needs as many as 25000 training epochs to achieve 88.46% accuracy on a dataset containing 4700 face images collected from the web [6].
Primarily, a CNN is designed to recognize 2D shapes with some degree of translation invariance, inspired by the visual neural mechanism. In 1962, Hubel and Wiesel proposed the concept of the receptive field in their study of the cat's visual cortex cells [7]. With the insight of the receptive field, Fukushima developed the neocognitron model [8], perhaps the first implemented CNN.
Commonly, a CNN is composed of an input layer, alternating convolutional and subsampling layers, fully-connected layers, and an output layer. It has two distinguishing features: local connectivity and shared weights. The local connectivity allows the network to first create good representations of small parts of the input, and then assemble representations of larger areas from them. The shared weights allow features to be detected regardless of their position in the visual field, thus constituting the property of translation invariance.
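The shared-weight property can be illustrated with a small sketch: correlating a shifted image with the same kernel shifts the feature-map response by the same amount, so detection does not depend on position. The helper `inner_corr` and the toy image are our own illustrative choices, not the paper's.

```python
import numpy as np

def inner_corr(a, k):
    # valid-mode correlation: slide kernel k over a without flipping it
    M, N = a.shape; m, n = k.shape
    return np.array([[np.sum(a[i:i+m, j:j+n] * k)
                      for j in range(N - n + 1)] for i in range(M - m + 1)])

img = np.zeros((8, 8)); img[2:4, 2:4] = 1.0   # a small bright patch at (2, 2)
kernel = np.ones((2, 2))                       # one shared-weight 2x2 detector
shifted = np.roll(img, (3, 3), axis=(0, 1))    # the same patch, translated by (3, 3)

f1 = inner_corr(img, kernel)
f2 = inner_corr(shifted, kernel)
# The peak response moves with the patch: the same weights detect it anywhere.
print(np.unravel_index(f1.argmax(), f1.shape))  # (2, 2)
print(np.unravel_index(f2.argmax(), f2.shape))  # (5, 5)
```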
Due to the superior ability of feature extraction, CNNs have been successfully applied to many important fields, such as character recognition [9], face recognition [10], face tracking [11], and traffic sign recognition [12]. Especially in ImageNet Large Scale Visual Recognition Challenge (ILSVRC), CNNs play a vital role in achieving the best performance [13].
However, a CNN has a disadvantage in the implementation of the convolutional operation, which may require a huge number of horizontal and vertical flips of the convolutional kernels. This can noticeably slow down the training of a CNN on big datasets. In order to eliminate the flipping overhead, we present a new model, called the correlational neural network (CorNN). Compared to a CNN, a CorNN has no need to flip its kernels and thus runs faster. Furthermore, we implement an 8-layer CorNN for gender classification in this paper.
The rest of the paper is organized as follows. Section 2 presents the CorNN model in detail. Section 3 makes a comparative evaluation of the CorNN and the CNN for gender classification in experiments. Finally, Section 4 draws conclusions.

Equivalence between CNNs and CorNNs
A CNN has four basic operations: inner convolution, outer convolution, subsampling, and upsampling. These operations are defined as follows. Suppose A is an M×N matrix and B is an m×n matrix with m ≤ M and n ≤ N. The inner convolution of A and B is defined by:

(A ∗_i B)_{ij} = Σ_{s=1..m} Σ_{t=1..n} A_{i+s−1, j+t−1} B_{m−s+1, n−t+1},  1 ≤ i ≤ M−m+1, 1 ≤ j ≤ N−n+1.   (1)

And their outer convolution is defined by:

A ∗_o B = Ā ∗_i B,   (2)

where Ā is the (M+2m−2)×(N+2n−2) matrix that extends A with 0, namely, Ā is obtained by padding A with m−1 rows of zeros above and below and n−1 columns of zeros on the left and right. If matrix A is divided into non-overlapping blocks of size p×q, with the ij-th block denoted by A^{(ij)}, the non-overlapping subsampling of matrix A with block size p×q is defined as:

down(A)_{ij} = (1/(pq)) Σ_{s=1..p} Σ_{t=1..q} A^{(ij)}_{st},   (3)

i.e., each block is replaced by its mean. The non-overlapping upsampling of matrix A with multiple size p×q is defined as:

up(A) = A ⊗ 1_{p×q},   (4)

where 1_{p×q} is a p×q matrix of 1, and ⊗ denotes the Kronecker product. Correspondingly, a CorNN also has four basic operations: inner correlation, outer correlation, subsampling, and upsampling. The definitions of subsampling and upsampling in the CorNN are the same as in the CNN. The inner correlation C = A ⋆_i B between matrix A and matrix B is defined as:

(A ⋆_i B)_{ij} = Σ_{s=1..m} Σ_{t=1..n} A_{i+s−1, j+t−1} B_{s,t},  1 ≤ i ≤ M−m+1, 1 ≤ j ≤ N−n+1.   (5)

The outer correlation between A and B is defined as:

A ⋆_o B = Ā ⋆_i B,   (6)

where Ā is the same as in formula (2).
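The four basic operations can be sketched in a few lines of NumPy. The function names are ours, mean pooling is assumed for the subsampling, and the example matrices are illustrative only:

```python
import numpy as np

def inner_corr(a, k):
    # inner correlation: slide k over a without flipping it ('valid' region only)
    M, N = a.shape; m, n = k.shape
    return np.array([[np.sum(a[i:i+m, j:j+n] * k)
                      for j in range(N - n + 1)] for i in range(M - m + 1)])

def outer_corr(a, k):
    # outer correlation: zero-extend a by (m-1, n-1) on every side, then correlate
    m, n = k.shape
    return inner_corr(np.pad(a, ((m - 1, m - 1), (n - 1, n - 1))), k)

def down(a, p, q):
    # non-overlapping subsampling: replace each p-by-q block by its mean
    M, N = a.shape
    return a.reshape(M // p, p, N // q, q).mean(axis=(1, 3))

def up(a, p, q):
    # non-overlapping upsampling: Kronecker product with a p-by-q matrix of ones
    return np.kron(a, np.ones((p, q)))

A = np.arange(16.0).reshape(4, 4)
B = np.array([[1.0, 0.0], [0.0, -1.0]])
print(inner_corr(A, B).shape)         # (3, 3): (M-m+1) x (N-n+1)
print(outer_corr(A, B).shape)         # (5, 5): (M+m-1) x (N+n-1)
print(down(A, 2, 2))                  # 2x2 matrix of block means
print(up(down(A, 2, 2), 2, 2).shape)  # (4, 4): back to the original size
```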
Generally, the inner and outer convolutions of two given matrices are different from their inner and outer correlations. However, in theory an arbitrary CNN can be simulated by a corresponding CorNN, and vice versa.
In fact, for any convolutional kernel w of size m×n, we can define a horizontal flip w' of w below:

w'_{i,j} = w_{i, n−j+1},  1 ≤ i ≤ m, 1 ≤ j ≤ n.   (7)

Moreover, we can also define a vertical flip w'' of w' below:

w''_{i,j} = w'_{m−i+1, j} = w_{m−i+1, n−j+1},  1 ≤ i ≤ m, 1 ≤ j ≤ n,   (8)

so that w'' is simply w rotated by 180°, i.e., w'' = rot180(w). Hence, for an input (or a convolutional map) x, we have

x ∗_i w = x ⋆_i rot180(w)  and  x ∗_o w = x ⋆_o rot180(w).   (9)

This means that if we replace every convolutional kernel w in a CNN with its corresponding correlational kernel rot180(w), we can get a CorNN that always produces the same output as the CNN for the same input. In the meantime, because rot180(rot180(w)) = w, we can also get a corresponding CNN that always produces the same output as any given CorNN for the same input.
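The equivalence can be verified numerically: an inner convolution equals an inner correlation with the 180°-rotated kernel. The implementations below are our own direct transcriptions of the definitions:

```python
import numpy as np

def inner_corr(a, k):
    # inner correlation: no kernel flip
    M, N = a.shape; m, n = k.shape
    return np.array([[np.sum(a[i:i+m, j:j+n] * k)
                      for j in range(N - n + 1)] for i in range(M - m + 1)])

def inner_conv(a, k):
    # inner convolution: the kernel is indexed in reverse (a 180-degree flip)
    M, N = a.shape; m, n = k.shape
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            out[i, j] = sum(a[i + s, j + t] * k[m - 1 - s, n - 1 - t]
                            for s in range(m) for t in range(n))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
w = rng.standard_normal((3, 3))
rot = w[::-1, ::-1]                          # rot180(w)

# Convolving with w gives the same result as correlating with rot180(w).
print(np.allclose(inner_conv(x, w), inner_corr(x, rot)))  # True
```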
Therefore, a CNN is equivalent to a CorNN in functionality. Because a convolutional operation may require a large number of vertical and horizontal flips in implementation, a CorNN can be expected to be more efficient than the corresponding CNN in training and testing. Hence, we propose a CorNN instead of a CNN for gender classification in the next section.

A CorNN for Gender Classification
As mentioned previously, a 6-layer CNN has been designed to identify face gender [6], but it requires more than twenty thousand training epochs to achieve a satisfactory result. Considering that deeper neural networks may perform better [14], we present an 8-layer CorNN for gender classification in Fig. 1. The CorNN consists of an input layer x, three correlational layers (h1, h3, h5) of sigmoid neurons, three subsampling layers (h2, h4, h6), and an output layer o. The input is a face image; the correlational layers extract gender features, and the subsampling layers perform dimensionality reduction. The three correlational layers alternate with the three subsampling layers. The last subsampling layer is directly connected to the output layer, which is composed of two sigmoid neurons for gender classification.
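The forward pass of such a network can be sketched as follows. The kernel sizes (5×5, 3×3, 3×3), the 2×2 pooling, and the use of a single feature map per layer are our own illustrative assumptions; the paper fixes only the depth and the 32×32 input size:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def inner_corr(a, k):
    M, N = a.shape; m, n = k.shape
    return np.array([[np.sum(a[i:i+m, j:j+n] * k)
                      for j in range(N - n + 1)] for i in range(M - m + 1)])

def down(a, p, q):
    M, N = a.shape
    return a.reshape(M // p, p, N // q, q).mean(axis=(1, 3))

x = rng.random((32, 32))                   # normalized 32x32 face image
w1 = rng.standard_normal((5, 5))           # hypothetical kernel sizes
w3 = rng.standard_normal((3, 3))
w5 = rng.standard_normal((3, 3))
b1 = b3 = b5 = 0.0

h1 = sigmoid(inner_corr(x, w1) + b1)       # correlational layer, 28x28
h2 = down(h1, 2, 2)                        # subsampling layer, 14x14
h3 = sigmoid(inner_corr(h2, w3) + b3)      # correlational layer, 12x12
h4 = down(h3, 2, 2)                        # subsampling layer, 6x6
h5 = sigmoid(inner_corr(h4, w5) + b5)      # correlational layer, 4x4
h6 = down(h5, 2, 2)                        # subsampling layer, 2x2
W_out = rng.standard_normal((2, h6.size))  # fully connected to the output
o = sigmoid(W_out @ h6.ravel())            # two sigmoid neurons (male/female)
print(o.shape)  # (2,)
```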

Learning Algorithm for the CorNN
Figure 1: The architecture of the proposed CorNN.

For the l-th sample x_l, the forward computing procedure of the CorNN is as follows: each correlational layer computes h_k = sigmoid(h_{k−1} ⋆_i w_k + b_k), and each subsampling layer computes h_k = down(h_{k−1}). To train the network, we derive a backpropagation algorithm, called CorNN-BP. It first propagates sensitivities (error terms) backward from the output layer: through a subsampling layer, the sensitivity is upsampled and multiplied element-wise by the derivative of the activation, where '∘' denotes the element-wise product; through a correlational layer, it is propagated by an outer correlation. Using the sensitivities, the CorNN-BP then computes the partial derivatives of the objective function L_N with respect to the weights and biases: the gradient of each kernel is an inner correlation between the layer input and the corresponding sensitivity map, and the gradient of each bias is the sum of the entries of the sensitivity map. Finally, the CorNN-BP updates the weights and biases using gradient descent. The main advantage of the CorNN-BP is that it requires no flips of weight matrices in implementation, because it uses inner and outer correlations instead of inner and outer convolutions. This saves a surprisingly large amount of training and testing time, as shown in the experimental results.
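The kernel gradient of a single correlational layer can be checked against finite differences. The sketch below assumes one linear correlational layer with a quadratic loss L = 0.5·||y||², so the output sensitivity is δ = y; these choices, and the helper names, are ours for illustration:

```python
import numpy as np

def corr2_valid(a, k):
    # inner ('valid') correlation: slide k over a without flipping it
    M, N = a.shape; m, n = k.shape
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i+m, j:j+n] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
w = rng.standard_normal((3, 3))

y = corr2_valid(x, w)      # forward pass of one correlational layer
delta = y                  # sensitivity for L = 0.5 * ||y||^2

# Analytic kernel gradient: an inner correlation of the input with the
# sensitivity map -- note that no kernel flip is needed here.
dW = corr2_valid(x, delta)

# Finite-difference check of dW, entry by entry.
eps = 1e-6
L0 = 0.5 * np.sum(y ** 2)
num_dW = np.zeros_like(w)
for s in range(3):
    for t in range(3):
        wp = w.copy(); wp[s, t] += eps
        num_dW[s, t] = (0.5 * np.sum(corr2_valid(x, wp) ** 2) - L0) / eps

print(np.max(np.abs(dW - num_dW)))  # small: analytic and numerical gradients agree
```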

Experimental Results
In order to demonstrate the validity of the proposed CorNN for face gender classification, we compare it with a CNN of the same structure on nine human face databases: ORL, Georgia Tech, FERET [15], Extended Yale B (EYB) [16], AR [17], Faces94, LFW [18], MORPH and CelebFaces+ [19]. Examples of face images from these databases are shown in Fig. 2. The proposed CorNN and the corresponding CNN are implemented in Matlab R2010a. Both of them use a learning rate of 0.1 on all the databases, with the maximum number of epochs set to 50 on the first six databases and 1000 on the last three. All experiments were conducted on a PC platform with an Intel i7-3770 3.10-GHz processor, 8-GB memory, and the Windows 7 operating system.

Preprocessing of Images in Databases
The nine face databases for the experiments are detailed in Table 1. "MR", "FR", and "TR" denote the numbers of male, female, and total training samples, respectively. "ME", "FE", and "TE" denote the numbers of male, female, and total testing samples, respectively. Moreover, no person has some of his/her images in the training set and others in the testing set simultaneously. Additionally, ORL, FERET, EYB and AR consist of gray images, whereas Georgia Tech, Faces94, LFW, MORPH and CelebFaces+ consist of color images. Before the experiments, all images were resized to 32×32, with color images converted into gray ones and the value of each pixel normalized into [0, 1].
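The preprocessing steps above can be sketched as follows. The grayscale weights and the nearest-neighbor resizing are our own assumptions for a self-contained example; the paper does not specify the conversion or interpolation method:

```python
import numpy as np

def preprocess(img):
    """Convert an HxWx3 RGB (or HxW gray) uint8 image into a 32x32 gray image
    with pixel values normalized into [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 3:
        # color -> gray, using the common ITU-R BT.601 luma weights (assumed)
        img = img @ np.array([0.299, 0.587, 0.114])
    H, W = img.shape
    rows = np.arange(32) * H // 32         # nearest-neighbor row indices
    cols = np.arange(32) * W // 32         # nearest-neighbor column indices
    img = img[np.ix_(rows, cols)]          # resize to 32x32
    return img / 255.0                     # normalize into [0, 1]

# Dummy 112x92 color image (the size of ORL face images), random pixel values.
face = np.random.default_rng(1).integers(0, 256, size=(112, 92, 3))
x = preprocess(face)
print(x.shape)  # (32, 32)
```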

Whole Comparison of Accuracies and Time
In Table 2, we give the testing accuracies of the CorNN and the CNN for a whole comparison on the nine databases. It is shown that the CorNN sometimes performs as well as the CNN (e.g., on ORL and AR), sometimes slightly better (e.g., on Georgia Tech, EYB, LFW, and CelebFaces+), and sometimes slightly worse (e.g., on FERET, Faces94, and MORPH). Hence, the CorNN is as good as the CNN in terms of accuracy for gender classification. In Table 3, we give the training/testing time of the CNN and the CorNN for a whole comparison on the nine databases. It is shown that the CorNN takes less time than the CNN, with a relative reduction of 11.29%~18.83% in training time and 10.16%~16.57% in testing time.
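For clarity, the relative reduction reported above is computed with respect to the CNN's time. The sketch below uses hypothetical placeholder timings, not the paper's measurements:

```python
def relative_reduction(t_cnn, t_corr):
    # percentage of time saved by the CorNN relative to the CNN
    return 100.0 * (t_cnn - t_corr) / t_cnn

# Hypothetical example: 124 s of CNN training vs. 110 s of CorNN training.
print(round(relative_reduction(124.0, 110.0), 2))  # 11.29
```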

Separate Comparison of Accuracies
In Table 4, we give the testing accuracies of the CorNN and the CNN for male and female comparisons separately on three databases (i.e., EYB, Faces94 and LFW), with "M" and "F" standing for male and female. It is shown that the CorNN performs as well as the CNN for male identification and female identification separately, albeit with slight differences.

Conclusions
Using the correlational operation instead of the convolutional operation, we propose a CorNN that is equivalent to its corresponding CNN. Moreover, we design an 8-layer CorNN for human face gender classification, together with the learning algorithm CorNN-BP. Additionally, we compare the 8-layer CorNN with the CNN in experiments, verifying that they have almost the same performance in terms of accuracy. However, the CorNN saves a relatively large amount of training and testing time, because it does not need to flip the correlational kernels. This is the main advantage of the CorNN over the CNN.