ReCurricularFace: Revisiting CurricularFace for Hard Sample Mining

Abstract: Hard sample mining has long been a challenge in the field of face recognition, and mining-based methods have achieved promising results on it. However, current methods do not consider when a hard sample should be drawn toward the target class center and when it should be drawn toward the non-target class center. Therefore, building on CurricularFace, this work analyzes the logits and gradients to derive the boundary that decides whether a hard sample should move toward the target class center or toward the non-target class center, and uses this boundary to revisit CurricularFace, yielding a revised version named ReCurricularFace. Through comparison experiments, we find that ReCurricularFace obtains a substantial improvement on face recognition benchmarks.


Introduction
Face recognition is an important direction of biometric identification, with huge application value in finance, hospitality, retail, security, and other fields. Although face recognition [1] has great application value, complex application scenarios, especially uncontrolled ones, generate a large number of hard samples that are difficult to recognize, which poses a great challenge to face recognition algorithms. Therefore, progressively reducing the rate at which hard samples are recognized incorrectly, and thus improving the accuracy of face recognition algorithms, will further expand the application of face recognition across these scenarios. Classic works based on hard sample mining, such as MV-Softmax [2] and CurricularFace [3], have further improved the accuracy of face recognition. MV-Softmax [2] addresses the challenge of effectively learning hard samples by adding different penalties between the target class center and the non-target class centers. However, this approach can lead to training instability. In subsequent work, CurricularFace introduced curriculum learning to address the instability of MV-Softmax. Although previous works like MV-Softmax [2] and CurricularFace [3] have further enhanced the performance of face recognition, CurricularFace narrows the decision boundary range during learning by shrinking the decision region of the target class center. This limitation prevents the network from reaching its optimal state.
In the CurricularFace framework, the CurricularFace loss [3] is defined as L = -log( e^{s T(cos θ_{y_i})} / ( e^{s T(cos θ_{y_i})} + Σ_{j≠y_i} e^{s N(t, cos θ_j)} ) ), where T(cos θ_{y_i}) = cos(θ_{y_i} + m) and N(t, cos θ_j) = cos(θ_j) · I(t, cos θ_j) are two functions representing the positive and negative cosine similarities, respectively, and I(t, cos θ_j) denotes the coefficient for the negative cosine similarity. Considering the distance between the target class center and the non-target class center as π/2, the relationship between N(t, cos θ_j) and T(cos θ_{y_i}) is illustrated in Figure 1. The intersection point of T and N represents the distance from the sample to the target center at which both classifiers yield equal probabilities, i.e., the actual decision boundary of the target class center. The yellow line indicates the ideal distance from the non-target class center to the decision boundary of the target class center. When N is above T, the sample is classified as belonging to the non-target class center, and when T is above N, it is classified as belonging to the target class center. From Figure 1, it can be observed that CurricularFace reduces the actual decision boundary of the target class center, so the neural network fails to learn the optimal state. To address this issue, we modify CurricularFace to ensure correct classification. First, from the decision boundary perspective, when the sample is too close to the non-target center, we require N(t, cos θ_j) > T(cos θ_{y_i}) for any t in its domain, so that the sample is pushed away from the non-target center. Next, we employ the Curriculum Learning (CL) method: initially, we provide a larger decision boundary for the target class center, enabling the network to learn the relationship between the class centers and the samples; then, as the network converges, we gradually shrink the decision boundary of the target class center toward the ideal state.
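As a concrete reference for T and N, the published CurricularFace definitions can be sketched in a few lines of scalar Python (function names are illustrative; in the original CurricularFace, I(t, cos θ_j) = t + cos θ_j for hard negatives, and easy negatives are left unmodulated):

```python
import math

def positive_similarity(theta_y, m=0.5):
    """T(cos θ_yi) = cos(θ_yi + m): positive similarity with additive angular margin."""
    return math.cos(theta_y + m)

def negative_similarity(cos_theta_j, t, pos_sim):
    """N(t, cos θ_j): CurricularFace re-weights a negative logit only when it is
    'hard', i.e. when it exceeds the margined positive logit."""
    if pos_sim >= cos_theta_j:
        return cos_theta_j                  # easy negative: leave unchanged
    return cos_theta_j * (t + cos_theta_j)  # hard negative: I(t, cos θ_j) = t + cos θ_j
```

With t small early in training, hard negatives are down-weighted; as t grows, their modulated logits exceed cos θ_j, which is the curriculum behavior the text describes.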

Related Work
The design of the loss function is very important, and the general loss function has good generalization for different tasks.Research on loss functions for face recognition can be classified into loss functions based on margin and loss functions based on mining.
Loss function based on margin: The loss function based on margin evolved from classification loss and has become the mainstream direction of face loss research [1] in the field of face recognition.With the step-by-step exploration of face loss based on margin, the accuracy of face recognition has been gradually improved, which also verifies the importance of the idea based on margin.Previous face recognition mainly explored the design of the margin from the perspective of the target class center, while our work explored the design of the margin and the correctness of the margin design from the perspective of the non-target class center.
Loss function based on mining: Compared to the loss design based on margin, the face loss based on mining is relatively less.Among them, the more classic works are MV-Softmax [2] and CurricularFace [3].MV-Softmax is the first to integrate the two ideas of margin and mining.CurricularFace brings the idea of curricular learning into face recognition.We redesign CurricularFace from the perspective of decision boundaries and the non-target class center.

Method
Although the relationship between the samples, the target class center, and the non-target class center is defined on the same plane in the introduction, actual network training encounters a more complex situation. Based on previous work, we assume that the class centers are distributed on a hypersphere. Under this assumption, the minimum distance between adjacent class centers is θ(W_j) = min_{1≤i,j≤n, i≠j} arccos⟨W_i, W_j⟩, where d represents the feature dimensionality and n represents the number of class centers. Additionally, the distance from the target class center to its decision boundary is θ(W_j)/2. Therefore, when cos(θ_j) > cos(θ(W_j)/2), samples fall within the decision boundary of the non-target class, and we should ensure that I(t, cos(θ_j)) > 1 to rapidly increase the distance between the sample and the non-target center; thus, we scale cos(θ_j) by a coefficient. Although the resulting I(t, cos(θ_j)) can rapidly increase the distance between the sample and the non-target center, under this condition, according to the boundary equation N(t, cos(θ_j)) = T(cos(θ_{y_i})), the actual decision boundary generated by the class centers expands, making it easier for more hard samples to be attended to by the target class centers. However, this would cause the network to lose some discriminative power. To shrink the decision boundary to the ideal state, we introduce Curriculum Learning, gradually restoring the decision boundary of the class centers to the ideal state. For this purpose, we add an αt bias term that shifts the decision boundary toward the ideal state, and solving the ideal-boundary condition yields the value of α. Summarizing the above, we define I(t, cos(θ_j)) as the scaled cos(θ_j) term plus the αt bias; the concrete instantiation is given in the implementation details.
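To make the boundary-shrinking behavior concrete, the following sketch numerically locates the actual decision boundary N(t, cos θ_j) = T(cos θ_{y_i}) under the plane assumption θ_j = π/2 − θ_{y_i}, plugging in the concrete coefficients 1.1 and α = 0.4 reported in the implementation details (the bisection routine and function names are illustrative aids, not part of the method):

```python
import math

def T(theta_y, m=0.5):
    # positive similarity with angular margin
    return math.cos(theta_y + m)

def N(theta_y, t, a=1.1, alpha=0.4):
    # hard-negative similarity under the plane assumption θ_j = π/2 − θ_yi,
    # with I(t, cos θ_j) = a·cos θ_j + α·t (coefficients from the implementation details)
    cos_j = math.sin(theta_y)  # cos(π/2 − θ_yi)
    return cos_j * (a * cos_j + alpha * t)

def boundary(t, lo=0.0, hi=1.2, iters=60):
    """Bisection on T(θ) − N(θ, t) = 0: the actual decision boundary of the target class."""
    f = lambda th: T(th) - N(th, t)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

As t grows from 0 toward 1 over training, the boundary angle decreases, i.e., the target class's decision region shrinks toward the ideal state, matching the curriculum behavior described above.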

Implementation Details
According to ArcFace [1], it can be inferred that E[θ(W_j)] ≈ 80°. Therefore, I(t, cos(θ_j)) = 1.1·cos(θ_j) + 0.4t. Other relevant training details are as follows: the backbone network is ResNet18; the batch size is set to 512; the model is trained for 34 epochs with an initial learning rate of 0.1, and the learning rate is reduced by a factor of 0.1 at epochs 20, 28, and 32.
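The concrete modulation coefficient and the step learning-rate schedule above can be expressed as (helper names are illustrative):

```python
def modulation(cos_theta_j, t):
    """I(t, cos θ_j) = 1.1·cos θ_j + 0.4·t, using E[θ(W_j)] ≈ 80° as stated above."""
    return 1.1 * cos_theta_j + 0.4 * t

def lr_at_epoch(epoch, base_lr=0.1, milestones=(20, 28, 32), gamma=0.1):
    """Step schedule from the text: multiply the learning rate by 0.1 at epochs 20, 28, 32."""
    return base_lr * gamma ** sum(epoch >= ms for ms in milestones)
```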

Evaluation Protocol
For testing, we partition the datasets into high-quality, mixed-quality, and low-quality sets. For the high-quality and mixed-quality datasets, we employ a face verification protocol (Accuracy) to assess the model's performance. For the low-quality dataset, we utilize a face identification protocol (Rank-n). From Table 2, it can be observed that on the mixed-quality datasets, ReCurricularFace exhibits a widening performance gap over ArcFace and CurricularFace as the constraint conditions tighten, which suggests that ReCurricularFace enables the network to better learn hard samples. Figure 2 illustrates the same results.
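The two evaluation protocols can be sketched as follows (toy helpers for clarity, not the exact benchmark tooling):

```python
def verification_accuracy(scores, labels, threshold):
    """Face verification: a pair is predicted 'same' if similarity >= threshold;
    accuracy is the fraction of pairs where the prediction matches the label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)

def rank_n_accuracy(similarity_rows, gallery_ids, probe_ids, n=1):
    """Face identification: a probe counts as correct if its true identity
    appears among its top-n most similar gallery entries."""
    hits = 0
    for row, true_id in zip(similarity_rows, probe_ids):
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:n]
        hits += any(gallery_ids[i] == true_id for i in top)
    return hits / len(probe_ids)
```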

Results on TinyFace
From Table 3, it is evident that both CurricularFace and ReCurricularFace show improvements over ArcFace. Notably, CurricularFace and ReCurricularFace exhibit similar performance on the low-quality dataset.

Conclusion
In this study, we modified CurricularFace from the perspective of non-target class centers and decision boundaries, resulting in ReCurricularFace. Experimental results demonstrate significant improvements of ReCurricularFace over CurricularFace on both high-quality and mixed-quality datasets. We therefore conclude that when designing face loss functions, it is essential to ensure that the actual decision boundaries of the target and non-target class centers align with the ideal decision boundaries. This conclusion can guide the design of loss functions in subsequent research. For future work, we consider incorporating quality attributes into the loss function to further guide the network's learning process.

Note: with the distance between the target and non-target class centers taken as π/2 and an added margin of m = 0.5, the samples lie on the plane formed by the target and non-target class centers, and we can obtain θ_j = π/2 − θ_{y_i}.

Table 1 :
Results on LFW, AgeDB-30, and CFP-FP. Accuracy is used to evaluate the performance of face recognition, with the best metric indicated in bold red font.

From Table 1, it can be observed that, under the same environment, ReCurricularFace demonstrates a significant improvement over ArcFace and CurricularFace on the high-quality datasets. This indicates the effectiveness of ReCurricularFace.

Table 2 :
Results on IJB-B and IJB-C. TAR@FAR is used to evaluate the performance of face recognition, with the best metric indicated in bold red font.

Table 3 :
Results on TinyFace. Rank-n is used to evaluate the performance of face recognition, with the best metric indicated in bold red font.