Research Methods for Classification and Identification of Ancient Glass Types

Abstract: Ancient glass is easily weathered under the influence of its burial environment, and weathering changes its color and the proportions of its chemical components. This paper analyzes data on high-potassium glass and lead-barium glass, studies the weathering law of the glass artifacts, and classifies and identifies the glass types. To classify the glass types, the best ccp_alpha of the CART algorithm is determined by the cost complexity pruning method to lie in [0, 0.39296057], which reduces the impurity of the classification tree to 0; the main difference between high-potassium glass and lead-barium glass is found to lie in the PbO content. The chemical compositions of the different glasses are then subclassified by K-means. With the help of the SSE and silhouette coefficients, the numbers of clusters for high-potassium glass and lead-barium glass are determined to be 4 and 3 respectively, and the detailed subclassification is realized by the CART algorithm. On this basis, the types of glass samples A1-A8 are predicted by the perceptron model with 100% accuracy, and the results show that the model is stable and accurate.


Introduction
Glass is a precious artifact of trade and exchange along the Silk Road [1]. In glassmaking, different fluxes are added to lower the melting point, and different types of glass are obtained as a result [2]. Ancient glass is susceptible to weathering in its burial environment, and different degrees of weathering produce different weathering characteristics, which makes the analysis and identification of ancient glass products difficult. Huang Huiting, Li Chunming, Liu Siyu et al. analyzed and identified glass primarily through chi-square testing and Q-clustering [3]. Xu Hai, S. Hu, X et al. classified glass with the K-means clustering method [4]. Zidong Z classified glass by fitting polynomial expressions to determine the chemical composition of different glasses [5]. Cao Jianyong, Xu Ting, Liu Yi et al. used the support vector machine algorithm to classify and predict glass types [6]. Jiang Shaoxuan, Chu Zhaoling, Li Jiaxiang et al. found through variability analysis that PbO is the main difference between high potassium glass and lead-barium glass [7].

In this paper, the 2022 mathematical modeling dataset is used after its missing values and outliers have been processed. The division between high potassium glass and lead-barium glass is realized by the CART algorithm, the subclassification of the different glasses by the K-means algorithm, and the basis for the subclasses by the decision tree; finally, the types of glasses A1-A8 are predicted by the perceptron algorithm.

CART algorithm
Figure 1: CART binary decision tree structure [8]

The CART algorithm is one of the most widely used decision tree learning methods [9]. It is suitable for handling discrete data with missing values and selects features by minimizing the Gini coefficient Gini(p). The CART tree is a binary tree consisting of a root node, intermediate nodes and terminal nodes, as shown in Figure 1. The algorithm splits each node into new child nodes on its best feature, and the impurity of a split is measured by the error term of the loss function, as in (2).

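$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2 \tag{1}$$

$$L = \sum_{i} \frac{D_i}{D} L_i \tag{2}$$

Where $p_k$ denotes the proportion of samples belonging to class $k$ among the $K$ classes, $D$ denotes the total number of samples, $D_i$ denotes the number of samples on the $i$th node, and $L_i$ denotes the loss function on the $i$th node.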
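As a minimal sketch, the Gini coefficient Gini(p) = 1 − Σ p_k² can be computed from a list of class labels as follows. The function name `gini` and the toy labels are illustrative assumptions and are not taken from the paper's dataset.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty node
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Toy labels (hypothetical, not from the paper's dataset).
pure_node = ["lead-barium"] * 4
mixed_node = ["high potassium", "high potassium", "lead-barium", "lead-barium"]

print(gini(pure_node), gini(mixed_node))
```

A node containing a single class has coefficient 0, while an evenly mixed two-class node reaches the two-class maximum of 0.5.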
In order to ensure that the impurity of the model is minimized, the CART tree also needs to be pruned, which is done in this paper using the cost complexity pruning (CCP) method.
Cost complexity pruning is a post-pruning method for decision trees with computational complexity $O(n^2)$. This is higher than the $O(n)$ complexity of reduced error pruning, pessimistic error pruning and minimum error pruning, but the method can obtain the optimal decision tree model. To improve the accuracy of the model, this paper therefore adopts cost complexity pruning. Once the pruning process has determined the appropriate interval for ccp_alpha, the impurity of the nodes is reduced to its minimum.
The main idea of cost complexity pruning is that if pruning a node $t$ reduces the complexity and impurity of the decision tree, the node is pruned; otherwise it is retained. The method relies on a metric $\alpha$: the node is removed if the value of $\alpha$ after pruning is less than the value of $\alpha$ when the node is retained. The value is calculated as follows:

$$\alpha = \frac{R(t) - R(T_t)}{|T_t| - 1}$$

Where $R(t)$ denotes the error rate of node $t$, $R(T_t)$ denotes the error rate of the subtree $T_t$ rooted at $t$, and $|T_t|$ denotes the number of leaf nodes of that subtree.
The cost complexity pruning method proceeds through the steps listed in Table 1.
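The weakest-link idea behind cost complexity pruning can be sketched as follows. This is an illustration under stated assumptions, not a transcription of Table 1: the `Node` class, the helper names and the toy risk values are hypothetical, and the effective α is taken in its standard form, (R(t) − R(T_t)) / (number of leaves of T_t − 1), where R(t) is the weighted impurity of node t treated as a leaf and R(T_t) is the total weighted impurity of the leaves below it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; risk is R(t), the impurity weighted by sample share."""
    risk: float
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[Node]:
    """All leaf nodes of the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def internal_nodes(node: Node) -> List[Node]:
    """All non-leaf nodes of the subtree rooted at node."""
    if not node.children:
        return []
    found = [node]
    for child in node.children:
        found.extend(internal_nodes(child))
    return found


def effective_alpha(node: Node) -> float:
    """(R(t) - R(T_t)) / (number of leaves of T_t - 1) for an internal node t."""
    below = leaves(node)
    subtree_risk = sum(leaf.risk for leaf in below)
    return (node.risk - subtree_risk) / (len(below) - 1)


def pruning_path(root: Node) -> List[float]:
    """Repeatedly collapse the weakest link; note that this mutates the tree."""
    alphas = [0.0]  # alpha = 0 corresponds to the unpruned tree
    while root.children:
        weakest = min(internal_nodes(root), key=effective_alpha)
        alphas.append(effective_alpha(weakest))
        weakest.children = []  # prune: the node becomes a leaf
    return alphas


# Toy tree with made-up risks: a root, one leaf, and one internal node with two leaves.
toy_tree = Node(0.5, [Node(0.1), Node(0.2, [Node(0.05), Node(0.05)])])
path = pruning_path(toy_tree)
print([round(a, 6) for a in path])
```

Each entry of `path` is the α at which one more subtree is collapsed, so choosing a pruning parameter between two consecutive entries selects the tree that exists between those two pruning steps.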

Modeling
In this paper, CART decision tree building is mainly divided into decision tree generation and pruning.
Step 1: Decision tree generation. The CART decision tree selects its splits based on the Gini coefficient. Given the dataset $D$ and feature $A$, the Gini coefficient is

$$\mathrm{Gini}(D \mid A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$

Where $D_1$ and $D_2$ denote the two datasets obtained by splitting $D$ on feature $A$. Gini(D|A) denotes the uncertainty after the split, and the smaller Gini(D|A) is, the higher the model accuracy.
Gini(D|A) is calculated for each feature as in Table 2. The value for PbO is the smallest, so PbO is chosen as the optimal splitting feature. The dataset is then assigned to the two child nodes according to this feature.
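The feature selection step can be sketched in a few lines: for every feature, try each midpoint between consecutive distinct values as a threshold, compute the weighted Gini coefficient of the two resulting subsets, and keep the feature with the smallest value. The function names and the small table of values below are hypothetical and are not the data behind Table 2; the sketch also assumes every feature has at least two distinct values.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_given_split(values, labels, threshold):
    """Weighted Gini coefficient after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(values, labels):
    """Return (smallest weighted Gini, threshold) over all midpoint thresholds."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min((gini_given_split(values, labels, t), t) for t in thresholds)


def best_feature(table, labels):
    """Pick the feature whose best split has the smallest weighted Gini."""
    scored = {name: best_split(column, labels) for name, column in table.items()}
    name = min(scored, key=lambda key: scored[key][0])
    score, threshold = scored[name]
    return name, threshold, score


# Hypothetical composition values (wt%), made up for illustration only.
table = {
    "PbO": [0.0, 0.5, 1.0, 20.0, 30.0, 40.0],
    "SiO2": [65.0, 30.0, 70.0, 35.0, 60.0, 25.0],
}
labels = ["high potassium"] * 3 + ["lead-barium"] * 3

feature, threshold, score = best_feature(table, labels)
print(feature, threshold, score)
```

On this toy table the PbO column separates the two classes perfectly, so it is selected; the threshold found here depends entirely on the made-up values and says nothing about the split value reported in the paper.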
Step 2: Pruning of the decision tree. To reduce the complexity and impurity of the decision tree, the tree is processed by the pruning algorithm, and the CCP path is output by fitting the model.
According to the CCP path, when 0 ≤ α ≤ 0.39296057 the impurity of the decision tree is 0.39296057. Setting the ccp_alpha parameter of the decision tree within the interval [0, 0.39296057] reduces the impurity of the decision tree from 0.39296 at the default ccp_alpha to zero.
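One way to read the endpoint 0.39296057 is as the Gini coefficient of the root node when a single split produces two pure leaves, since the effective α of the root then equals the root impurity. The sketch below only checks this arithmetic under an assumption: the class counts 18 and 49 are not stated in the text and are chosen here because they reproduce the reported value. It does not call any library.

```python
def gini_from_counts(counts):
    """Gini coefficient of a node given the number of samples per class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# ASSUMPTION: hypothetical class counts, not given in the paper.
N_HIGH_K, N_PB_BA = 18, 49
n_total = N_HIGH_K + N_PB_BA

# R(t): impurity of the root treated as a single leaf (its sample share is 1).
root_impurity = gini_from_counts([N_HIGH_K, N_PB_BA])

# R(T_t): weighted impurity of the two leaves after a perfect split.
leaves_impurity = (
    N_HIGH_K / n_total * gini_from_counts([N_HIGH_K, 0])
    + N_PB_BA / n_total * gini_from_counts([0, N_PB_BA])
)

n_leaves = 2
alpha_root = (root_impurity - leaves_impurity) / (n_leaves - 1)

print(f"root impurity = {root_impurity:.8f}, effective alpha = {alpha_root:.8f}")
```

Under these assumed counts both quantities agree with 0.39296057 to eight decimal places, which is consistent with a tree whose only split, on PbO, separates the two glass types completely.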

Model testing and results
The binary tree structure of this CART decision tree is shown in Figure 2:

SSE and silhouette coefficients
The within-cluster sum of squared errors (SSE) is an important indicator of whether a clustering model is optimal: for the same value of k, the smaller the SSE, the better the model. The formula for SSE is as follows:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

where $C_i$ is the $i$th cluster and $\mu_i$ is its center.

The silhouette coefficient is an indicator of how good the clustering is. It combines the degree of cohesion $a$, which measures internal aggregation, and the degree of separation $b$. The formula is as follows:

$$s = \frac{b - a}{\max(a, b)}$$

The optimal number of clusters can be determined from the relationship between the SSE and the silhouette coefficient: the "elbow" of the two curves gives the true number of clusters, as shown in Figure 4.
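Both indicators can be computed directly from a labelled set of points. The following is a minimal sketch using only the Python standard library; the function names and the four toy points are assumptions for illustration, the mean silhouette is taken over all samples, singleton clusters are given a score of 0 by convention, and at least two clusters are assumed.

```python
from math import dist


def group(points, labels):
    """Map each cluster label to the list of its points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for members in group(points, labels).values():
        center = tuple(sum(axis) / len(members) for axis in zip(*members))
        total += sum(dist(p, center) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all samples (requires >= 2 clusters)."""
    clusters = group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy points (hypothetical): two well-separated pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

print(sse(points, good_labels), silhouette(points, good_labels))
print(sse(points, bad_labels), silhouette(points, bad_labels))
```

A labelling that matches the visible grouping gives a small SSE and a silhouette close to 1, while the mismatched labelling gives a large SSE and a negative silhouette. Repeating the computation for several values of k yields the curves from which the number of clusters is read.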

K-means based subclassification
The core of the K-means algorithm is the selection of the optimal partition D* by minimizing the loss function (9) [10], which uses the Euclidean distance as the distance between samples:

$$W(D) = \sum_{l=1}^{k} \sum_{D(i)=l} \lVert x_i - \bar{x}_l \rVert^2 \tag{9}$$

In equation (9), $\bar{x}_l$ denotes the mean, or center, of the $l$th class.
The K-means algorithm is an iterative process: it first selects the centers of the k clusters, assigns the samples one by one to the nearest cluster, then updates the mean of each class as the new cluster center, and repeats these steps to solve the optimization problem and obtain the optimal partition D*:

$$D^* = \arg\min_{D} \sum_{l=1}^{k} \sum_{D(i)=l} \lVert x_i - \bar{x}_l \rVert^2$$
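The iteration described above can be sketched with the Python standard library alone. The function name `kmeans`, the choice to pass the initial centers explicitly (so that the run is deterministic) and the toy points are assumptions made for illustration; the paper does not state how its centers were initialized.

```python
from math import dist


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample goes to the nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its assigned samples.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return labels, centers


# Toy points (hypothetical) and hand-picked starting centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centers = kmeans(points, initial_centers=[(0.0, 0.0), (10.0, 0.0)])
print(labels, centers)
```

Because each step can only decrease the within-cluster sum of squares, the loop terminates at a local optimum; in practice several initializations are compared, since different starting centers can lead to different partitions.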

Determination of subcategorization components based on CART decision tree
The subclasses of high potassium glass and lead-barium glass were determined by the K-means algorithm, but the specific differences in chemical composition between the subclasses were not. These differences are determined by the CART decision tree, following the steps in Section 2.2. The decision tree structures for high potassium glass and lead-barium glass are shown in Figure 6(a) and (b), and the classification results are shown in Table 3 and Table 4.

The glass types are then predicted by the perceptron model:

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

Here the separating plane determined by $w$ divides the input feature space into two parts: when $w \cdot x + b \ge 0$ the model outputs 1, i.e., high potassium glass; otherwise it outputs -1, i.e., lead-barium glass.
To make the loss function (14) as small as possible and obtain the best separating plane, this paper adopts the stochastic gradient descent method, which iterates until no misclassified points remain and thereby optimizes the position of the separating plane; the specific update is given in (15).
Step 1: Input the feature space. Because the dataset of unknown glass is small, this paper adopts the dataset of Section 2.2 as the training data of the perceptron.
Step 2: Select the initial values, substitute them into (13)-(15), and iterate until the loss is minimized and no misclassified points remain. In this paper, after 56 iterations the loss function of the model is reduced to 0.00339607, which is close to 0, so the model works well.
Step 3: Output the classification results, as shown in Table 5.
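The three steps can be sketched as follows, assuming the standard perceptron rule in which each misclassified sample (x, y) triggers the update w ← w + η·y·x and b ← b + η·y. The function names, the learning rate, the zero initialization and the toy two-feature samples are all assumptions for illustration, so the run is not expected to reproduce the 56 iterations or the loss value reported above.

```python
def score(w, b, x):
    """Signed value w . x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(samples, labels, learning_rate=1.0, max_epochs=1000):
    """Cycle through the data, updating on every misclassified sample."""
    w = [0.0] * len(samples[0])  # assumed initial values
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, y in zip(samples, labels):
            if y * score(w, b, x) <= 0:  # misclassified (or on the plane)
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
                errors += 1
        if errors == 0:
            return w, b, epoch  # a full pass without any misclassified point
    raise RuntimeError("no separating plane found within max_epochs")


def predict(w, b, x):
    """+1 stands for high potassium glass, -1 for lead-barium glass."""
    return 1 if score(w, b, x) >= 0 else -1


# Toy samples (hypothetical): two made-up composition features per sample.
samples = [(0.0, 9.0), (0.5, 12.0), (1.0, 10.0), (20.0, 0.2), (30.0, 0.0), (40.0, 0.5)]
labels = [1, 1, 1, -1, -1, -1]

w, b, epochs = train_perceptron(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print(w, b, epochs, predictions)
```

For linearly separable data this loop is guaranteed to stop after a finite number of updates; for data that are not separable it would run until `max_epochs`, which is why the sketch raises an error in that case rather than returning an arbitrary plane.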

Conclusion
In this paper, through an in-depth study of the glass data, a CART classification model is established in which the impurity of the classification is reduced to 0 by the cost complexity pruning method, and PbO is determined to be the main difference between high potassium glass and lead-barium glass. A K-means-CART classification model is also established: the class of each glass is determined by the K-means algorithm, and the differences between the subclasses are then determined by the decision tree algorithm. High potassium glass is divided into four categories: high calcium-high silica glass, high calcium-low silica glass, high calcium-medium silica glass and low calcium glass. Lead-barium glass is divided into three categories: low silicon low phosphorus glass, high silicon low phosphorus glass and high phosphorus glass. Finally, the perceptron model predicts that A1, A6 and A8 are high potassium glass and that A2-A5 and A7 are lead-barium glass, with an accuracy of 100%.

This paper predicts and divides the types of ancient glass by their chemical composition. However, for glass artifacts unearthed in the future, it is difficult to know their specific chemical composition, and, in keeping with the principle of protecting cultural relics, it is also difficult to measure the chemical composition of the relics directly. This problem can be well solved by establishing a detection model through the classification algorithm, i.e., extracting the color, chemical composition and other characteristic data from known glass relics, establishing a database of glass relics, training the relics model, and providing a

Figure 2: CART binary decision tree structure

The accuracy of the model and the confusion matrix for classifying the glass by PbO content into high potassium glass (PbO ≤ 5.46) and lead-barium glass (PbO > 5.46) are shown in Figure 3(a) and (b). The model achieves 100% accuracy.
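The decision rule and its evaluation can be sketched as follows. Only the threshold 5.46 comes from the text; the function names, the PbO values and the true labels are hypothetical and serve only to show how a confusion matrix and an accuracy figure are obtained.

```python
PBO_THRESHOLD = 5.46  # split value reported in the text
CLASSES = ["high potassium", "lead-barium"]


def classify_by_pbo(pbo):
    """Single-split rule: PbO <= threshold means high potassium glass."""
    return "high potassium" if pbo <= PBO_THRESHOLD else "lead-barium"


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical PbO contents (wt%) with made-up true labels.
pbo_values = [0.0, 1.62, 5.46, 17.1, 25.4, 47.4]
y_true = ["high potassium"] * 3 + ["lead-barium"] * 3

y_pred = [classify_by_pbo(v) for v in pbo_values]
matrix = confusion_matrix(y_true, y_pred, CLASSES)
print(matrix, accuracy(matrix))
```

An accuracy of 1 corresponds to a confusion matrix with zeros off the diagonal, which is the pattern the text describes for Figure 3.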

Figure 4:

The cluster plots of high potassium glass and lead-barium glass obtained by solving the above model are shown in Figure 5(a) and (b).
Figure 5: (a) Cluster plot of high potassium glass; (b) cluster plot of lead-barium glass