Component Analysis and Identification Model of Ancient Glass Products Based on Correlation Analysis

Abstract: To support the analysis and identification of ancient glass products, this paper establishes a comprehensive evaluation model that classifies the products according to their measured composition and clarifies the correlation and sensitivity between their chemical components. First, the data are given a simple classification, and principal component analysis (PCA) is used to find the few factors that carry most of the weight. Reducing the dimension of the data reduces the number of variables and makes the basis for classification more intuitive; the factors that dominate the principal components also serve as an intuitive basis for dividing subcategories. Finally, K-Means is used to further confirm the rationality and sensitivity of the relationship between specific factors and cultural relics.


Introduction
A considerable body of data exists on ancient glass products. Archaeologists have divided these cultural relics into two types, high potassium glass and lead barium glass, according to their chemical composition and other detection methods. Ancient glass is easily weathered under the influence of the burial environment [1]. During weathering, a large number of internal elements exchange with elements in the environment, which changes the composition proportions of the glass products found and thus affects archaeologists' correct judgment of their categories [2].
Therefore, this paper seeks the classification rules of high potassium glass and lead barium glass from the data, which give the detected components of most of the cultural relics. The large amount of irregular data is first preprocessed [3]. Principal component analysis is then used to reduce the dimension so that the underlying rules are easier to find, and the factors with high principal component weights are used as the basis for dividing subcategories. Finally, K-Means is used to analyze the rationality and sensitivity of the relationship between specific factors and cultural relics [4].

Model establishment and solution
2.1 Analysis of classification rules of high potassium and lead barium glasses based on PCA
2.1.1 Data preprocessing
(1) Abnormal data elimination
In this paper, records whose cumulative proportion of components lies between 85% and 105% are regarded as valid data, and records outside this range are excluded.
(2) Missing value data processing
In this paper, high potassium glass is coded as 1, lead barium glass is coded as 2, and missing values are filled with 0 [5].
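The two preprocessing steps can be illustrated with a minimal sketch. The sample records, the oxide columns shown, and the helper name `preprocess` are hypothetical and are not taken from the paper's data set; only the 85%-105% validity range, the 1/2 type coding, and the zero fill come from the text above.

```python
# Hypothetical records: oxide name -> weight percent; None marks a missing value.
samples = [
    {"type": "high potassium", "SiO2": 87.05, "K2O": 5.19, "CaO": 2.01, "Al2O3": 4.06, "PbO": None},
    {"type": "lead barium", "SiO2": 36.28, "K2O": None, "CaO": 2.34, "Al2O3": 5.73, "PbO": 47.43},
    {"type": "lead barium", "SiO2": 20.14, "K2O": None, "CaO": 1.48, "Al2O3": 1.34, "PbO": 28.68},
]

# Type coding stated in the text: high potassium -> 1, lead barium -> 2.
TYPE_CODES = {"high potassium": 1, "lead barium": 2}


def preprocess(records, low=85.0, high=105.0):
    """Fill missing values with 0 and keep records whose components sum to [low, high]."""
    cleaned = []
    for rec in records:
        # A missing value contributes 0 to the total, so filling first does not
        # change which records pass the validity check.
        oxides = {k: (0.0 if v is None else float(v)) for k, v in rec.items() if k != "type"}
        total = sum(oxides.values())
        if low <= total <= high:
            cleaned.append({"type_code": TYPE_CODES[rec["type"]], **oxides})
    return cleaned


cleaned = preprocess(samples)
for row in cleaned:
    print(row)
```

In this made-up example the third record sums to about 51.6% and is therefore excluded, while the first two are kept with their missing values set to 0.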

KMO and Bartlett's test
KMO test: a value above 0.8 indicates that the data are very suitable for principal component analysis, a value of 0.7-0.8 indicates that they are generally suitable, and a value below 0.6 indicates that they are not suitable.
Bartlett test: if P is less than 0.05, the null hypothesis is rejected, which means that principal component analysis can be done. If the null hypothesis is not rejected, the variables may each provide information independently and are not suitable for principal component analysis.
As shown in Table 1, the KMO value is 0.351, and Bartlett's test of sphericity gives a significance P value of 0.000***, which is significant at the 1% level. The null hypothesis is rejected, so the variables are correlated and principal component analysis can be carried out, although the KMO value indicates that the degree of suitability is low.
As shown in Table 2, the KMO value is 0.346, and Bartlett's test of sphericity gives a significance P value of 0.000***, which is significant at the 1% level. The null hypothesis is rejected, so the variables are correlated and principal component analysis can be carried out, although the KMO value again indicates that the degree of suitability is low.
Note: ***, ** and * represent the significance level of 1%, 5% and 10% respectively.
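For readers who want to see how the two statistics are obtained, the following is a minimal sketch using the standard textbook definitions of the overall KMO measure and Bartlett's sphericity statistic. The function name `kmo_and_bartlett` and the synthetic data are assumptions for illustration; the paper does not state which software produced Tables 1 and 2.

```python
import numpy as np


def kmo_and_bartlett(X):
    """Return (overall KMO, Bartlett chi-square statistic, degrees of freedom)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)

    # Partial correlations are obtained from the inverse correlation matrix.
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)

    # KMO compares squared correlations with squared partial correlations
    # over the off-diagonal entries only.
    off = ~np.eye(p, dtype=bool)
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    kmo = r2 / (r2 + q2)

    # Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * ln|R|, with p(p - 1)/2 d.o.f.
    _, logdet = np.linalg.slogdet(R)
    chi2_stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return kmo, chi2_stat, dof


# Synthetic data (not the paper's): four variables sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = latent + 0.5 * rng.normal(size=(50, 4))
kmo, chi2_stat, dof = kmo_and_bartlett(X)
print(kmo, chi2_stat, dof)
```

The P value of Bartlett's test is the upper tail probability of a chi-square distribution with `dof` degrees of freedom evaluated at `chi2_stat`. The sketch requires an invertible correlation matrix, which may fail when the component percentages of every sample sum to exactly the same total.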

Variance explanation table and scree plot
The variance explanation table shows the contribution rate of each principal component to the explanation of the variables. The total variance explained for high potassium glass and lead barium glass is shown in Table 3 and Table 4. The scree plot is used to decide how many principal components to select according to the slope of the eigenvalues. Together, the two can be used to confirm or adjust the number of principal components. The scree plots of high potassium glass and lead barium glass are shown in Figure 1 and Figure 2. Based on this analysis, this paper considers it appropriate to take two principal components for analysis.
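The quantities behind a variance explanation table and a scree plot can be sketched as follows. This is an illustration on synthetic data, assuming PCA is performed on the correlation matrix; the function name `explained_variance` and the eigenvalue-greater-than-1 count are assumptions, not the paper's stated procedure.

```python
import numpy as np


def explained_variance(X):
    """Eigenvalues of the correlation matrix, their variance ratios, and the cumulative ratios."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order; reverse it
    ratio = eigvals / eigvals.sum()
    return eigvals, ratio, np.cumsum(ratio)


# Synthetic data (not the paper's): four variables sharing one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = latent + 0.5 * rng.normal(size=(50, 4))

eigvals, ratio, cumulative = explained_variance(X)
# One common rule of thumb keeps the components whose eigenvalue exceeds 1.
n_keep = int(np.sum(eigvals > 1.0))
for i, (ev, r, c) in enumerate(zip(eigvals, ratio, cumulative), start=1):
    print(f"PC{i}: eigenvalue={ev:.3f}, ratio={r:.3f}, cumulative={c:.3f}")
```

Plotting `eigvals` against the component index gives the scree plot; the point where the curve flattens suggests how many components to keep.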

Principal component loading coefficients and heat map
The importance of the latent variables in each principal component can be analyzed through the loading coefficients of the principal components and the heat map. Based on the above research, two principal components are determined in this paper. As shown in Table 5, the factor loading coefficient of Al2O3 in principal component 1 is large, so principal component 1 can be defined as the latent aluminum factor. The factor loading coefficient of P2O5 in principal component 2 is large, so principal component 2 can be defined as the latent phosphorus factor. As shown in Table 6, the factor loading coefficient of BaO in principal component 1 is large, so principal component 1 can be defined as the latent barium factor. As shown in Figure 3 and Figure 4, the factor loading coefficient of CaO in principal component 2 is large, so principal component 2 can be defined as the latent calcium factor.
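Naming a component after the variable with the largest loading can be sketched as below. The loadings are computed as eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues, which is one common definition. The data are synthetic, and the oxide labels are attached to random columns purely for illustration, so the printed names carry no chemical meaning.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """Loadings = eigenvectors of the correlation matrix scaled by sqrt(eigenvalue)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return eigvecs[:, order] * np.sqrt(eigvals[order])


# Hypothetical column labels on synthetic data with two independent latent factors.
names = ["SiO2", "K2O", "CaO", "Al2O3", "P2O5"]
rng = np.random.default_rng(1)
f1 = rng.normal(size=(60, 1))
f2 = rng.normal(size=(60, 1))
X = np.hstack([
    f1 + 0.3 * rng.normal(size=(60, 3)),
    f2 + 0.3 * rng.normal(size=(60, 2)),
])

loadings = pca_loadings(X, n_components=2)
# The variable with the largest absolute loading characterizes each component.
dominant = [names[int(np.argmax(np.abs(loadings[:, k])))] for k in range(loadings.shape[1])]
print(np.round(loadings, 3))
print(dominant)
```

A heat map of `loadings`, with variables as rows and components as columns, gives the kind of picture described for Figure 3 and Figure 4.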

Dimension reduction analysis of related variables
Based on the principal component loading plot, the multiple principal components are reduced to two or three, and the spatial distribution of the principal components is presented in a quadrant plot.
In conclusion, the classification of high potassium glass and lead barium glass mainly depends on calcium oxide (CaO), barium oxide (BaO) and aluminum oxide (Al2O3).
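The projection onto two principal components and the quadrant assignment can be sketched as follows. This is a minimal illustration on synthetic data; the helper names `pca_scores` and `quadrant` and the convention of numbering quadrants counterclockwise from the upper right are assumptions.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project standardized data onto the leading eigenvectors of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return Z @ eigvecs[:, order]


def quadrant(pc1, pc2):
    """Quadrant number (1 to 4, counterclockwise from upper right) of a score pair."""
    if pc1 >= 0:
        return 1 if pc2 >= 0 else 4
    return 2 if pc2 >= 0 else 3


# Synthetic data (not the paper's): 40 samples, 5 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5)) + rng.normal(size=(40, 1))
scores = pca_scores(X)
quadrants = [quadrant(a, b) for a, b in scores]
print(scores[:3], quadrants[:3])
```

A scatter plot of the two score columns, with points colored by glass type, would show whether the two types separate along either component.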

Analysis of Cluster Category Differences
Table 7 and Table 8 show the results of the difference analysis for the quantitative fields, including the mean ± standard deviation, the F test results, and the significance P values.
For each analysis item, check whether the P value is significant (P<0.05).
If it is significant, the null hypothesis is rejected, which indicates that there is a significant difference between the two groups of data for that item; the difference can then be examined through the mean ± standard deviation. Otherwise, the data do not show a significant difference.
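The clustering and the per-variable F test can be sketched together. This is a minimal illustration using a plain Lloyd-style K-Means and the one-way ANOVA F statistic on synthetic data; the function names, the choice of two clusters, and the data are assumptions and do not reproduce Tables 7 and 8.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign points to the nearest center, then update the centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def anova_f(values, labels):
    """One-way ANOVA F statistic of one variable across the cluster labels."""
    groups = [values[labels == g] for g in np.unique(labels)]
    grand_mean = values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Synthetic data: column 0 separates two groups, column 1 does not.
rng = np.random.default_rng(0)
group_a = np.column_stack([rng.normal(60, 2, 20), rng.normal(5, 1, 20)])
group_b = np.column_stack([rng.normal(90, 2, 20), rng.normal(5, 1, 20)])
X = np.vstack([group_a, group_b])

labels, centers = kmeans(X, k=2)
f_col0 = anova_f(X[:, 0], labels)
f_col1 = anova_f(X[:, 1], labels)
print(f_col0, f_col1)
```

The P value for each variable follows from comparing its F statistic with an F distribution on (number of clusters - 1, number of samples - number of clusters) degrees of freedom. A variable that drives the clustering, such as column 0 here, yields a much larger F statistic than one that does not.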

Analysis of variance
The null hypothesis is rejected for the following components, which indicates a significant difference between the categories divided by cluster analysis: SiO2 (P = 0.000***), Na2O (P = 0.039**), K2O (P = 0.000***), CaO (P = 0.001***), MgO (P = 0.017**), Al2O3 (P = 0.001***), Fe2O3 (P = 0.048**), and SnO2 (P = 0.020**).
The null hypothesis cannot be rejected for the following components, which indicates no significant difference between the categories divided by cluster analysis: CuO (P = 0.238), PbO (P = 0.517), BaO (P = 0.370), P2O5 (P = 0.267), SrO (P = 0.145), and SO2 (P = 0.314).

Model Advantages
1) The model established in this paper is closely linked with real life and can solve the problems raised in combination with the actual situation. In addition, the model is close to reality and has strong universality and generalizability.
2) The PCA method used in this paper can find the correlation between different factors and clarify whether the factors are positively correlated, negatively correlated, or uncorrelated.
3) The model designed in this paper has strong operability, a wide scope of application, and high reliability of factor weights, and can be widely used in other fields.

Model Disadvantages
1) The model proposed in this paper can only be used for qualitative analysis, not quantitative analysis.
2) The model proposed in this paper uses the Pearson correlation coefficient, which requires both variables to be continuous. In addition, the Pearson correlation coefficient is sensitive to outliers.
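The sensitivity of the Pearson coefficient to outliers can be seen in a small made-up example, in which a single corrupted value changes a perfect positive correlation into a negative one. The numbers are illustrative only and are unrelated to the glass data.

```python
import numpy as np

x = np.arange(1.0, 11.0)       # 1, 2, ..., 10
y = 2.0 * x + 1.0              # perfectly linear in x
r_clean = np.corrcoef(x, y)[0, 1]

y_out = y.copy()
y_out[-1] = -50.0              # a single outlier replaces the last value
r_outlier = np.corrcoef(x, y_out)[0, 1]

print(r_clean, r_outlier)
```

This is one reason for screening abnormal records before computing correlations.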

Model generalization
The model proposed in this paper can be applied not only to the classification analysis of ancient glass components but also to the analysis of a wide range of other archaeological relics, such as bronze, and therefore has broad value for wider application.