Research on composition analysis and type identification of ancient glass products based on data mining

Abstract: Ancient glass products weather easily. Weathering causes a large exchange of elements between the glass and its environment, which changes the proportions of its chemical components and in turn affects composition analysis and type identification. Based on the weathering of ancient Chinese glass products and the related chemical composition data, this paper establishes mathematical models to analyze the classification laws of different glass types, predict classification results, and study the statistical laws and correlations of chemical composition content.


Introduction
The main raw material of glass is quartz sand (silicate). Because different fluxing agents are added during refining, the chemical composition of the finished glass differs. In addition, ancient glass weathers easily under the influence of its storage environment, and the resulting change in composition proportions interferes with the correct judgment of glass type [1]. Determining the chemical composition system of glass under environmental influence is therefore an important aspect of the study of ancient Chinese glass. In the context of the big data era, this article applies data mining methods to the chemical composition of ancient Chinese glass and, drawing on existing research on ancient glass by scholars in China and abroad, predicts both the chemical composition and the category of ancient glass samples from measured data.

Methods
After preprocessing the data, SPSS [2] is used to run R × C contingency table chi-square tests of surface weathering against glass type, against decorative pattern, and against color. The significance level of each test is used to judge whether glass type, decoration, or color has a significant impact on surface weathering. Excel is used to analyze the statistical laws of the processed data. For high potassium glass and lead barium glass, the chemical composition contents of weathered and non-weathered samples are compared through their mean and median values, with the standard error as a check, to obtain the statistical laws of chemical composition with and without weathering. In addition, based on the test data of the weathered points, the ratio of the average content of each chemical component before and after weathering for the two glass types is used to predict the corresponding content before weathering.
The preprocessed data are normalized, and the proportions of the training set and test set are set in SPSS. A decision tree is then used to analyze and verify the chemical composition content of high potassium glass and lead barium glass and to obtain reasonable classification standards and rules. Cluster analysis is next applied to the sub-classification of each glass type. Depending on the data conditions, as judged by a Matlab program, either the pedigree (hierarchical) clustering method or the K-means clustering method is selected to obtain more scientific classification results. The rationality and sensitivity of the classification results are then analyzed with the "elbow method" applied to the plot of the evaluation function.
Based on the above results, the chemical components of the unknown glass relics are analyzed and their types identified. A BP neural network model is established to verify the results a second time, and sensitivity analysis is carried out by substituting the original data values and by other methods that change the corresponding variable indicators.
Finally, for the different types of glass relic samples, Matlab is used to carry out grey correlation analysis. The chemical components with considerable content after normalization are taken as the parent sequence to solve for the correlation coefficients, which are used to analyze the correlation between chemical components within a category and the differences in the correlation between the same chemical components across categories.

R × C contingency table chi-square test
In this paper, the relationships between the surface weathering of the cultural relics and their glass type, decoration, and color are analyzed with the R × C contingency table chi-square test, where R is the number of rows and C the number of columns of the contingency table. When the significance of the chi-square value (the p-value) is less than 0.05, the null hypothesis of independence is rejected and the two variables are considered significantly associated at the 5% significance level. For surface weathering and glass type, the p-value is 0.020, which is significant at this level, so the null hypothesis is rejected and glass type has a significant impact on the surface weathering of glass relics. For surface weathering and decoration, the p-value is 0.056, which is not significant at the 0.05 level, so the null hypothesis is not rejected. However, this p-value is close to 0.05 and below the looser significance level of 0.1, so decoration may still have some influence on surface weathering. For surface weathering and color, the p-value is 0.507, which is not significant, so the null hypothesis is not rejected and no significant association between surface weathering and color is found [3].
To sum up, the influence of the three factors on surface weathering, from largest to smallest, is glass type, decoration, and color. Glass type has a significant impact on surface weathering, while color has little impact on the surface weathering of glass relics.
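The test described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the paper: the function names and the counts in the usage note are hypothetical, and the statistic is the plain Pearson chi-square without continuity correction. The p-value uses the closed-form upper-tail series for integer degrees of freedom.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) of a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite correction series
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    total = 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + total

def chi2_contingency_test(table):
    """Pearson chi-square test of independence for an R x C table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)
```

For a hypothetical table such as `[[10, 20], [20, 10]]` (rows: weathered or not; columns: glass type), `chi2_contingency_test` returns the statistic, the degrees of freedom (R-1)(C-1), and the p-value that is compared with 0.05.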

Statistical analysis of the chemical composition content of each type of glass with or without weathering
According to the type of glass and whether it is weathered, the preprocessed data are divided into four parts: weathered high potassium, non-weathered high potassium, weathered lead barium, and non-weathered lead barium. The mean, standard error, and median of the content of each chemical component in these four parts are calculated using Excel. Table 1 gives an example: the statistics for high potassium glass with and without weathering, and the corresponding data for lead barium glass. From Table 1, weathered and non-weathered high potassium glass are compared through the median and mean values of each chemical component, and the same is done for lead barium glass. The statistical rules for the chemical composition of high potassium glass and lead barium glass samples with and without surface weathering are as follows:
(1) After weathering, the silica content of high potassium glass increased significantly, while the magnesium oxide content decreased; the contents of alumina, iron oxide, and copper oxide decreased; the content of sulfur dioxide increased; the contents of sodium oxide, lead oxide, barium oxide, phosphorus pentoxide, and strontium oxide decreased (to zero).
(2) After weathering, the silica content of lead barium glass increased significantly; the contents of potassium oxide, aluminum oxide, barium oxide, and sodium oxide increased; the contents of calcium oxide, iron oxide, lead oxide, phosphorus pentoxide, and strontium oxide decreased; the content of copper oxide decreased slightly; the content of magnesium oxide remained stable without obvious change.
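The grouped statistics described above were computed in Excel; an equivalent calculation can be sketched in Python as follows. The record layout (`type`, `weathered`, and one key per component) and the numbers in the usage note are assumptions made for illustration, and the standard error is taken as the sample standard deviation divided by the square root of the group size.

```python
import math
import statistics

def summarize(values):
    """Mean, median, and standard error of the mean for one group of contents."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "se": se,
    }

def summarize_groups(samples, component):
    """Group samples by (glass type, weathered flag) and summarize one component."""
    groups = {}
    for sample in samples:
        key = (sample["type"], sample["weathered"])
        groups.setdefault(key, []).append(sample[component])
    return {key: summarize(values) for key, values in groups.items()}
```

Calling `summarize_groups(samples, "SiO2")` on a list of such records returns one entry per group, for example `("high potassium", True)`, so the weathered and non-weathered means and medians can be compared side by side.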

Prediction of chemical composition before weathering
Following the above, the ratio of the average content of each chemical component before and after weathering for the two types of glass products is used to predict the chemical composition content of the weathered points before weathering. The specific prediction results are shown in Table 2. Since the contents of sodium oxide, lead oxide, barium oxide, strontium oxide, and sulfur dioxide at the weathered points of high potassium glass are all zero, the predicted values before weathering are also zero and are not shown in the table. Similarly, the sodium oxide and strontium oxide contents of lead barium glass are zero and are not shown in Table 3.
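The ratio-based prediction can be illustrated with the sketch below. It assumes that the measured content at a weathered point is multiplied by the ratio of the non-weathered group mean to the weathered group mean; the function name and the dictionary layout are hypothetical, and whether the predicted contents are renormalized afterwards is not stated in the text, so no renormalization is applied here.

```python
def predict_before_weathering(weathered_point, mean_unweathered, mean_weathered):
    """Scale each component by (mean before weathering) / (mean after weathering)."""
    predicted = {}
    for component, value in weathered_point.items():
        if value == 0 or mean_weathered.get(component, 0) == 0:
            # A zero measurement stays zero, matching the treatment in the text.
            predicted[component] = 0.0
        else:
            ratio = mean_unweathered[component] / mean_weathered[component]
            predicted[component] = value * ratio
    return predicted
```

With hypothetical group means of 60 (non-weathered) and 90 (weathered) for silica, a weathered point measuring 90 would be predicted at 60 before weathering.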

Establishment and solution of the decision tree model
The chemical composition content data and the weathering status (represented by a 0-1 binary dummy variable) are imported into SPSS. The data are divided into a training set and a test set in a ratio of 7:3, and a decision tree classification model is established. The structure of the decision tree is shown in Figure 1. The figure shows that the 42 samples in the training set are divided into two categories of 27 and 15 samples, and that the Gini coefficients of both are 0. In conclusion, provided that the prediction of the decision tree model is stable, the lead oxide content can be used as the basis for separating high potassium glass from lead barium glass: when the lead oxide content is less than 6.965, the glass is high potassium glass; when the lead oxide content is more than 6.965, the glass is lead barium glass [4].
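How a single-feature split with zero Gini impurity arises can be shown with a small sketch. It searches the midpoints between sorted values of one feature for the threshold that minimizes the weighted Gini impurity of the two sides. This is an illustration of the splitting criterion only, not the SPSS tree; the function names and the lead oxide values in the usage note are made up.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_gini, best_cut = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_gini:
            best_gini, best_cut = weighted, cut
    return best_cut, best_gini

def classify(lead_oxide, threshold):
    """Apply the one-split rule: above the threshold means lead barium glass."""
    return "lead barium" if lead_oxide > threshold else "high potassium"
```

For hypothetical lead oxide contents `[0.0, 0.3, 1.6, 12.0, 25.0, 40.0]` with the first three labeled high potassium, the best cut falls midway between 1.6 and 12.0 and both sides are pure, which is the situation a Gini value of 0 in both leaves describes.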

Cluster analysis
To eliminate the effect of the different magnitudes of indicators of different nature in the newly processed table in the appendix, the data are scaled and mapped to the range (0,1) so that all indicators are dimensionless and can be compared and analyzed together. After normalization, the subcategory division in this section fits the characteristics of cluster analysis, that is, the process of dividing a dataset into groups or clusters of similar objects so that similarity within a cluster is maximized and similarity between different clusters is minimized.
Hierarchical (pedigree) clustering, one of the most commonly used methods of cluster analysis, first treats each element to be clustered as its own category, then merges the two categories with the smallest distance, recalculates the distance between the new category and the other categories, and repeats the merging of the two closest categories until all samples are merged into one category. The method can identify sample spaces of arbitrary shape and, because it involves no random initialization, gives the same result on every run.
For the high potassium glass samples in the table, the results of the pedigree clustering performed in Matlab are shown in Figure 2. With the help of the "elbow rule", the figure shows that the degree of distortion improves markedly at k=3. Taking k=3 as the number of clusters gives the sub-classification of high potassium glass shown in Table 4 [5]. For lead barium glass, however, the hierarchical clustering method does not yield a clear classification standard.
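The normalization and merging steps described above can be sketched as follows. The paper does not state the scaling formula or the linkage criterion, so min-max scaling, Euclidean distance, and average linkage are assumptions here, and the function names are illustrative; the paper's own results were produced in Matlab.

```python
def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column is mapped to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # every sample starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [euclidean(points[i], points[j])
                         for i in clusters[a] for j in clusters[b]]
                d = sum(dists) / len(dists)  # average pairwise distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]
```

Calling `agglomerative(min_max_normalize(rows), 3)` on a list of composition rows stops the merging when three clusters remain, which corresponds to cutting the dendrogram at k=3.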
Because the amount of data is large and the differences between classes are small, a faster and simpler clustering method is considered instead: the K-means clustering method. It takes k points in space as cluster centers, assigns each object to the center closest to it, and iteratively updates the value of each cluster center until the best clustering result is obtained.
Because the initial centers of the K-means model are chosen at random, the model is trained repeatedly and different values of k are compared with the help of the evaluation function until a suitable number of clusters is found. The evaluation function plot shows that the degree of distortion improves greatly at k=6. Taking k=6 as the number of clusters for lead barium glass gives the final classification results shown in Figure 4 and Table 5 [6].
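The repeated K-means training and the elbow curve can be sketched as below. The evaluation function is assumed to be the within-cluster sum of squared distances (the "distortion"), the restart count and seed are arbitrary choices, and the function names are illustrative rather than taken from the Matlab code used in the paper.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sse(points, k, n_restarts=10, n_iter=100, seed=0):
    """Best within-cluster sum of squares over several random restarts."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_restarts):
        centers = rng.sample(points, k)  # random initial centers
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                clusters[nearest].append(p)
            # Recompute each center; an empty cluster keeps its old center.
            new_centers = [
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[j]
                for j, members in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best

def elbow_curve(points, k_values):
    """Distortion for each candidate k; the elbow is where the drop levels off."""
    return {k: kmeans_sse(points, k) for k in k_values}
```

Plotting the values returned by `elbow_curve(points, range(1, 10))` against k gives the curve on which the elbow is read off. Points are expected as tuples of normalized contents.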

Analysis of classification results
Working backward from the division results and analyzing them qualitatively, high potassium glass can be divided into three subcategories, and the main basis of the division appears to be the silica content. Glass is essentially silicate. After weathering, silicate is dehydrated and decomposed into silica, and the proportion of some metal oxides also increases, so the classification basis may be related to the degree of weathering. The three subcategories are interpreted as "low weathering degree" (low silica content and low alkali metal oxide content), "suspected weathering" (acid oxide content decreases and alkali metal oxide content increases, but silica content remains low), and "high weathering degree" (high silica content and high alkali metal oxide content). These subcategories have clear classification standards and can identify possible local weathering in non-weathered glass, which is consistent with the problem setting and further supports the rationality and stability of the model [7].

Judgment of glass type
According to the decision tree model established above, samples A1~A8 are judged to be, respectively, high potassium glass, lead barium glass, lead barium glass, lead barium glass, lead barium glass, high potassium glass, high potassium glass, and lead barium glass.
To check the accuracy of the decision tree model, the BP neural network method is used for verification. After the data in the newly processed Appendix Table 2 are standardized, a dummy variable for glass type is added (the value is "0" if the glass type is "high potassium glass" and "1" if it is "lead barium glass"). A neural network model is then built and trained in Matlab on the chemical components and glass types, and the trained model is used to judge the glass types in Appendix Table 3. A predicted result between -0.25 and 0.25 is taken to indicate "high potassium glass", and a result in the range 0.75~1.25 is taken to indicate "lead barium glass". The predictions of the BP neural network model are consistent with the judgments of the decision tree model, which verifies the judgments of the glass types of A1~A8 in Appendix Table 3 [3].
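The rule for reading the network output and comparing it with the decision tree can be written down directly. The sketch below covers only this interpretation step, not the training of the BP network; the function names are illustrative, and the label "undetermined" for outputs outside both ranges is an assumption, since the text does not say how such outputs are handled.

```python
def interpret_output(y, tol=0.25):
    """Map a network output to a glass type using the ranges given in the text."""
    if abs(y - 0.0) <= tol:
        return "high potassium"   # output in [-0.25, 0.25]
    if abs(y - 1.0) <= tol:
        return "lead barium"      # output in [0.75, 1.25]
    return "undetermined"         # assumption: outside both ranges

def agreement(tree_labels, network_outputs, tol=0.25):
    """Fraction of samples on which the two models give the same glass type."""
    matches = sum(
        tree == interpret_output(y, tol)
        for tree, y in zip(tree_labels, network_outputs)
    )
    return matches / len(tree_labels)
```

An agreement of 1.0 over A1~A8 corresponds to the statement that the two models give consistent results.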

Sensitivity analysis of classification results
The sensitivity of the neural network model was tested with SPSS, and the results are shown in Table 6. For example, one row of the table lists a lead barium sample predicted as lead barium, with the values 0.999900831 and 9.92E-05. The table shows that the probability of error in the prediction results is extremely low and that the prediction results are stable [8].
In addition, the contents of the chemical components are taken as input data, and Matlab is used to calculate the sensitivity of the prediction results to each indicator. The sensitivity values of all chemical components other than sulfur dioxide and lead oxide are extremely low, indicating that the prediction results remain stable when the contents of those components change or the data are inaccurate. The sulfur dioxide sensitivity value is abnormal. Because the box plot processing of the data may have affected this value, the model is retrained on data that were not processed with the box plot, which reduces the sensitivity of the prediction results to sulfur dioxide.
To sum up, the prediction of the BP neural network is generally stable. The sensitivity of the prediction results to chemical components other than lead oxide is low, while the sensitivity to lead oxide is high, which confirms that the lead oxide content can be used as the basis for judging the glass type. Therefore, the classification results predicted by the decision tree and the neural network model are reasonable and stable.
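The text does not specify how the per-component sensitivity values were computed. One common choice, shown here purely as an assumption, is one-at-a-time perturbation: each input is changed slightly while the others are held fixed, and the change in the model output per unit change of the input is recorded. The `toy_model` below is a hypothetical stand-in for the trained network, chosen so that it depends mainly on lead oxide.

```python
import math

def one_at_a_time_sensitivity(predict, baseline, rel_change=0.1):
    """Absolute output change per unit input change, one input at a time."""
    base_out = predict(baseline)
    sensitivity = {}
    for name, value in baseline.items():
        # Relative step; fall back to an absolute step when the value is zero.
        step = abs(value) * rel_change or rel_change
        perturbed = dict(baseline)
        perturbed[name] = value + step
        sensitivity[name] = abs(predict(perturbed) - base_out) / step
    return sensitivity

def toy_model(x):
    """Hypothetical stand-in for a trained classifier (not the paper's network)."""
    z = 0.8 * (x["PbO"] - 7.0) + 0.01 * x["SiO2"] - 0.6
    return 1 / (1 + math.exp(-z))
```

With a baseline such as `{"SiO2": 60.0, "PbO": 7.0, "K2O": 5.0}`, the lead oxide entry dominates the returned dictionary, which is the pattern the analysis above reports for the real model.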

Analysis of the strength of correlation
SPSS is used for grey correlation analysis. Taking high potassium glass as an example, the results are shown in Table 7, and the lead barium glass is treated in the same way. In the table, the first row of indicators gives the parent sequence of each grey correlation analysis, the first column gives the characteristic sequence indicators, and the correlation degree of an indicator with itself is 1. Depending on which component is taken as the reference data column, the correlation degree between two chemical components of glass relic samples of the same category differs slightly. Therefore, this paper takes the average of the two correlation degree values to represent the correlation degree between two chemical components of glass samples of the same category. The specific results are shown in Table 8. The magnitude of the correlation degree indicates the strength of the relationship: in the table, chemical components with a high correlation degree are more strongly related than those with a low correlation degree [9].
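The grey correlation degree and the averaging of the two directional values can be sketched as follows. The standard grey relational coefficient with distinguishing coefficient ρ = 0.5 is assumed, since the text does not give the formula or the value of ρ, and the inputs are assumed to be already normalized. Function names are illustrative.

```python
def grey_relational_degrees(parent, others, rho=0.5):
    """Grey relational degree of each comparison sequence with the parent sequence."""
    deltas = {
        name: [abs(p - x) for p, x in zip(parent, seq)]
        for name, seq in others.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        return {name: 1.0 for name in deltas}  # all sequences equal the parent
    return {
        # Mean over samples of (d_min + rho*d_max) / (delta + rho*d_max)
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }

def averaged_degree(data, a, b, rho=0.5):
    """Average of the degree with a as parent and the degree with b as parent."""
    others_a = {k: v for k, v in data.items() if k != a}
    others_b = {k: v for k, v in data.items() if k != b}
    r_ab = grey_relational_degrees(data[a], others_a, rho)[b]
    r_ba = grey_relational_degrees(data[b], others_b, rho)[a]
    return (r_ab + r_ba) / 2
```

Because the global minimum and maximum differences depend on which sequence is the parent, the two directional values generally differ, and `averaged_degree` returns their mean as described above.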

Analysis of the difference in association relationship
The correlations between the same pairs of chemical components in the two types of glass samples are compared, and the results are shown in Table 9. The correlations between silica and calcium oxide, silica and alumina, silica and copper oxide, and silica and phosphorus pentoxide are smaller in high potassium glass than in lead barium glass. The correlations between calcium oxide and alumina, calcium oxide and copper oxide, calcium oxide and phosphorus pentoxide, and alumina and copper oxide are also smaller in high potassium glass than in lead barium glass. The correlations between alumina and phosphorus pentoxide and between copper oxide and phosphorus pentoxide are greater in high potassium glass than in lead barium glass.

Conclusion
(1) The classification standard is obtained through the decision tree and is cross-checked against the predictions of the BP neural network. The decision tree makes the results intuitive and easy to understand, while the adaptive ability and fault tolerance of the neural network make them stable and reliable.
(2) Two cluster analysis methods are used to sub-classify high potassium glass and lead barium glass, making full use of the complementary advantages of the two methods, and the rationality and sensitivity of the classification results are analyzed.
(3) The grey relational degree is a comprehensive evaluation model that combines qualitative and quantitative analysis, which makes the evaluation results more convincing and accurate. It achieves strong reliability and rationality with only a small number of representative samples.