Construction of Valuation System with Chinese Characteristics Based on Entropy Weight Method and K-means

Abstract: Stock valuation is one of the main reference factors for investment in the securities market. With the continuous development of China's securities market, the traditional price-earnings ratio valuation model is no longer suited to the requirements of China's market, and the logic of stock valuation in China's securities market needs to be redefined so that investors can make better decisions. In this regard, Yi Huiman, Chairman of the China Securities Regulatory Commission (CSRC), has put forward the idea of constructing a valuation system with Chinese characteristics, so that genuinely high-quality stocks, i.e., those with high dividends, low valuation and good growth, obtain higher valuations and win the attention of investors. This paper therefore constructs a valuation system with Chinese characteristics and redefines the valuation model to help investors better value stocks in the Chinese stock market. It then classifies the stocks covered by this valuation system in the Shanghai and Shenzhen A-share markets and obtains two types of stocks, a long-term investment type and a short- to medium-term investment type, which helps investors invest in the Chinese stock market.


Introduction
The core of portfolio investment in the stock market is the estimation of the intrinsic value of stocks. Nowadays, there are many stock valuation models in the capital market, including price-earnings ratio valuation, price-to-book ratio valuation and the discounted cash flow model. However, for the Chinese stock market, it is important to combine the valuation model with the market context, taking into account the policy background under the Chinese market system and the intrinsic growth of the stock.
In 2022, Yi Huiman, Chairman of the China Securities Regulatory Commission (CSRC) [1], first proposed the concept of "special valuation", or a "valuation system with Chinese characteristics". The term "China Special Valuation" refers to a diversified valuation system with distinctive Chinese characteristics, formed by adding Chinese elements to general market valuation methods and integrating them to fit the actual situation of the Chinese market.
With policy orientation, value investment, capital allocation and risk management as its core elements, "China Special Valuation" is dedicated to improving the valuation of "low-valuation stocks", "high-quality growth stocks" and "nationally supported industries and fields", so as to promote the growth of high-return, high-quality stocks [2].
Therefore, this paper establishes a "Valuation System with Chinese Characteristics" and categorizes key stocks under the system, which provides a more targeted approach to evaluating the intrinsic value of enterprises and the direction of market investment, and gives investors a more accurate valuation of stocks.

Selection of Characteristic Indicators
Drawing on the main factors affecting the valuation system with Chinese characteristics pointed out by Li Xiaorong et al. [3], this paper selects nine indicators based on the four core elements of the valuation system with Chinese characteristics, i.e., policy orientation, value investment, risk management, and capital allocation. The nine indicators are: whether the company is a state-owned enterprise, ROE (return on equity), whether it belongs to a national key project, whether it is an innovative science and technology enterprise, PE ratio, PCF ratio, PB ratio, PS ratio, and net ratio. The characteristic indicators of the valuation system with Chinese characteristics are shown as a mind map in Figure 1.

Calculation of Characteristic Indicator Weights Using the Entropy Weighting Method
In order to make the model as objective and accurate as possible, this paper uses the entropy weighting method to weight and score the characteristic indicators and hotspot coefficient indicators of the China Special Valuation stocks, derives a score for each stock, and then uses these scores to reflect each stock's short- and long-term investment value.
The entropy weighting method is suitable for evaluating a limited number of decisions: it compares the information entropy of different indicators, determines the weights of the indicators, and finally arrives at the optimal strategy [4]. Generally speaking, the smaller the information entropy of an indicator, the greater the variation of the different decisions on that indicator, the more information it provides, the greater the role it plays, and the greater its weight [5][6].
Step 1: Find the information entropy of each indicator, computed for each indicator's set of data according to the definition of information entropy.
Step 2: Determine the weight of each indicator from its information entropy.
Step 3: The resulting weights of the indicators are given in Table 1.
From Table 1, this paper shows that whether a stock is an innovative technology company and whether it belongs to a national key project together account for more than 80% of the total weight.
This paper collects characteristic indicator data for 75 China Special Valuation stocks in the Shanghai and Shenzhen A-share markets, then clusters the data using K-means cluster analysis, determines the number of categories using the elbow rule, and finally derives the characteristics of each category by analyzing the cluster center coordinates in combination with related domain information.
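The entropy weighting steps above can be illustrated with a minimal sketch. It assumes the commonly used form of the method: for n stocks and indicator j, p_ij = x_ij / sum_i x_ij, e_j = -(1 / ln n) * sum_i p_ij ln p_ij, and w_j = (1 - e_j) / sum_j (1 - e_j). The exact formulas used in this paper may differ, and the function names and sample data are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Return one weight per column (indicator) of a non-negative matrix.

    `matrix` is a list of rows (stocks); values are assumed to be already
    standardized and non-negative. Needs at least two rows.
    """
    n = len(matrix)      # number of stocks
    m = len(matrix[0])   # number of indicators
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total if total > 0 else 0.0   # share of stock i in indicator j
            if p > 0:                                 # convention: 0 * ln(0) = 0
                entropy -= p * math.log(p)
        # Assumption: an all-zero column is treated as carrying no information.
        entropy = entropy / math.log(n) if total > 0 else 1.0
        divergences.append(max(0.0, 1.0 - entropy))   # smaller entropy, larger weight
    total_divergence = sum(divergences)
    if total_divergence == 0:                         # every indicator is uninformative
        return [1.0 / m] * m
    return [d / total_divergence for d in divergences]

def weighted_scores(matrix, weights):
    """Score each stock as the weighted sum of its standardized indicators."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]

# Hypothetical data: 3 stocks, 2 indicators; the first indicator never varies.
sample = [[1.0, 0.5], [1.0, 0.2], [1.0, 0.9]]
weights = entropy_weights(sample)
scores = weighted_scores(sample, weights)
```

In this toy input the first indicator is identical for every stock, so nearly all of the weight goes to the second indicator, which matches the intuition that an indicator with no variation provides no information.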

Exploring Stock Classification under the Valuation System with Chinese Characteristics
The 75 stocks of Shanghai and Shenzhen A-shares collected in this paper are shown in Table 2.

Data Coding
Whether a company is state-controlled, whether it belongs to a national key project, and whether it is an innovative science and technology enterprise are categorical data. To convert them into quantitative indicators, 0-1 coding is used, in which "1" represents affirmative and "0" represents negative.
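The 0-1 coding can be illustrated as follows. This is a minimal sketch: the accepted spellings and the field names of the sample record are hypothetical and are not taken from the paper's data set.

```python
def encode_binary(answer):
    """Map a yes/no attribute to 1 (affirmative) or 0 (negative)."""
    affirmative = {"yes", "y", "true", "1"}   # illustrative spellings only
    negative = {"no", "n", "false", "0"}
    text = str(answer).strip().lower()
    if text in affirmative:
        return 1
    if text in negative:
        return 0
    raise ValueError(f"unrecognized categorical value: {answer!r}")

# Hypothetical record for one stock; field names are assumptions.
stock = {
    "state_owned": "yes",
    "national_key_project": "no",
    "innovative_tech": "yes",
}
encoded = {name: encode_binary(value) for name, value in stock.items()}
```

Raising an error on unknown values, rather than silently mapping them to 0, keeps missing or misspelled entries from being counted as "negative".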

Data Standardization
Because the selected indicators do not share a common unit, the influence of scale must be considered in addition to the influence of the numerical values, so the data are standardized to eliminate the influence of scale. Different standardization methods should be used for indicators with positive and negative orientations, so that the processed data are free of both the effect of scale and the effect of orientation.
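A minimal sketch of direction-aware standardization is given below. It assumes the commonly used min-max forms, (x - min) / (max - min) for positive indicators and (max - x) / (max - min) for negative indicators; the exact formulas used in this paper may differ, and which indicators count as negative is an assumption left to the caller.

```python
def min_max_standardize(values, positive=True):
    """Rescale one indicator column to [0, 1], respecting its orientation.

    positive=True  : larger raw values are better, so they map toward 1.
    positive=False : smaller raw values are better, so they map toward 1.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column carries no information and maps to 0.
        return [0.0 for _ in values]
    if positive:
        return [(v - lo) / span for v in values]
    return [(hi - v) / span for v in values]

# Hypothetical values for one indicator across three stocks.
as_positive = min_max_standardize([10, 20, 30], positive=True)
as_negative = min_max_standardize([10, 20, 30], positive=False)
```

After this step every indicator lies in [0, 1] and "larger is better" holds for all columns, so the columns can be combined in a weighted score.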

K-means Classification Modeling and Solution
This paper carries out unsupervised K-means cluster analysis on the 75 collected stocks, based on the characteristic indicators of the established valuation system with Chinese characteristics.
The K-means clustering algorithm is one of the most common unsupervised clustering algorithms and is a typical distance-based, non-hierarchical clustering algorithm. The data are divided into a predetermined number of classes K by minimizing an error function, and distance is used as the measure of similarity, i.e., the smaller the distance between two objects, the more similar they are considered to be [7][8].
Step 1: Randomly select $K$ cluster centroids $\mu_1, \dots, \mu_K$; there are then $K$ clusters $C^{(j)}$, $j = 1, \dots, K$.
Step 2: Associate every data point with its closest centroid and divide the clusters on this basis. Calculate the distance $d_{ij}$ between each data point $x^{(i)}$ and each centroid $\mu_j$; then $x^{(i)}$ belongs to the cluster $C^{(j)}$ with the nearest centroid $\mu_j$.
Step 3: For each cluster $C^{(j)}$, recalculate the centroid by moving it to the mean of all data points currently assigned to that cluster [5]. Repeat the second and third steps until the sum of the squared distances from all points to the centroids of their clusters no longer decreases.
The key formulas in the K-means algorithm involve calculating the distances between the data points and the cluster centers, and updating the cluster centers. They are as follows.
Calculating the distance between data points and cluster centers: K-means usually uses the Euclidean distance to measure the distance between data points and cluster centers. The Euclidean distance formula is as follows [9]:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where $i$ indexes the $n$ dimensions of the data point, and $x$ and $y$ denote the data point and the cluster center, respectively.
In K-means, the above formula is used to calculate the distance between each data point and each cluster center in order to determine the closest cluster, to which the data point is assigned.
Updating the cluster centers: once the data points are assigned to the nearest cluster, the center of each cluster needs to be updated. Typically, the center of a cluster is the average of all data points within that cluster. The update formula for the new cluster center is as follows:
$$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} x^{(i)}$$
where $j$ denotes the index of the cluster, $N_j$ denotes the number of data points in that cluster, $x^{(i)}$ denotes the $i$-th data point in that cluster, and $\sum_{i=1}^{N_j}$ denotes the sum over all data points in that cluster. This formula calculates the new center of the cluster, which then replaces the old cluster center. In the next iteration, the data points are compared with the new cluster centers and reassigned.
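The two key formulas, the Euclidean distance and the cluster-center update, can be written directly in code. This is a minimal sketch with hypothetical function names, representing each data point as a list of numbers.

```python
import math

def euclidean_distance(x, y):
    """Distance between a data point x and a cluster center y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cluster_mean(points):
    """New cluster center: the coordinate-wise average of the cluster's points."""
    count = len(points)
    # zip(*points) groups the values of each dimension together.
    return [sum(coords) / count for coords in zip(*points)]

# Hypothetical two-dimensional examples.
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
center = cluster_mean([[0.0, 0.0], [2.0, 4.0]])
```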
The K-means algorithm is an iterative process that repeatedly calculates distances, reassigns data points, and updates cluster centers, eventually reaching convergence, where the cluster centers no longer change or another stopping condition is met [10].
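The iterative process described above can be sketched as a short loop. This is an illustrative implementation, not the code used in this paper: the function name, the `max_iter` and `seed` parameters, the choice to stop when assignments stop changing, and the handling of empty clusters are all assumptions.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (lists of floats) into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:   # assignments are stable, so centers will not move
            break
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:            # assumption: an empty cluster keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

# Hypothetical data: two well-separated groups of three points each.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = k_means(data, k=2)
```

Because the initial centers are random, K-means can settle in a local optimum on less clearly separated data; running it with several seeds and keeping the result with the smallest sum of squared distances is a common precaution.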

Stock Cluster Category Visualization and K-Means Cluster Analysis Effectiveness Analysis
In this paper, K-means clustering with the number of clusters chosen by the "elbow rule" [11][12][13] groups the collected stocks into two categories. The cluster summary is plotted in Figure 2 and the cluster scatter plot in Figure 3. There are clear differences between the stocks of the two categories, and the classification produced by the K-means cluster analysis is distinct. Based on these results and the cluster center coordinates (Table 4), the characteristics of the clustering categories are obtained: clustering category 1 is the long-term investment category, and clustering category 2 is the short- to medium-term investment category.
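The elbow rule can be illustrated by computing the within-cluster sum of squared errors (SSE) for several values of K and looking for the point where the curve bends. The sketch below is an assumption-laden illustration: the basic K-means inside it, the "largest second difference" heuristic for locating the bend, and all names are hypothetical, and in practice the elbow is often read off a plot instead.

```python
import math
import random

def sse_for_k(points, k, seed=0, max_iter=100):
    """Run a basic K-means and return the within-cluster sum of squared errors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:   # assumption: an empty cluster keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def elbow_k(points, k_max):
    """Return (K at the sharpest bend of the SSE curve, list of SSE values).

    Requires k_max >= 3. sse[0] is the SSE for K = 1, sse[1] for K = 2, etc.
    """
    sse = [sse_for_k(points, k) for k in range(1, k_max + 1)]
    # bends[i] compares the SSE drop into K = i + 2 with the drop after it.
    bends = [
        (sse[i] - sse[i + 1]) - (sse[i + 1] - sse[i + 2])
        for i in range(len(sse) - 2)
    ]
    return bends.index(max(bends)) + 2, sse

# Hypothetical data: two well-separated groups of three points each.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
best_k, sse_curve = elbow_k(data, k_max=4)
```

For this toy data the SSE falls sharply from K = 1 to K = 2 and barely changes afterwards, so the bend is at K = 2.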

Specific Analysis and Definition of Stock Clustering Categories
Cluster category 1 (long-term investment category): State-owned shares make up a large proportion of this category, and the stocks are consistent with national key projects and the national strategic direction. As state-owned enterprise reform proceeds and supporting policies are released, higher dividends and higher earnings are expected. The price-to-sales and price-to-book ratios of these stocks are relatively small, and their valuation is lower than that of cluster 2; the category has better growth in the long run and is suitable for long-term investment [14][15].
Cluster category 2 (short- to medium-term investment category): Innovation and technology enterprises make up a large proportion of this category, which belongs to the industries and fields that receive key national support. The release of new products can yield high investment returns within a short period, and the stocks show good growth. Because they are influenced by news hot spots and industrial adjustment, they are suitable for short-term investment [16]. The clustering results show that this category generally consists of high-tech enterprises.

Conclusion
This study identifies nine characteristic indicators of the valuation system with Chinese characteristics by analyzing the system's core connotation, which helps investors value stocks. However, a valuation system also has secondary indicators, which investors need to consider and analyze comprehensively. If more minor influences are to be considered, this can be achieved by further expanding the scope of data collection or introducing more relevant characteristics.
Then, based on the established valuation system with Chinese characteristics, this study selects 75 stocks in the Shanghai and Shenzhen A-share markets for K-means clustering and obtains two kinds of stocks under the system, a short- to medium-term investment category and a long-term investment category, which can assist a wide range of investors.
In addition, adding more data and incorporating deeper domain expertise may improve the accuracy of K-means clustering, which in turn may improve the returns of stock investment strategies.
In conclusion, this study provides decision makers with a novel Chinese stock valuation model for investing in the Chinese stock market. However, to further improve the accuracy and robustness of the model, future research may focus on increasing the amount of data and optimizing the model structure to cope with the volatility and changes in the equity securities market.

Figure 1: Characteristic Indicators of Valuation System with Chinese Characteristics Mind Map

Table 2: Collected 75 stocks of Shanghai and Shenzhen A-shares

Table 4: List of center coordinates of clustering centers