Research on Combination Prediction of Shanghai Composite Index Based on IOWGA Operator

Abstract: In this paper, the ARIMA(2,1,2) model and the LSTM model are used to predict the Shanghai Composite Index, and the IOWGA operator is then used to establish a combined forecast model to further improve forecast accuracy. The empirical results show that the errors of the combined forecasting model are lower than those of the individual forecasting models, and its average precision is better than that of the individual forecasting models. In addition, the forecast results show that the Shanghai Composite Index will fluctuate slightly in the next five trading days.


Introduction

Research Background and Significance
With the continuous development of the market economy and the rapid growth of the domestic financial market, the stock market has become another important channel for the public to invest and manage money. However, stock prices are highly unstable because they are influenced by many independent variables. Therefore, applying modern scientific methods and technical tools to predict stock trends can reduce investment risk and help investors obtain reasonable returns.
In addition, the sudden outbreak of the COVID-19 epidemic in early 2020 has impacted the global economy. In particular, since the end of 2021, the Shanghai Composite Index has continued to fluctuate and fall, dropping below 3,000 points and reaching its lowest level since 2020. Many methods are currently available for forecasting stock indices, which are time series driven by multiple independent variables. Although individual forecasting methods explain the law of stock price changes to a certain extent, their forecasting accuracy is only moderate. Therefore, to exploit the strengths of different forecasting methods, many scholars have proposed combining individual forecasting methods appropriately to further improve forecasting accuracy. In this paper, the ARIMA and LSTM neural network models are selected, and the IOWGA operator is used to construct a combined forecasting model.

Literature Review
Previous studies on stock index forecasting focused on single forecasting models, such as ARIMA, neural networks, and SVM. More recently, many scholars have focused on optimally combining individual forecasts [1][2]. Liu Jiaming and Liu Haibin (2014) established a combined forecasting model of ARIMA and BP neural network, fitting the linear part and the nonlinear residual part of the time series respectively, and then integrated them to improve forecasting accuracy [3]. Cheng Changpin et al. (2012) decomposed the time series with a wavelet decomposition algorithm, separated the low-frequency and high-frequency information in the non-stationary sequence, fitted them with ARIMA and SVM models respectively, and then superimposed the model predictions to obtain the predicted value of the original sequence [4]. Wu Dashuo (2020) proposed improving LSTM-based stock index prediction with a genetic algorithm, and compared its prediction accuracy with that of the BP neural network and the plain LSTM network on Nasdaq data [5].
A review of the relevant literature shows that scholars have used a variety of methods for stock index forecasting. Common methods include gray forecasting, the ARIMA model, the BP neural network, and the SVM model. In this paper, two common individual forecasting methods, ARIMA and LSTM, are used; a combined forecasting model is established with the IOWGA operator, and the effectiveness of the combined forecast is tested empirically.

IOWGA Operator
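Let $\langle v_1, a_1\rangle, \langle v_2, a_2\rangle, \ldots, \langle v_n, a_n\rangle$ be $n$ two-dimensional arrays, and define

$$\mathrm{IOWGA}_W\big(\langle v_1, a_1\rangle, \ldots, \langle v_n, a_n\rangle\big) = \prod_{i=1}^{n} a_{v\text{-index}(i)}^{\,w_i} \quad (1)$$

where $W = (w_1, w_2, \ldots, w_n)^T$ is the weight vector satisfying $\sum_{i=1}^{n} w_i = 1$ and $w_i \ge 0$, and $v\text{-index}(i)$ is the subscript corresponding to the $i$-th largest number of $v_1, v_2, \ldots, v_n$ arranged in descending order. Then $\mathrm{IOWGA}_W$ is called the $n$-dimensional induced ordered weighted geometric mean operator, that is, the $n$-dimensional IOWGA operator.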
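The operator can be illustrated with a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function name `iowga` and the example numbers are hypothetical, and the arguments are assumed to be positive so that logarithms exist.

```python
import math

def iowga(pairs, weights):
    """Induced ordered weighted geometric average (illustrative sketch).

    pairs   -- list of (v_i, a_i): inducing value and positive argument
    weights -- w_1..w_n, non-negative and summing to 1
    """
    if len(pairs) != len(weights):
        raise ValueError("pairs and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Reorder the arguments by their inducing values, largest first.
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    # Weighted geometric mean, computed in log space.
    return math.exp(sum(w * math.log(a) for w, (_, a) in zip(weights, ordered)))

# Two hypothetical forecasts whose inducing values (accuracies) are 0.97 and 0.99.
# The first weight goes to the argument with the larger inducing value (3150.0).
combined = iowga([(0.97, 3100.0), (0.99, 3150.0)], [0.9, 0.1])
```

Note that the weights attach to positions in the induced ordering, not to particular arguments, which is what distinguishes the operator from an ordinary weighted geometric mean.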

Prediction Accuracy
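For the prediction of the same economic problem, assume that there are $n$ prediction methods. Let $x_t$ be the true value at time $t$, let $x_{it}$ be the predicted value of the $i$-th prediction method at time $t$ ($i = 1, 2, \ldots, n$; $t = 1, 2, \ldots, N$), and let $z_{it}$ represent the accuracy of the $i$-th prediction method at time $t$:

$$z_{it} = \begin{cases} 1 - \left|\dfrac{x_t - x_{it}}{x_t}\right|, & \left|\dfrac{x_t - x_{it}}{x_t}\right| < 1 \\ 0, & \left|\dfrac{x_t - x_{it}}{x_t}\right| \ge 1 \end{cases} \quad (2)$$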
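As an illustration, the accuracy of a forecast can be computed as follows. The sketch assumes the definition commonly used with IOWGA-type operators, namely one minus the absolute relative error, set to zero when the relative error reaches 1. The function names and the sample numbers are hypothetical.

```python
def accuracy(actual, predicted):
    """Prediction accuracy: 1 - |relative error|, floored at 0 (assumed definition)."""
    rel_err = abs((actual - predicted) / actual)
    return 1.0 - rel_err if rel_err < 1.0 else 0.0

def accuracy_series(actuals, forecasts):
    """Accuracy of one method at every time point."""
    return [accuracy(x, f) for x, f in zip(actuals, forecasts)]

# A forecast of 3168 against an actual close of 3200 has a 1% relative error.
example = accuracy(3200.0, 3168.0)
```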

IOWGA Combination Model
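According to the prediction accuracy and predicted value of the $n$ prediction methods at time $t$, $n$ two-dimensional arrays are constructed: $\langle z_{1t}, x_{1t}\rangle, \langle z_{2t}, x_{2t}\rangle, \ldots, \langle z_{nt}, x_{nt}\rangle$. The $n$ prediction methods at time $t$ are ordered by prediction accuracy in descending order, and $z\text{-index}(it)$ represents the subscript of the $i$-th largest prediction accuracy at time $t$. The combined forecast at time $t$ is then

$$\hat{x}_t = \mathrm{IOWGA}_W\big(\langle z_{1t}, x_{1t}\rangle, \ldots, \langle z_{nt}, x_{nt}\rangle\big) = \prod_{i=1}^{n} x_{z\text{-index}(it)}^{\,w_i}$$

Let $e_{z\text{-index}(it)} = \ln x_t - \ln x_{z\text{-index}(it)}$, where $x_t$ is the true value at time $t$. Then the total sum of squared logarithmic errors $S$ of the combined forecast over $N$ periods is:

$$S = \sum_{t=1}^{N} \left(\ln x_t - \ln \hat{x}_t\right)^2 = \sum_{t=1}^{N} \left(\sum_{i=1}^{n} w_i\, e_{z\text{-index}(it)}\right)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \left(\sum_{t=1}^{N} e_{z\text{-index}(it)}\, e_{z\text{-index}(jt)}\right) \quad (3)$$

Therefore, the IOWGA-based combined prediction model that minimizes the sum of squared logarithmic errors can be expressed as:

$$\min S(W) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \left(\sum_{t=1}^{N} e_{z\text{-index}(it)}\, e_{z\text{-index}(jt)}\right), \quad \text{s.t. } \sum_{i=1}^{n} w_i = 1,\; w_i \ge 0,\; i = 1, 2, \ldots, n \quad (4)$$

Let $E_{ij} = \sum_{t=1}^{N} e_{z\text{-index}(it)}\, e_{z\text{-index}(jt)}$. The matrix $E = (E_{ij})_{n \times n}$ is called the $n$-order IOWGA combined prediction logarithmic error information square matrix, so the model can be expressed in matrix form:

$$\min S(W) = W^T E W, \quad \text{s.t. } R^T W = 1,\; W \ge 0 \quad (5)$$

where $R = (1, 1, \ldots, 1)^T$.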
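The construction of the logarithmic error information matrix can be sketched in Python as below. The sketch assumes that the forecasts at each time point are ranked by accuracy with the most accurate first, and that accuracy falls as the absolute relative error rises. The function names and the toy data are hypothetical and do not come from the paper.

```python
import math

def log_error_matrix(actuals, forecasts_by_model):
    """E[i][j] = sum over t of e_i(t) * e_j(t), where e_i(t) is the log error
    of the forecast ranked i-th by accuracy at time t."""
    n = len(forecasts_by_model)
    E = [[0.0] * n for _ in range(n)]
    for t, x in enumerate(actuals):
        preds = [f[t] for f in forecasts_by_model]
        # Smallest relative error first, i.e. highest accuracy first.
        preds.sort(key=lambda p: abs((x - p) / x))
        errs = [math.log(x) - math.log(p) for p in preds]
        for i in range(n):
            for j in range(n):
                E[i][j] += errs[i] * errs[j]
    return E

def sum_sq_log_error(E, w):
    """S = w^T E w for a weight vector w."""
    n = len(w)
    return sum(w[i] * w[j] * E[i][j] for i in range(n) for j in range(n))

# Toy data: two time points, two hypothetical models.
actuals = [100.0, 100.0]
model_a = [101.0, 90.0]
model_b = [95.0, 99.0]
E = log_error_matrix(actuals, [model_a, model_b])
S_equal = sum_sq_log_error(E, [0.5, 0.5])
```

Because the ranking is redone at every time point, a row of E mixes errors from different models, which is why the weights cannot be read as fixed per-model weights.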

Error Evaluation Index
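The root mean square error (RMSE), sum of squared errors (SSE), mean absolute error (MAE), mean relative error (MRE), and average prediction accuracy (MAP) are used to evaluate the prediction deviation.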
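A minimal Python sketch of these five indicators is given below. The formulas are assumptions based on the standard textbook definitions, with MAP taken to be the mean of the pointwise accuracy (one minus the absolute relative error, floored at zero); the paper's own definitions may differ in scaling. The function name and sample values are hypothetical.

```python
import math

def error_metrics(actuals, forecasts):
    """Return SSE, RMSE, MAE, MRE and MAP under the assumed standard definitions."""
    n = len(actuals)
    errs = [x - f for x, f in zip(actuals, forecasts)]
    rel = [abs(e / x) for e, x in zip(errs, actuals)]
    sse = sum(e * e for e in errs)
    return {
        "SSE": sse,                                      # sum of squared errors
        "RMSE": math.sqrt(sse / n),                      # root mean square error
        "MAE": sum(abs(e) for e in errs) / n,            # mean absolute error
        "MRE": sum(rel) / n,                             # mean relative error
        "MAP": sum(max(0.0, 1.0 - r) for r in rel) / n,  # mean accuracy
    }

metrics = error_metrics([100.0, 200.0], [110.0, 190.0])
```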

Variable Selection and Data Sources
This paper takes the daily closing price of the Shanghai Composite Index from 2005/1/4 to 2022/5/31 as the research object and first examines its trend chart.

ARIMA (p,d,q) Prediction Model
The ARIMA (autoregressive integrated moving average) model combines an autoregressive model, a moving average model, and differencing. Here p is the number of autoregressive terms, q is the number of moving average terms, and d is the order of differencing needed to make the data a stationary sequence. The model can be expressed as:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d x_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\varepsilon_t \quad (11)$$

where $L$ is the lag operator, $\phi_i$ and $\theta_i$ are the autoregressive and moving average coefficients, and $\varepsilon_t$ is a white noise error term.

It can be seen from Figure 1 that the time series of the Shanghai Composite Index is non-stationary. The P value of the ADF test is 0.1118, which is higher than 0.05 and further indicates that the series is non-stationary; it can be transformed into a stationary sequence by differencing d times, with the ADF test applied after each difference.
The P value of the ADF test after the first-order difference is less than 0.05, which indicates that the first-order differenced sequence is stationary, and the Q statistic, with probability P < 0.05, shows that the first-order differenced sequence is not white noise. The autoregressive and moving average orders were preliminarily judged from the ACF and PACF correlograms, and the orders p = 2 and q = 2 were then selected by AIC and BIC.
Based on these orders and the order of differencing, the ARIMA(2,1,2) model is constructed and tested. The t values of AR(1), AR(2), MA(1) and MA(2) are -2.201728, -21.19044, 2.058227 and 17.24893 respectively, so the t tests are passed; the DW value is 1.936351, indicating that the model has no serial autocorrelation; and the model passes the White test, indicating that it has no heteroscedasticity. These tests show that ARIMA(2,1,2) is suitable for predicting the Shanghai Composite Index in recent years. With $\Delta x_t = x_t - x_{t-1}$, the estimated form of the model is:

$$\Delta x_t = -0.086043\,\Delta x_{t-1} - 0.786569\,\Delta x_{t-2} + \varepsilon_t + 0.091878\,\varepsilon_{t-1} + 0.726932\,\varepsilon_{t-2} \quad (12)$$

So the prediction formula of the ARIMA(2,1,2) model is:

$$\hat{x}_t = x_{t-1} - 0.086043\,\Delta x_{t-1} - 0.786569\,\Delta x_{t-2} + 0.091878\,\varepsilon_{t-1} + 0.726932\,\varepsilon_{t-2} \quad (13)$$

According to model (13), the data in the sample period can be predicted and compared; the specific results are shown in Figure 2.
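The one-step prediction can be sketched in Python as follows. Two assumptions are made here: the signs of the coefficients are taken from the signs of the reported t statistics (negative for the AR terms, positive for the MA terms), and the start-up residuals are set to zero. The function names and the sample closes are hypothetical.

```python
# Coefficients reported in the text; the negative AR signs are an assumption
# taken from the signs of the reported t statistics.
AR = (-0.086043, -0.786569)
MA = (0.091878, 0.726932)

def one_step_forecast(x, eps):
    """x: last three observed closes, oldest first.
    eps: last two residuals, oldest first."""
    d1 = x[2] - x[1]  # most recent first difference
    d2 = x[1] - x[0]  # first difference one step earlier
    return x[2] + AR[0] * d1 + AR[1] * d2 + MA[0] * eps[1] + MA[1] * eps[0]

def in_sample_forecasts(series):
    """Roll the one-step forecast through a series, updating residuals as it goes."""
    eps = [0.0, 0.0]  # start-up residuals assumed to be zero
    out = []
    for t in range(3, len(series)):
        f = one_step_forecast(series[t - 3:t], eps)
        out.append(f)
        eps = [eps[1], series[t] - f]
    return out

first = one_step_forecast([3000.0, 3010.0, 3020.0], [0.0, 0.0])
```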

LSTM neural network
LSTM is a kind of recurrent neural network that is widely used in sequence forecasting, such as weather forecasting and stock market forecasting. It mainly consists of five parts: the cell state, the hidden state, the input gate, the forget gate, and the output gate.
In this paper, the LSTM neural network is used to predict the Shanghai Composite Index. The model is implemented in Python, and a total of 4179 trading days from 2005/1/4 to 2022/3/15 are used as training samples. The forget gate uses the sigmoid function to selectively pass information through. The input gate combines a sigmoid layer and a tanh layer to process and update the data. The output gate uses the sigmoid function to determine which part to output; the cell state is passed through a tanh layer and multiplied by the output gate to determine the output information. For model training, the loss function is MSE, the optimization algorithm is Adam, the batch size is 1, the number of training epochs is 100, and the number of neuron layers is 8. The final prediction results are shown in Table 1.
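The gate mechanics described above can be illustrated with a single-unit LSTM step written in plain Python. This is only a sketch of the standard LSTM update with scalar input and state; it is not the trained model from the paper, and the parameter values below are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell with scalar input.
    p maps each gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))       # share of the old cell state to keep
    i = sigmoid(pre("input"))        # share of the new candidate to admit
    g = math.tanh(pre("candidate"))  # candidate cell content
    o = sigmoid(pre("output"))       # share of the cell state to expose
    c = f * c_prev + i * g           # updated cell state
    h = o * math.tanh(c)             # updated hidden state (the output)
    return h, c

# Hypothetical parameters, chosen only to make the example run.
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.4, 0.2, 0.0),
    "candidate": (0.3, 0.3, 0.0),
    "output": (0.6, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [0.1, 0.2, 0.15]:  # a short, scaled input sequence
    h, c = lstm_step(x, h, c, params)
```

In practice a deep learning framework handles these updates for vectors of units, together with training by MSE loss and the Adam optimizer.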

Model Construction
Taking the prediction accuracy of the two individual prediction methods above as the induced value, the induced values and predicted values are reordered to obtain the error information matrix E. Model (5) is then solved to obtain the optimal weight vector, and the IOWGA combined prediction model is constructed.
The predicted values and accuracies are sorted by accuracy from small to large, and the logarithmic error information matrix E of the model is obtained:

$$E = \begin{pmatrix} 0.011605 & 0.010639 \\ 0.010639 & 0.009832 \end{pmatrix} \quad (14)$$

Using the logarithmic error information matrix E, the combined prediction model of the IOWGA operator is constructed by minimizing the sum of squared logarithmic errors S:

$$\min S = 0.011605\,w_1^2 + 2 \times 0.010639\,w_1 w_2 + 0.009832\,w_2^2, \quad \text{s.t. } w_1 + w_2 = 1,\; w_1 \ge 0,\; w_2 \ge 0 \quad (15)$$

The minimum value of S is 0.0098.
The optimal solution is obtained with Matlab: $W^* = (0.0002, 0.9998)^T$. This means that the weight $w_1 = 0.0002$ is assigned to the predicted value with the lower accuracy of the two individual predictions, the weight $w_2 = 0.9998$ is assigned to the predicted value with the higher accuracy, and the weighted geometric mean of the two predicted values is used as the combined forecast over the sample period. The combined model is:

$$\hat{x}_t = \left(x_t^{\mathrm{low}}\right)^{0.0002} \left(x_t^{\mathrm{high}}\right)^{0.9998} \quad (16)$$

where $x_t^{\mathrm{low}}$ and $x_t^{\mathrm{high}}$ denote the individual predicted values with the lower and higher accuracy at time $t$. According to model (16), the predicted values and prediction accuracy of the combined prediction model are obtained; see Figure 3 for details.
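For two models the weight problem reduces to a one-dimensional quadratic, so it can be checked without a numerical solver. The sketch below is not the Matlab routine used in the paper; it substitutes w2 = 1 - w1, minimizes the resulting quadratic in closed form, and clips the result to the interval [0, 1]. The function name is hypothetical; the matrix entries are the ones reported in the text.

```python
def two_model_weights(E):
    """Minimise S = w^T E w subject to w1 + w2 = 1 and w1, w2 >= 0 (2x2 case)."""
    e11, e12, e22 = E[0][0], E[0][1], E[1][1]
    denom = e11 - 2.0 * e12 + e22
    if denom <= 0.0:
        # S is not strictly convex along the constraint; compare the endpoints.
        w1 = 0.0 if e22 <= e11 else 1.0
    else:
        # Stationary point along w2 = 1 - w1, clipped to the feasible interval.
        w1 = min(1.0, max(0.0, (e22 - e12) / denom))
    w2 = 1.0 - w1
    s = w1 * w1 * e11 + 2.0 * w1 * w2 * e12 + w2 * w2 * e22
    return w1, w2, s

# First row and column correspond to the lower-accuracy forecast.
E = [[0.011605, 0.010639], [0.010639, 0.009832]]
w1, w2, s = two_model_weights(E)
```

With the reported matrix the stationary point lies outside the feasible interval, so the sketch returns the boundary solution w1 = 0 and w2 = 1 with S of about 0.0098, which is close to the reported weights of 0.0002 and 0.9998.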

Model Evaluation
Taking RMSE, SSE, MAE, MRE and MAP as error metrics, the forecasting performance of each individual forecasting model and of the IOWGA combined forecasting model is calculated and compared. The specific results are shown in Table 1. According to Table 1, the error metric values of the combined forecasting model based on the IOWGA operator are lower than those of the individual forecasting models, so the combined model forecasts better. Its average prediction accuracy is as high as 0.9901, which is higher than that of the two individual models. Therefore, establishing a combined prediction model based on the IOWGA operator helps to further improve the prediction accuracy for the Shanghai Composite Index.

Prediction Results
Since there is no real data for the next 5 trading days, the forecasting accuracy of the individual models cannot be computed, and their forecasts therefore cannot be ordered. Following the relevant literature, the weights that each individual model received in the combined model on each trading day of the sample period are summed and averaged arithmetically to obtain fixed weights for the ARIMA model and the LSTM model. These weights give a combined forecast model based on the IOWGA operator, from which the forecast values of the Shanghai Composite Index for the next 5 trading days are calculated. The forecast results are shown in Table 2.
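One possible reading of this averaging step is sketched below in Python. It assumes that on each in-sample day the more accurate model received the larger IOWGA weight and the other model the smaller one, and that the fixed out-of-sample weight of a model is the arithmetic mean of the weights it received. The function names, the tie rule, and the sample accuracies are hypothetical.

```python
def average_model_weights(acc_a, acc_b, w_low, w_high):
    """Average, over the sample, the weight each model received per day.
    On each day the more accurate model gets w_high and the other w_low;
    ties are resolved in favour of model A (an arbitrary choice)."""
    n = len(acc_a)
    total_a = sum(w_high if a >= b else w_low for a, b in zip(acc_a, acc_b))
    w_a = total_a / n
    return w_a, 1.0 - w_a  # valid because w_low + w_high = 1

def combined_forecast(f_a, f_b, w_a, w_b):
    """Weighted geometric mean of two forecasts with fixed weights."""
    return (f_a ** w_a) * (f_b ** w_b)

# Hypothetical daily accuracies for model A (e.g. ARIMA) and model B (e.g. LSTM).
acc_a = [0.990, 0.980, 0.970, 0.995]
acc_b = [0.980, 0.990, 0.980, 0.990]
w_a, w_b = average_model_weights(acc_a, acc_b, 0.0002, 0.9998)
forecast = combined_forecast(3100.0, 3200.0, w_a, w_b)
```

With fixed weights the combination no longer depends on knowing which model is more accurate on a given day, which is what makes it usable for days without observed values.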
Table 2: Predicted values of each model for the next 5 trading days

Conclusion
In this paper, the ARIMA(2,1,2) model and the LSTM model are used to predict the Shanghai Composite Index, and a combined forecasting model is then used to predict the index. The results show that, owing to the limitations of the individual models, their forecast accuracy needs to be improved, while the combined forecast model based on the IOWGA operator reaches an average forecast accuracy of 0.9901, with forecast errors lower than those of each individual model.
In addition, the forecast values of the Shanghai Composite Index for the next five trading days are obtained through the combined forecasting model (see Table 2). The results show that the Shanghai Composite Index will fluctuate slightly and decline in the next five trading days. Possible reasons include the sluggish economic environment, company operating conditions, and policy factors. Overall, the stock market may be in a bear market.
This paper selects only two individual forecasting models, whereas stock prices are affected by multiple independent variables. Moreover, the time series of the Shanghai Composite Index contains both linear and nonlinear parts, which are not optimized separately here, so there is room for improvement.

Figure 2: ARIMA and LSTM prediction model accuracy comparison

Figure 3: Predicted values of IOWGA combined forecasting model

Table 1: Comparison of model errors