Time series regression based on Bayesian model averaging and principal component analysis

This paper proposes an adaptive prediction model for high-dimensional time series data based on model averaging and principal component analysis. Specifically, it considers the case where the response variable is a scalar and the predictor is a time series. First, information is extracted from the high-dimensional time series data by principal component analysis. Second, Bayesian model averaging is used to perform the forecasting task on the principal component projection matrix. The proposed method deals effectively with the unsupervised nature of PCA and avoids having to select the number of principal components. Real data analyses show that the proposed method is competitive with lasso regression and ridge regression.


Introduction
In an era of rapidly developing intelligent technology, people face data of complex types and enormous volume. The quantity to be predicted is often influenced by many related factors, and the number of variables to be processed is frequently one or more orders of magnitude larger than the number of samples; such data are called high-dimensional. Research on feature extraction and prediction for high-dimensional data can therefore contribute substantially to information processing and quantitative decision-making in many fields. Zhao Xun et al. [1] used principal component analysis and a neural network to process high-dimensional data and predict parameters related to residents' consumption and per capita consumption spending in the following year. Zhang He et al. [2] used a Lasso regression model with 20 characteristic variables to predict and analyze the marine economic industry of Qingdao. Shi Yang [3] predicted soil composition from high-dimensional data containing spectral absorbance at multiple wavelengths, using a neural network and partial least squares. Meng Qinglong et al. [4] performed principal component regression on apple spectra to predict the soluble solids content of apples.

In this paper, high-dimensional time series data on meat content are used to test the model. In each meat sample record, the predictors are the absorbances at 100 screened wavelengths, which together reflect the organic matter content of the meat. Figure 1 plots these data as curves: each line is one meat sample with 100 absorbance values, so the original dataset has 100 dimensions.

In high-dimensional prediction, principal component regression is often used to build the model, with the variance contribution rate and the magnitude of the eigenvalues as the criteria for selecting principal components. As a result, the leading components with large contribution rates can easily overshadow the information carried by later components. Moreover, principal component regression performs principal component analysis and multiple linear regression as two separate steps, so the relationship between the selected components and the response is not taken into account; the model may discard components that are strongly related to the response and therefore predict poorly. Several researchers have improved principal component regression, with clear gains in variable selection and prediction accuracy. Shi Yang [3] used partial least squares and stepwise regression to predict soil composition from spectral data, screening the independent variables to obtain an optimal set of explanatory variables. Zhu Hailong et al. [5] analyzed the factors influencing the financial revenue of Anhui Province using ridge regression and Lasso regression. Xu Yunjuan et al. [6] applied Lasso to reduce the dimension of principal components obtained from variable clustering, improving the accuracy of variable selection. Li Yajuan [7] estimated a functional linear regression model by functional principal component estimation and used Lasso to select characteristic functions for predicting rates of return. Song Xiaofeng [8] optimized a ridge regression model and obtained accurate air quality predictions.
In this paper, a prediction method that combines model averaging with principal component regression is applied to time series data. Its main advantage is that it accounts for the potential relationship between the response and the principal component scores, and it avoids having to choose the number of principal components.
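The concern that variance-based selection can discard components related to the response can be made concrete with a small synthetic example. The sketch below is our own illustration, not part of the original experiment: it assumes three hypothetical predictors (two nearly collinear ones plus an independent one) and a response that depends only on the small-variance contrast between the collinear pair. Under the 80% cumulative contribution rule, that contrast is exactly the component that gets dropped.

```python
import numpy as np

def variance_rule_demo(seed=0, n=500, threshold=0.80):
    """Hypothetical data in which the response lives in the lowest-variance component."""
    rng = np.random.default_rng(seed)
    a, b = rng.normal(size=n), rng.normal(size=n)
    x1 = a + 0.1 * rng.normal(size=n)   # x1 and x2 are nearly collinear
    x2 = a + 0.1 * rng.normal(size=n)
    X = np.column_stack([x1, x2, b])
    y = x1 - x2                          # depends only on the x1/x2 contrast

    Z = (X - X.mean(axis=0)) / X.std(axis=0)           # standardized predictors
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eta = np.cumsum(eigvals) / eigvals.sum()           # cumulative contribution rate
    k = int(np.searchsorted(eta, threshold, side="right")) + 1  # smallest k with eta > threshold
    scores = Z @ eigvecs                               # all principal component scores
    corr = np.array([abs(np.corrcoef(scores[:, i], y)[0, 1]) for i in range(3)])
    return k, eta, corr

k, eta, corr = variance_rule_demo()
print(k, np.round(corr, 3))
```

The rule keeps the first two components, while the third, discarded component is the one strongly correlated with `y`; a selection step that looks only at variance cannot see this.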

Principal component regression
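Principal component analysis [9] is a dimension-reduction method: by linear combination, it transforms multiple highly correlated variables in the original data set into a group of new, mutually uncorrelated variables. Suppose there are $n$ samples, each with $p$ variables, forming the $n \times p$ data matrix

$$X = (x_{ij})_{n \times p} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}.$$

After each column is standardized, the eigenvalues and eigenvectors of the correlation matrix $R$ of $X$ can be computed with the Jacobi method [10]: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and $u_1, u_2, \ldots, u_p$, where $u_i = (u_{1i}, u_{2i}, \ldots, u_{pi})^{\top}$. The eigenvalue $\lambda_i$ equals the variance of the data projected onto the $i$-th principal component and so reflects the amount of information that component carries: the larger $\lambda_i$, the more information it contains. For $i = 1, 2, \ldots, p$, the $i$-th principal component is

$$F_i = u_{1i} Z_1 + u_{2i} Z_2 + \cdots + u_{pi} Z_p,$$

where $Z_1, \ldots, Z_p$ are the standardized predictor variables. We then compute the cumulative contribution rate of the first $k$ components,

$$\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i},$$

and take $k$ as the smallest number of components whose cumulative contribution rate exceeds 80%. These $k$ components are the explanatory variables of the multiple linear regression

$$y = \beta_0 + \beta_1 F_1 + \beta_2 F_2 + \cdots + \beta_k F_k + \varepsilon,$$

whose parameters are estimated by least squares, $\hat{\beta} = (F^{\top} F)^{-1} F^{\top} y$, where $F$ is the $n \times (k+1)$ design matrix consisting of an intercept column and the component scores. Finally, substituting the expression for $F_i$ transforms the fitted model into a principal component regression model in terms of the original (standardized) variables:

$$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^{p} \Big( \sum_{i=1}^{k} \hat{\beta}_i u_{ji} \Big) Z_j.$$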
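The principal component regression procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' code: it uses `numpy.linalg.eigh` (a symmetric eigensolver) in place of the Jacobi method, applies the "more than 80%" rule literally, and the function names `pcr_fit` and `pcr_predict` are hypothetical. The returned `gamma` holds the back-transformed coefficients on the standardized original variables.

```python
import numpy as np

def pcr_fit(X, y, threshold=0.80):
    """Principal component regression with the cumulative contribution rate rule."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std                               # standardized predictors Z_j
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # lambda_1 >= ... >= lambda_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eta = np.cumsum(eigvals) / eigvals.sum()           # cumulative contribution rate
    k = min(int(np.searchsorted(eta, threshold, side="right")) + 1, X.shape[1])
    U = eigvecs[:, :k]                                 # loadings u_1, ..., u_k
    F = Z @ U                                          # component scores F_1, ..., F_k
    design = np.column_stack([np.ones(len(y)), F])     # intercept + scores
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # least squares estimate
    gamma = U @ beta[1:]                               # sum_i beta_i * u_ji for each Z_j
    return {"mean": mean, "std": std, "beta0": beta[0], "gamma": gamma, "k": k}

def pcr_predict(model, X_new):
    """Predict with the model expressed in the standardized original variables."""
    Z_new = (X_new - model["mean"]) / model["std"]
    return model["beta0"] + Z_new @ model["gamma"]
```

Predicting through `gamma` rather than through the component scores mirrors the final back-transformation step; both routes give the same fitted values.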

Bayesian model averaging (BMA)
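The principle and procedure of Bayesian model averaging are as follows. In the first step, BMA combines the principal components produced by PCA into candidate regression models. With $k$ principal components as explanatory variables, up to $K = 2^k$ linear regression models can be formed (one per subset of the variables); their collection is called the model space $M = \{M_1, \ldots, M_K\}$. The second step is to compute the posterior model probabilities. Given the observed data $D$, let $P(M_j)$ be the prior probability of model $M_j$ and $P(\theta_j \mid M_j)$ the prior distribution of its parameters; the posterior probability of $M_j$ is then [11]

$$P(M_j \mid D) = \frac{P(D \mid M_j)\, P(M_j)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)},$$

where

$$P(D \mid M_j) = \int P(D \mid \theta_j, M_j)\, P(\theta_j \mid M_j)\, d\theta_j$$

is the marginal likelihood of model $M_j$, $\theta_j$ is the parameter vector of $M_j$, $P(\theta_j \mid M_j)$ is the prior density of $\theta_j$ under $M_j$, and $P(D \mid \theta_j, M_j)$ is the likelihood. The weight of each model in the combination is then determined by its posterior model probability, so that models whose variables better explain the response receive larger weights. To avoid overfitting the training set, the Bayesian information criterion (BIC) is used to penalize model complexity. BIC [12] is defined as

$$\mathrm{BIC}_j = d_j \ln n - 2 \ln \hat{L}_j,$$

where $d_j$ is the number of parameters of model $M_j$, $n$ is the number of samples, and $\hat{L}_j$ is the maximized likelihood of $M_j$. Using the standard BIC approximation to the posterior model probability with equal prior probabilities $P(M_j) = 1/K$, the combination weight of each model is

$$w_j = \frac{\exp(-\mathrm{BIC}_j / 2)}{\sum_{l=1}^{K} \exp(-\mathrm{BIC}_l / 2)},$$

and the BMA prediction is the weighted average $\hat{y} = \sum_{j=1}^{K} w_j \hat{y}_j$, where $\hat{y}_j$ is the prediction of model $M_j$.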
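To make the weighting step concrete, the sketch below computes Gaussian-likelihood BIC values and the resulting model weights. It assumes a linear model with normally distributed errors, for which the maximized log-likelihood is $-\tfrac{n}{2}\left(\ln(2\pi\,\mathrm{RSS}/n) + 1\right)$, and equal prior model probabilities; the helper names are hypothetical. Whether the error variance is counted among the model parameters is a convention: it adds the same constant to every BIC and therefore leaves the weights unchanged.

```python
import numpy as np

def gaussian_bic(rss, n, d):
    """BIC = d ln(n) - 2 ln(L_hat) for a linear model with Gaussian errors.

    At the maximum-likelihood variance rss / n, the log-likelihood is
    -n/2 * (ln(2*pi*rss/n) + 1).
    """
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    return d * np.log(n) - 2.0 * log_lik

def bma_weights(bics):
    """Weights exp(-BIC/2) normalized to sum to one, computed without overflow."""
    b = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (b - b.min()))   # shifting by the minimum BIC leaves the ratios unchanged
    return w / w.sum()

def bma_predict(predictions, bics):
    """Weighted average of candidate predictions; one row per model."""
    return bma_weights(bics) @ np.asarray(predictions, dtype=float)
```

For example, two models whose BIC values differ by 2 receive weights in the ratio $e^{-1} \approx 0.37$, so moderate BIC differences already concentrate most of the weight on the better model.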

Principal component regression based on the model averaging
The selection of principal components plays a key role in building the regression model, because different combinations of principal components change the estimated parameters and the prediction accuracy of the model. We therefore combine the idea of model averaging with principal component regression, producing an adaptive model that combines different sets of explanatory variables for prediction and reduces the risk of unstable regression predictions on high-dimensional time series data. We use the relatively simple Bayesian model averaging method to build the model and assess its advantages for high-dimensional prediction.
First, PCA is used to reduce the dimension of the data. To avoid losing information related to the response variable, the projected data reaching a cumulative variance contribution rate of 99.99% are retained for the combined prediction. Specifically, K regression models are generated from the possible combinations of principal components in the projected data, and the prior probability of each model and the prior distribution of its parameters are specified.
Second, the BIC is used to penalize these K models, improving the generalization ability of the final model.
Finally, the combination weight of each model is computed from its posterior model probability, and the weighted average of the models' predictions gives the final prediction. Figure 2 shows the flow chart of principal component regression based on Bayesian model averaging.
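The steps above can be combined into a single function. The sketch below is our own illustration under stated assumptions, not the authors' implementation: PCA is computed by SVD of the standardized training predictors, every subset of the retained components (including the intercept-only model) is a candidate with equal prior probability, and the posterior weights are approximated by $\exp(-\mathrm{BIC}/2)$. The function name `pca_bma_fit_predict` is hypothetical. Because the number of candidates is $2^k$, exhaustive enumeration is practical only for small $k$, such as the six components retained in the meat experiments.

```python
import itertools
import numpy as np

def pca_bma_fit_predict(X_train, y_train, X_test, var_threshold=0.9999):
    """PC regression with BIC-weighted Bayesian model averaging (illustrative sketch)."""
    # Step 1: PCA on standardized training predictors; keep components up to the threshold
    mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    Z_train, Z_test = (X_train - mean) / std, (X_test - mean) / std
    _, s, Vt = np.linalg.svd(Z_train, full_matrices=False)
    eta = np.cumsum(s ** 2) / np.sum(s ** 2)            # cumulative contribution rate
    k = min(int(np.searchsorted(eta, var_threshold)) + 1, len(s))
    F_train, F_test = Z_train @ Vt[:k].T, Z_test @ Vt[:k].T
    n = len(y_train)

    preds, bics = [], []
    # Step 2: each subset of the k components defines one candidate regression model
    for r in range(k + 1):
        for cols in itertools.combinations(range(k), r):
            A = np.column_stack([np.ones(n), F_train[:, list(cols)]])
            B = np.column_stack([np.ones(len(X_test)), F_test[:, list(cols)]])
            beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
            rss = max(float(np.sum((y_train - A @ beta) ** 2)), 1e-12)  # guard against log(0)
            log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
            # Step 3: BIC penalizes each model by its number of coefficients
            bics.append(A.shape[1] * np.log(n) - 2 * log_lik)
            preds.append(B @ beta)

    # Step 4: approximate posterior weights and average the candidate predictions
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()
    return w @ np.array(preds), k
```

Fitting PCA on the training rows only and reusing the same mean, scale and loadings for the test rows keeps test information out of the projection.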

Data source and processing
The data come from http://lib.stat.cmu.edu/datasets/tecator and were recorded on a Tecator Infratec Food and Feed Analyzer. The dataset contains 215 meat samples; each sample consists of the absorbance at 100 different wavelengths together with its fat, water and protein content. The basic experimental procedure is shown in Figure 3.

Comparison experiment
Because the original data contain three response variables, namely water, protein and fat, three datasets were constructed, one per response. First, the dimension of each dataset was reduced by PCA, keeping the projected data with a cumulative contribution rate of 99.99%. After PCA, the predictor matrix of each dataset was reduced from 215×100 to 215×6; that is, six principal components were extracted from the 100 spectral absorbance variables and used as explanatory variables in the subsequent regression analysis. Second, the data were split into a training set and a test set in a 7:3 ratio, used for model fitting and testing respectively. Bayesian model averaging, Lasso regression and ridge regression were then applied to the three datasets in Python, and the mean absolute error (MAE) and mean squared error (MSE) between the predicted and actual values were computed. To check the stability and correctness of the predictions, Monte Carlo simulation was used to generate 100 simulated datasets for each of the three datasets, and the regression experiment was repeated 100 times. The prediction errors over these repetitions reflect both the accuracy and the generalization ability of each model. Across the repeated runs, Bayesian model averaging had the smallest average prediction error and Lasso regression the largest. The prediction errors of the 100 simulated experiments for the three tasks are shown as box plots in Figure 4. For all three datasets, the MAE and MSE of the model averaging method were below 10 and their variances lay in the interval (0, 2), indicating high accuracy and low risk in prediction; by comparison, this algorithm is more stable than the other two. Tables 1-3 give the prediction errors for the water, fat and protein content, respectively. The experimental results show that the prediction error of principal component regression based on Bayesian model averaging is markedly smaller than that of Lasso regression and ridge regression, giving the highest prediction accuracy and the strongest generalization ability of the three methods, and good predictions were obtained on all three meat tasks. On these data, therefore, the proposed algorithm outperforms Lasso regression and ridge regression for predicting high-dimensional time series data.
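The repeated 7:3 evaluation can be sketched as follows. The text does not detail how the 100 Monte Carlo datasets were generated; this sketch assumes each repetition is an independent random 7:3 train/test split of the same data, which is our interpretation. `fit_predict` is any function with the hypothetical signature `(X_train, y_train, X_test) -> predictions`; a closed-form ridge baseline is included as an example, and a Lasso baseline could be wrapped the same way (for instance around scikit-learn's `Lasso` estimator).

```python
import numpy as np

def repeated_split_errors(X, y, fit_predict, n_repeats=100, train_frac=0.7, seed=0):
    """MAE and MSE over repeated random train/test splits.

    Assumption: each Monte Carlo repetition is a fresh random 7:3 split.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    maes, mses = [], []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        train, test = idx[:n_train], idx[n_train:]
        err = y[test] - fit_predict(X[train], y[train], X[test])
        maes.append(np.mean(np.abs(err)))
        mses.append(np.mean(err ** 2))
    return np.array(maes), np.array(mses)

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression; the intercept is handled by centering, not penalized."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_train.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ coef
```

The two returned arrays hold one MAE and one MSE per repetition, which is the form of data summarized by box plots such as those in Figure 4.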

Conclusion
The experiments show that, for high-dimensional data with few samples, principal component analysis can map the data into a low-dimensional space with minimal loss of information. In a principal component regression forecasting model, however, the number of principal components is usually chosen from the cumulative contribution rate, so the leading components can easily overshadow the information carried by later ones. In addition, such a model does not account for the uncertain relationship between the principal components and the response variable, so predictions from a principal component regression model carry risk. Using Bayesian model averaging with the BIC, we have shown that the proposed algorithm can predict without relying on a single selected model and accounts for the uncertain relationship between the principal components and the response. Only BIC-based averaging was examined here; an information criterion that penalizes model complexity more strictly might give more accurate time series predictions. The algorithm can be applied to high-dimensional time series data in other settings to adaptively train a model with high prediction accuracy.

Figure 2: Principal component regression based on Bayesian model averaging

Figure 3: Basic steps for meat content prediction