Sunspot activity prediction based on adaptive hybrid algorithms

Abstract: In this paper, we combine the ARIMA model with the BP neural network model to establish an adaptive hybrid ARIMA-BP neural network model, which provides more accurate results for sunspot prediction. For solar activity prediction, we build a hybrid model based on multivariate nonlinear regression and the BP neural network, solve it with the differential evolutionary algorithm, and obtain satisfactory results. These results provide new perspectives and methods for solar activity prediction, and offer useful references for research and practice in related fields.


Introduction
Solar activity, especially the appearance of sunspots on the surface of the Sun, is a fascinating phenomenon of great significance for space weather forecasting and for conditions in the Earth's atmosphere. Sunspots are temporary dark spots on the solar surface generated by concentrated magnetic flux, which suppresses convection and lowers the local temperature. Sunspots occur within active regions, often appear in pairs with opposite magnetic polarity, and exhibit periodic patterns consistent with a solar cycle of about 11 years.
Several scholars have carried out research in related fields. Yuan et al [1] proposed a PV power prediction method that combines two techniques. The method uses a fast correlation-based filtering algorithm to extract meteorological features that are strongly correlated with PV power generation. Complete ensemble empirical mode decomposition with adaptive noise is then used to decompose the data into high-frequency and low-frequency components, which reduces the volatility of the data. A long short-term memory network and a deep belief network are combined into a new prediction model for each component. Finally, the combined method is analyzed on examples and compared with other prediction methods, and the results show that it has high prediction accuracy. Chen et al [2] proposed a hybrid ARIMA-LR algorithm based on a Bayesian combination model, which performed well in predicting air cargo volume. The algorithm adapts to movements of the series and reacts quickly to sudden changes. Moustafa et al [3] used single and hybrid models built from Long Short-Term Memory (LSTM), Autoregressive Integrated Moving Average (ARIMA), and Seasonal Autoregressive Integrated Moving Average (SARIMA) to forecast the maximum sunspot numbers of cycles 25 and 26. The hyperparameters of the single models were optimized using a Bayesian optimization approach. The LSTM-ARIMA hybrid model gave the best performance, which shows the potential of the hybrid approach for improving overall performance. In addition, the LSTM model outperformed the ARIMA model, which demonstrates the ability of the LSTM network to learn from time-series data. Dang et al [4] compared three important non-deep learning models, four popular deep learning models, and their five ensemble models for predicting sunspot numbers. In particular, they proposed an ensemble model called XGBoost-DL, which uses XGBoost as a two-level nonlinear ensemble method to combine the deep learning models. Tabassum et al [5] estimated sunspot number (SN) predictions over the recent solar cycle 24. To find the best model, moving average (MA), exponential smoothing (ES), and autoregression (AR) were used. In two further experiments, seasonal components were extracted using MA and ES, and trend components were calculated with simple regression analysis (RA). This exploration aimed solely at understanding the differences between these models and the impact of the seasonal and trend components on sunspot prediction with MA and ES. The forecast results reveal this difference and impact, and provide lessons for other time series analysis (TSA) models used to predict sunspot numbers.
In this paper, sunspot numbers and periods are predicted by constructing adaptive ARIMA-BP neural networks and adaptive multiple nonlinear regression-BP neural network models.

ARIMA modeling
The essence of the ARIMA model is the combination of the difference operation with the ARMA model, denoted as ARIMA(p,d,q). The ARIMA model can be formulated as:

$$\varphi(B)\,(1-B)^d\, y_t = \theta(B)\,\varepsilon_t$$

where $y_t$ is the time series of historical observations, $d$ is the order of the difference, $p$ and $q$ are the orders of the autoregressive and moving-average parts, and $\varepsilon_t$ is a sequence of independent and identically distributed white noise with zero mean and constant variance. $B$ is the lag operator, which satisfies the following expressions:

$$B^k y_t = y_{t-k}, \qquad \varphi(B) = 1 - \sum_{i=1}^{p} \varphi_i B^i, \qquad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^j$$

The focus of building an ARIMA(p,d,q) model is the selection of the three parameters (p,d,q). $d$ is the order of the difference, and the purpose of differencing is to turn the original series of observations into a stationary time series. In this paper, the Bayesian Information Criterion (BIC) is used to select $p$ and $q$. The BIC gives a simple approximation of the log model evidence:

$$\ln(\text{evidence}) \approx \ln L(\hat{\theta}) - \frac{p}{2}\,\ln N$$

where $L(\hat{\theta})$ is the maximized likelihood, $p$ here is the number of parameters (not the autoregressive order), and $N$ is the number of data points.
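The differencing step and the BIC score described above can be illustrated with a minimal Python sketch. The function names `difference`, `gaussian_log_likelihood`, and `bic_log_evidence` are illustrative and do not come from the paper, and the Gaussian likelihood is an assumption about the noise model rather than the authors' stated choice.

```python
import math


def difference(series, d=1):
    """Apply the difference operator (1 - B) to the series d times."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood of zero-mean residuals (assumed noise model)."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # ML estimate of the noise variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def bic_log_evidence(log_likelihood, n_params, n_points):
    """BIC approximation of the log model evidence; a larger value is better."""
    return log_likelihood - 0.5 * n_params * math.log(n_points)
```

Under these assumptions, candidate (p, q) pairs would each be fitted to the differenced series and the pair with the largest `bic_log_evidence` kept; the fitting routine itself is outside this sketch.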

BP neural network modeling
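The BP neural network is a multilayer feed-forward network that consists of input, hidden, and output layers. Working signals propagate forward and error signals propagate backward between the layers. Figure 1 shows the neural network structure. The j-th output of the network is:

$$\hat{y}_j = f\Big(\sum_h w_{hj}\, b_h - \theta_j\Big)$$

where $w_{hj}$ is the connection weight from the h-th neuron to the j-th output, $\theta_j$ is the threshold of the j-th output neuron, and $f$ is the activation function. Denote the error of the network on the sample $(x_k, y_k)$ as:

$$E_k = \frac{1}{2} \sum_j \big(\hat{y}_j^k - y_j^k\big)^2$$

When the neural network completes the forward computation, the error value is obtained by subtracting the predicted value from the actual value, followed by backpropagation to adjust the weights and thresholds of the neural network. The iterative update formulas for $w$ and $\theta$ are given by:

$$\Delta w_{hj} = -\eta\, \frac{\partial E_k}{\partial w_{hj}} = \eta\, g_j\, b_h, \qquad \Delta \theta_j = -\eta\, \frac{\partial E_k}{\partial \theta_j} = -\eta\, g_j$$

where $\eta$ is the learning rate, $g_j = (y_j^k - \hat{y}_j^k)\, f'(\cdot)$ is the gradient term of the j-th output neuron, and $b_h$ is the input data of this neuron. Based on this, the neural network repeatedly adjusts its weights and thresholds during training, so that the prediction error approaches 0.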
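To make the forward pass and the weight and threshold updates concrete, here is a minimal pure-Python sketch of a one-hidden-layer network with sigmoid activations and a single output. The function name `train_bp`, the layer sizes, the learning rate `eta`, and the toy data in the usage note are all assumptions chosen for illustration, not settings taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bp(samples, n_hidden=3, eta=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer sigmoid network; returns (predict, error history)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    v = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]  # input->hidden weights
    gamma = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden thresholds
    w = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden->output weights
    theta = rng.uniform(-1, 1)  # output threshold

    def forward(x):
        b = [sigmoid(sum(x[i] * v[i][h] for i in range(n_in)) - gamma[h])
             for h in range(n_hidden)]
        y_hat = sigmoid(sum(w[h] * b[h] for h in range(n_hidden)) - theta)
        return b, y_hat

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            b, y_hat = forward(x)
            total += 0.5 * (y_hat - y) ** 2
            g = (y - y_hat) * y_hat * (1 - y_hat)  # output gradient term
            # Hidden gradient terms, computed before w is updated
            e = [b[h] * (1 - b[h]) * w[h] * g for h in range(n_hidden)]
            for h in range(n_hidden):
                w[h] += eta * g * b[h]  # delta w = eta * g * b_h
                gamma[h] -= eta * e[h]
                for i in range(n_in):
                    v[i][h] += eta * e[h] * x[i]
            theta -= eta * g  # delta theta = -eta * g
        history.append(total)
    return (lambda x: forward(x)[1]), history
```

For example, calling `train_bp` on the four input pairs of the logical OR function returns a predictor together with the per-epoch summed error, which can be inspected to confirm that training reduces the error.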

Prediction model construction based on the GABP neural network
In this paper, the genetic algorithm is used to optimize the BP neural network, and the forward propagation process of the BP neural network is used to calculate the fitness of each individual during the iterations, which improves the optimization efficiency of the algorithm. The design framework is given in Algorithm IGABP. In the coding part of the genetic algorithm, IGABP represents the weights and thresholds of the neural network as a single vector that forms the gene expression of an individual. Since the structure of the network is fixed while the algorithm runs, the number of weights and thresholds to be determined is also fixed, so the length of the chromosome remains constant during the iterations.
In the part of the genetic algorithm that calculates the fitness of an individual, GABP uses the decoded individual to initialize the neural network and then calculates the fitness from the output of training. IGABP instead starts from the principle of forward propagation and calculates the fitness of an individual directly, which removes the computation required for training and improves the optimization efficiency of the algorithm.
Encoding and Decoding. Assume that the neural network used in IGABP is the one shown in Figure 2. In the chromosome designed in this paper, the gene loci are arranged in the following order: the weights between the input layer and the hidden layer, the thresholds of the hidden layer, the weights between the hidden layer and the output layer, and the thresholds of the output layer. From this, the complete structure of a neural network can be determined.
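The chromosome layout and the training-free fitness evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`chromosome_length`, `encode`, `decode`, `forward`, `fitness`) are hypothetical, the hidden layer is assumed to use a sigmoid with a linear output layer, and the fitness is taken to be the negative mean squared error, none of which is specified in the paper.

```python
import math


def chromosome_length(n_in, n_hidden, n_out):
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def encode(w1, theta1, w2, theta2):
    """Flatten weights and thresholds in the order: W1, theta1, W2, theta2."""
    flat = [w for row in w1 for w in row]
    flat += list(theta1)
    flat += [w for row in w2 for w in row]
    flat += list(theta2)
    return flat


def decode(chrom, n_in, n_hidden, n_out):
    """Split a flat chromosome back into (W1, theta1, W2, theta2)."""
    assert len(chrom) == chromosome_length(n_in, n_hidden, n_out)
    pos = 0
    w1 = [chrom[pos + i * n_hidden: pos + (i + 1) * n_hidden] for i in range(n_in)]
    pos += n_in * n_hidden
    theta1 = chrom[pos: pos + n_hidden]
    pos += n_hidden
    w2 = [chrom[pos + h * n_out: pos + (h + 1) * n_out] for h in range(n_hidden)]
    pos += n_hidden * n_out
    theta2 = chrom[pos: pos + n_out]
    return w1, theta1, w2, theta2


def forward(chrom, x, n_in, n_hidden, n_out):
    """One forward pass: sigmoid hidden layer, linear output layer (assumed)."""
    w1, theta1, w2, theta2 = decode(chrom, n_in, n_hidden, n_out)
    hidden = [1.0 / (1.0 + math.exp(-(sum(x[i] * w1[i][h] for i in range(n_in)) - theta1[h])))
              for h in range(n_hidden)]
    return [sum(hidden[h] * w2[h][o] for h in range(n_hidden)) - theta2[o]
            for o in range(n_out)]


def fitness(chrom, samples, n_in, n_hidden, n_out):
    """Fitness from forward passes only: negative mean squared error (assumed)."""
    sq_err = 0.0
    count = 0
    for x, y in samples:
        y_hat = forward(chrom, x, n_in, n_hidden, n_out)
        sq_err += sum((a - b) ** 2 for a, b in zip(y_hat, y))
        count += len(y)
    return -sq_err / count
```

Because the network structure is fixed, `chromosome_length` is constant across generations, which is what keeps crossover and mutation well defined on the flat vector.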

Modeling of Adaptive Hybrid ARIMA-BP Neural Networks
Each prediction algorithm is evaluated on a portion of the series that is held out as a test set. The main idea of the hybrid algorithm designed in this paper is that the better an algorithm performed over the recent past, the higher its weight in the next prediction and the greater its contribution to the predicted value.
The actual observations are written as $Y_t = \{y_1, y_2, y_3, \ldots, y_t\}$. The predicted values of algorithm $k$ are written as $\hat{Y}_{k,t} = \{\hat{y}_{k,1}, \hat{y}_{k,2}, \hat{y}_{k,3}, \ldots, \hat{y}_{k,t}\}$. The prediction error can be denoted as:

$$e_{k,i} = y_i - \hat{y}_{k,i}, \qquad i = 1, \ldots, t$$

The total error of algorithm $k$, denoted $E_k$, aggregates these errors over the test set. In the hybrid algorithm, an algorithm that performs better on the test set receives a higher weight, so the weight $w_k$ of algorithm $k$ decreases as $E_k$ grows. Each hybrid prediction combines the predicted values of the ARIMA algorithm and the BP neural network algorithm according to these weights:

$$\hat{y}_{t+1} = w_{\mathrm{ARIMA}}\, \hat{y}_{\mathrm{ARIMA},t+1} + w_{\mathrm{BP}}\, \hat{y}_{\mathrm{BP},t+1}$$
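One simple way to realize "better past performance, higher weight" is inverse-error weighting, sketched below. The exact error aggregation and weight formula are not recoverable from the text, so the total absolute error and the normalized inverse-error weights used here are assumptions, and the function names are illustrative.

```python
def total_abs_error(actual, predicted):
    """Total absolute prediction error over the test window (assumed aggregation)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def adaptive_weights(actual, predictions, eps=1e-12):
    """Weights proportional to the inverse total error of each algorithm.

    predictions maps an algorithm name to its predictions on the test window.
    eps guards against division by zero for a perfect predictor.
    """
    inverse = {name: 1.0 / (total_abs_error(actual, pred) + eps)
               for name, pred in predictions.items()}
    norm = sum(inverse.values())
    return {name: value / norm for name, value in inverse.items()}


def hybrid_forecast(weights, next_predictions):
    """Weighted combination of the individual next-step predictions."""
    return sum(weights[name] * next_predictions[name] for name in weights)
```

As a usage example, with observations `[10, 12, 14]`, ARIMA predictions `[11, 13, 15]` (total error 3) and BP predictions `[10, 12, 15]` (total error 1), the weights come out at roughly 0.25 and 0.75, so the BP prediction dominates the next hybrid value. Recomputing the weights on a sliding window is what would make the scheme adaptive.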

Sunspot prediction results
The results of the model solution are shown in Figure 3. Analyzing them together with the sunspot numbers shown in Figure 4, we conclude that the next solar cycle will begin in about 2031 and end in about 2042.

Modeling the relationship between time and sunspot number
In this paper, we build a regression model of the relationship between time and sunspot number and quantify that relationship through regression analysis. Once the time is fixed, the sunspot number can be obtained and the solar activity can be further inferred. We express the relationship as:

$$y_i = f(x_1, x_2, \cdots, x_j, \theta_1, \theta_2, \cdots, \theta_p) + \sigma_i \varepsilon$$

where $y_i$ is the true value; $i$ denotes the i-th group of data; $f(x_1, x_2, \cdots, x_j, \theta_1, \theta_2, \cdots, \theta_p)$ is the multivariate nonlinear function, which is the deterministic part; $x_1, x_2, \cdots, x_j$ are the independent variables; $\theta_1, \theta_2, \cdots, \theta_p$ are the unknown model parameters of the multivariate nonlinear function; $\sigma_i \varepsilon$ is the stochastic part, where $\varepsilon$ is a random variable following the $N(0, 1)$ distribution and $\sigma_i$ is the standard deviation of the random distribution of the i-th group of data.
To observe the data distribution, a line graph is plotted, as shown in Figure 5. At present, most nonlinear regression studies select the regression model by either empirical or experimental methods. The empirical method can introduce large errors, while the experimental method is time-consuming but allows the correct model to be selected on the basis of experimental results. Therefore, this paper determines a suitable regression model for the relationship through the experimental method.

Model solving based on differential evolutionary algorithm
The regression model above has four parameters to be determined. Multi-parameter optimization problems of this kind are generally solved with the gradient descent method or a genetic algorithm. However, gradient descent easily falls into a local optimum, which leads to a large deviation in the final result. The mutation operation of the genetic algorithm searches for better candidates by generating new solutions, but in the later stages of optimization the entire population may still fall into a local optimum. At that point the search needs to escape the local optimum, and "ineffective" mutation cannot achieve this. Therefore, in this paper we choose the differential evolutionary algorithm, which is better suited to global optimization, to solve for $\theta_1, \theta_2, \cdots, \theta_4$. The solution steps are as follows.
Population initialization: The population size M is chosen as 100, and M individuals are generated uniformly at random in the solution space.
Mutation: In the g-th iteration, three individuals $X_{p_1}(g), X_{p_2}(g), X_{p_3}(g)$ are randomly selected from the population with $p_1 \neq p_2 \neq p_3 \neq i$, generating the mutation vector:

$$H_i(g) = X_{p_1}(g) + F \cdot \Delta_{p_2,p_3}(g)$$

where $\Delta_{p_2,p_3}(g) = X_{p_2}(g) - X_{p_3}(g)$ is the difference vector and $F$ is the scaling factor. The three randomly selected individuals in the mutation operator are ranked from best to worst to obtain $X_b, X_m, X_w$, with corresponding fitness values $f_b, f_m, f_w$, and the mutation operator reads:

$$V_i = X_b + F_i\, (X_m - X_w)$$

The value of $F_i$ also varies adaptively according to the two individuals that generate the difference vector.
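The initialization and ranked mutation steps can be sketched in a few lines of Python. The adaptive rule for the scaling factor is not recoverable from the text, so the linear interpolation between `f_low` and `f_up` used below is an assumption borrowed from common adaptive differential evolution variants, and the function names and bounds are illustrative. Fitness is treated as a quantity to minimize.

```python
import random


def init_population(bounds, size=100, rng=random):
    """Generate `size` individuals uniformly at random inside the given bounds."""
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(size)]


def adaptive_mutation(population, fitness, i, rng=random, f_low=0.1, f_up=0.9):
    """Build the mutant vector for individual i (lower fitness is better)."""
    candidates = [k for k in range(len(population)) if k != i]
    picks = rng.sample(candidates, 3)  # three distinct indices, all different from i
    best, mid, worst = sorted(picks, key=lambda k: fitness[k])
    f_b, f_m, f_w = fitness[best], fitness[mid], fitness[worst]
    # Assumed adaptive rule: the scale grows with the fitness gap between the
    # two individuals that form the difference vector.
    if f_w == f_b:
        scale = f_low
    else:
        scale = f_low + (f_up - f_low) * (f_m - f_b) / (f_w - f_b)
    mutant = [xb + scale * (xm - xw)
              for xb, xm, xw in zip(population[best], population[mid], population[worst])]
    return mutant, scale
```

A full solver would follow each mutation with crossover and greedy selection against individual `i`; those steps are omitted here because the text does not describe them.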

Hybrid modeling results
The results of the solution are shown in Figure 7. From Figure 8, it can be seen that the maximum occurs in April 2034, which corresponds to a sunspot number of 100.5.

Conclusions
The aim of this study is to explore sunspot prediction based on the adaptive ARIMA-BP neural network model and solar activity prediction based on the adaptive multiple nonlinear regression-BP neural network model. By building the ARIMA and BP neural network models and constructing the prediction model based on the GABP neural network, we established the adaptive hybrid ARIMA-BP neural network model, which provides a new solution for sunspot prediction.
In the first part of the paper, we built the ARIMA model and the BP neural network model, which lay the foundation for the subsequent prediction models. We then constructed the prediction model based on the GABP neural network and achieved satisfactory results. Finally, we built the adaptive hybrid ARIMA-BP neural network model, which provides more accurate results for sunspot prediction.
The second part focuses on solar activity prediction based on an adaptive multiple nonlinear regression-BP neural network model. We first modeled the relationship between time and the number of sunspots and applied a differential evolutionary algorithm to solve the model. Subsequently, we constructed a BP neural network prediction model and solved the hybrid model. These steps provide a new perspective and method for solar activity prediction.
Through this study, we explored prediction models of solar activity and combined the traditional ARIMA model with the BP neural network model, which brings new ideas and methods to sunspot prediction and solar activity prediction. Our results provide useful references for research and practice in related fields and point out directions for future research.
In summary, this study offers new ideas and methods for improving and optimizing prediction models of solar activity, based on the adaptive ARIMA-BP neural network model and the adaptive multiple nonlinear regression-BP neural network model. We look forward to further in-depth exploration in this field.

Figure 3: Sunspot prediction results. From Figure 3, it can be seen that the algorithm designed in this paper captures the periodic fluctuations well.

Figure 4: Total number of sunspots per year

Figure 5: Total Number of sunspots per month

Figure 6: Fit of the regression model solved by the differential evolutionary algorithm. In Figure 6, the blue line shows the predicted values and the red line shows the fitted values. From the figure, we can analyze the model results obtained by this algorithm.

Figure 7: Monthly sunspot totals. A localized plot is drawn for analysis.
The XGBoost-DL model [4] obtains the best predictive performance in the comparison (RMSE and MAE), outperforming the best non-deep learning model SARIMA, the best deep learning model Informer, and NASA's predictions. XGBoost-DL predicts a peak sunspot number of 133.47 in May 2025 for solar cycle 25 and 164.62 in November 2035 for solar cycle 26, which is similar to NASA's predictions of 137.7 in October 2024 and 161.2 in December 2034.
Algorithm: IGABP
Input: training set independent variable x_train, training set dependent variable y_train, test set independent variable x_test
Output: test set dependent variable ŷ_test
// Data normalization
// x'_train is the normalized x_train, y'_train is the normalized y_train
// x_maxmin and y_maxmin are normalization information, used for back-normalization
[x'_train, x_maxmin] = mapminmax(x_train); // mapminmax is the min-max normalization function
[y'_train, y_maxmin] = mapminmax(y_train);
// Parameter definition
Set BP neural network parameters: number of neurons in the hidden layer n_h
Set the parameters of the genetic algorithm: number of iterations G, crossover rate p_c, mutation rate p_m
Randomly initialize population P_0
// Genetic algorithm part
popu = P_0
for g = 1 → G do // iterate over the generations
    crossover
    mutation
    for p = 1 → size(popu, 1) do // traverse each individual in the current generation
        Directly compute the fitness by forward propagation using activation functions
    end
    Selection to obtain the new population popu
end
// BP neural network
ŷ_test = [ ]
for p = 1 → size(popu, 1) do // traverse each individual
    Decode popu(p,:)
    Initialize the weights and thresholds of the BP neural network using the values of popu(p,:)
    Run the BP neural network and calculate the predicted value y'
    ŷ_test = [ŷ_test; y']
end

Then, in determining the weights and thresholds of the network, the coding structure can be designed as: w_11, w_12, w_21, w_22, w_31, w_32