Forecasting The Chinese Stock Market Volatility with ETF Volatility Index

Abstract: In this paper, we forecast the realized volatility of the Shanghai Composite Index using the heterogeneous autoregressive model for realized volatility (HAR-RV) and its various extensions. We then add a new variable, the Chinese ETF volatility index, in order to compare the predictive ability of the conventional models with that of the corresponding extended models. Our empirical results suggest that the new variable has a significantly positive impact on the future volatility of the Chinese stock market, and that the extended models generate better out-of-sample forecasting performance than the original models according to the model confidence set (MCS) test. Additionally, various sample periods, alternative volatility estimators, and alternative evaluation methods confirm the robustness of our results.


Introduction
Given that stock market volatility is central to asset pricing, asset allocation, and risk management, it is crucial to forecast volatility accurately. Numerous studies have documented that (intra-daily) high-frequency data are useful for forecasting volatility. High-frequency volatility models measure the so-called realized volatility (RV), a concept that was pioneered by Andersen and Bollerslev (1998) [1]. Among the subsequent realized volatility models, the heterogeneous autoregressive RV (HAR-RV) model proposed by Corsi (2009) is one of the most popular [2]. Although the specification of HAR-RV is simple, it can capture "stylized facts" in financial market volatility such as long memory and multi-scaling behavior. HAR-RV has become the standard benchmark for analyzing and forecasting financial volatility dynamics (see, e.g., Yaojie Zhang et al., 2019; Yaojie Zhang et al., 2020) [3][4].
Our motivation is straightforward. ETF volatility is documented to have strong links with stock market volatility and returns, and ETFs have predictive power (see, e.g., Jose [5][6][7][8]). Mike Buckle et al. (2018) propose that ETFs can lead price moves and adjust prices actively to pre-market information and activities [9]. Sanjeev Bhojraj et al. (2020) verify that exchange-traded funds (ETFs) play a role in the transfer of information across firms around earnings announcements [10]. Marie-Eve Lachance (2021) shows that ETFs' unusually high overnight returns are distorted by market microstructure effects [11]. Taken together, these findings suggest that the impact of ETF volatility is transmitted to stock market volatility. Therefore, an increase in the ETF volatility index may lead to a corresponding increase in stock market volatility. Inspired by this, this paper studies the effect of the ETF volatility index on the realized volatility of the stock market.
There are, however, few studies that forecast Chinese stock market volatility using an ETF volatility index. To fill this gap, we forecast the realized volatility of the Shanghai Composite Index using the HAR-RV model and its extensions. We follow a related study by Wang et al. (2016) and use four popular HAR-RV-type models summarized in that article [12]. Additionally, we add the ETF volatility index as a predictor in order to compare the predictive ability of the conventional models with that of the corresponding extended models.
Our empirical results provide several notable findings. First, the in-sample estimation results show that our new predictor, the ETF volatility index, has a significantly positive impact on the realized volatility of the Chinese stock market. Second, the model confidence set (MCS) test proposed by Hansen et al. (2011) shows that the extended models generate better out-of-sample forecasting performance than the original models [13]. Third, we provide several robustness tests. In particular, we perform the Direction-of-Change (DoC) test suggested by Degiannakis and Filis (2017) to explore the directional accuracy of volatility forecasts [14]. The Pesaran and Timmermann (1992) statistics suggest that the null hypothesis of no directional accuracy is rejected at the 1% significance level for all the forecasting models [15]. More importantly, our HAR-RV-RS-I-VI and HAR-RV-SJ-I-VI models yield higher DoC rates than the original models, supporting the superiority of the two proposed models in terms of directional accuracy. In addition, we provide evidence that our results are robust to various sample periods, alternative volatility estimators, and alternative evaluation methods.
The remainder of the paper is organized as follows. Section 2 provides the econometric specifications. Section 3 describes our data. Section 4 presents the empirical results. Section 5 details a series of robustness checks. Finally, Section 6 concludes.

Methodology
In this section, we briefly describe several popular realized volatility models.

Realized volatility measure
Andersen and Bollerslev (1998) pioneered the use of realized volatility (RV) as a proxy for integrated variance [1]. For a specific business day t, the realized volatility can be calculated as the sum of the squared intraday returns:

$$RV_t = \sum_{j=1}^{M} r_{t,j}^2,$$

where $r_{t,j}$ represents the j-th intraday return on day t and 1/M is the given sampling frequency.
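The sum-of-squared-returns measure described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to build the data in this paper: the helper names `log_returns` and `realized_variance` and the sample prices are hypothetical, and the use of log returns is an assumption.

```python
import math

def log_returns(prices):
    """Log returns between consecutive intraday prices (assumed return definition)."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def realized_variance(intraday_returns):
    """Realized variance for one day: the sum of squared intraday returns."""
    return sum(r * r for r in intraday_returns)

# Hypothetical 5-minute prices for a single trading day (illustrative values only).
prices_day_t = [100.0, 100.2, 99.9, 100.1, 100.4]
rv_day_t = realized_variance(log_returns(prices_day_t))
```

With M intraday returns per day, calling `realized_variance` once per day yields the daily RV series that the models below take as input.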

Modeling realized volatility
In recent years, the heterogeneous autoregressive realized volatility (HAR-RV) model of Corsi (2009) has been the most popular RV model [2]. This model accommodates some of the stylized facts found in financial asset return volatility, such as long memory and multi-scaling behavior. The HAR-RV is simple to implement, as it contains only three predictors: lagged daily realized volatility ($RV_{d,t}$), lagged weekly realized volatility ($RV_{w,t}$), and lagged monthly realized volatility ($RV_{m,t}$). The HAR-RV model specification can be expressed as

$$RV_{t+1} = \beta_0 + \beta_d RV_{d,t} + \beta_w RV_{w,t} + \beta_m RV_{m,t} + \varepsilon_{t+1},$$

where $RV_{d,t} = RV_t$ and $RV_{t-h:t} = \frac{1}{h+1}\sum_{i=0}^{h} RV_{t-i}$. Thus $RV_{w,t} = RV_{t-4:t}$ is the average RV from day t-4 to day t, and $RV_{m,t} = RV_{t-21:t}$ is the average RV from day t-21 to day t.
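As an illustration of how the three HAR-RV predictors are built and the model is estimated by OLS, consider the following sketch. The function names (`har_design`, `fit_ols`), the synthetic RV series, and the way an optional volatility index enters (its day-t value as one extra regressor) are assumptions made for this example, not the exact specification estimated in this paper.

```python
import numpy as np

def har_design(rv, vi=None, weekly=5, monthly=22):
    """Build HAR-RV regressors and next-day targets from a daily RV series.

    If `vi` is given (assumed aligned day by day with `rv`), its day-t value
    is appended as an extra regressor, mimicking an extended model.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    # Start once a full monthly window exists; stop one day before the end
    # so that the target RV on day t+1 is available.
    for t in range(monthly - 1, len(rv) - 1):
        rv_d = rv[t]                                  # daily RV on day t
        rv_w = rv[t - weekly + 1 : t + 1].mean()      # average over days t-4 .. t
        rv_m = rv[t - monthly + 1 : t + 1].mean()     # average over days t-21 .. t
        row = [1.0, rv_d, rv_w, rv_m]
        if vi is not None:
            row.append(float(vi[t]))
        rows.append(row)
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic, strictly positive RV series used only to exercise the code.
rng = np.random.default_rng(0)
rv = np.abs(rng.normal(1.0, 0.3, size=300))
X, y = har_design(rv)
beta = fit_ols(X, y)  # [intercept, daily, weekly, monthly]
```

The first usable row corresponds to t = 21 (zero-based), because the monthly average needs 22 daily observations.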
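To capture the role of the "leverage effect" in volatility dynamics, Patton and Sheppard (2015) develop a series of models using signed realized measures [16]. The first model extends the standard HAR-RV by decomposing daily RV into two semi-variances (HAR-RV-RS-I):

$$RV_{t+1} = \beta_0 + \beta_d^{+} RS_t^{+} + \beta_d^{-} RS_t^{-} + \beta_w RV_{w,t} + \beta_m RV_{m,t} + \varepsilon_{t+1},$$

where $RS_t^{-} = \sum_{j=1}^{M} r_{t,j}^2 \, I(r_{t,j} < 0)$ and $RS_t^{+} = \sum_{j=1}^{M} r_{t,j}^2 \, I(r_{t,j} > 0)$.

The second model for capturing the "leverage effect" contains a signed jump variation, $SJ_t = RS_t^{+} - RS_t^{-}$, and an estimator of the variation caused by the continuous part (bi-power variation, $BPV_t$) (HAR-RV-SJ-I):

$$RV_{t+1} = \beta_0 + \beta_j SJ_t + \beta_{bpv} BPV_t + \beta_w RV_{w,t} + \beta_m RV_{m,t} + \varepsilon_{t+1}.$$

The last model for the "leverage effect" separates the roles of the positive and negative jumps and is termed HAR-RV-SJ-II:

$$RV_{t+1} = \beta_0 + \beta_j^{+} SJ_t^{+} + \beta_j^{-} SJ_t^{-} + \beta_{bpv} BPV_t + \beta_w RV_{w,t} + \beta_m RV_{m,t} + \varepsilon_{t+1},$$

where $SJ_t^{+} = SJ_t \, I(SJ_t > 0)$ and $SJ_t^{-} = SJ_t \, I(SJ_t < 0)$.

Thus, we have four HAR-type volatility equations to model and forecast realized volatility. To compare the forecasting results with and without the ETF volatility index, we add this variable to each of the four classic models; the extended models are denoted by the suffix -VI (e.g., HAR-RV-RS-I-VI).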
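The signed measures used by these leverage-effect models can be computed directly from intraday returns. The sketch below assumes the Patton and Sheppard (2015) definition of the signed jump variation as the difference between the positive and negative semi-variances; the function names are hypothetical and the bi-power variation is not computed here.

```python
def realized_semivariances(intraday_returns):
    """Return (RS+, RS-): squared returns summed separately by sign."""
    rs_pos = sum(r * r for r in intraday_returns if r > 0)
    rs_neg = sum(r * r for r in intraday_returns if r < 0)
    return rs_pos, rs_neg

def signed_jump_components(rs_pos, rs_neg):
    """Return (SJ, SJ+, SJ-), with SJ = RS+ - RS- (assumed definition)."""
    sj = rs_pos - rs_neg
    sj_pos = sj if sj > 0 else 0.0   # SJ * I(SJ > 0)
    sj_neg = sj if sj < 0 else 0.0   # SJ * I(SJ < 0)
    return sj, sj_pos, sj_neg

# Illustrative returns for one day (not real data).
rs_pos, rs_neg = realized_semivariances([0.5, -1.0, 2.0])
sj, sj_pos, sj_neg = signed_jump_components(rs_pos, rs_neg)
```

Because zero returns contribute nothing to either semi-variance, RS+ and RS- always add up to the day's realized variance.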

Data
To forecast Chinese stock market volatility, we use the CBOE China ETF Volatility Index (VIX) from FRED Economic Research. The Shanghai Composite Index (SSEC) serves as a representative of the Chinese stock market, and its realized variance is obtained from the Oxford-Man Institute's Quantitative Finance Realized Library. Liu et al. (2015) find little evidence that the 5-min RV is outperformed by any other measure among 400 volatility estimators for 31 different financial assets spanning five asset classes [17]; therefore, we use the 5-min RV of the SSEC index.
Our sample period is from April 18, 2011 to June 22, 2021. After keeping only the Chinese stock market trading days that have a corresponding VIX observation, we obtain 2392 observations for the stock market. To generate out-of-sample forecasts, we divide the whole sample period into an in-sample estimation period consisting of the first 1000 observations and an out-of-sample evaluation period consisting of the remaining 1392 observations. Figure 1 depicts the evolution of the Chinese stock market's realized volatility and the China ETF Volatility Index. Their corresponding descriptive statistics are shown in Table 1. All of the series are right-skewed except SJ and SJ-, and they all display positive kurtosis, suggesting that they have non-normal distributions. The Jarque-Bera statistic rejects the null hypothesis of a normal distribution for these variables at the 1% significance level, further confirming the fat-tailed distributions. Finally, the null hypothesis of a unit root is rejected at the 1% significance level based on the Augmented Dickey-Fuller (ADF) statistics. This evidence suggests that all the time series of these realized variances are stationary and can therefore be employed directly without further transformations.
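For readers who want to reproduce the kind of descriptive statistics discussed above, the following sketch computes sample skewness, kurtosis, and the Jarque-Bera statistic using population moments. The function name and the sample values are hypothetical; the ADF test is not covered here.

```python
def jarque_bera(x):
    """Return (skewness, kurtosis, JB statistic) from population moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

# Illustrative right-skewed sample (not the paper's data).
skew, kurt, jb = jarque_bera([0.1, 0.2, 0.1, 0.3, 0.2, 2.5])
```

Under the null of normality the statistic is asymptotically chi-squared with two degrees of freedom, so values above roughly 9.21 reject normality at the 1% level.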

Empirical results
In this section, we first give the in-sample estimation results of the eight volatility models used in this study. Then, we evaluate the out-of-sample forecasting performance of the eight models with and without the ETF volatility index. Tables 2 and 3 show the OLS estimates of the eight volatility models over the whole sample period, along with t-statistics based on Newey-West standard errors. From the two tables, we can see that the parameter estimates of daily, weekly, and monthly RV in each model are all significant at the 1% level, suggesting strong persistence in the realized volatility dynamics. The leverage-effect coefficients in the HAR-RV-RS-I, HAR-RV-SJ-I, and HAR-RV-SJ-II models and in the corresponding extended models with the ETF volatility index are significant, indicating that the leverage effect plays an important role in the volatility process. Moreover, we examine the null hypothesis that positive and negative semi-variances have equal predictive power for realized volatility (i.e., that the coefficients on RS+ and RS- are equal) based on the benchmark model and the corresponding extended model. Among the signed jumps, the coefficient on SJ+ is significantly positive while the coefficient on SJ- is not significant, indicating that only positive ("good") jumps lead to higher future volatility.

In-sample estimation results
Additionally, the regression coefficients of the Chinese ETF volatility index are positive and significant at the 1% level in the four extended models, indicating that the index has a positive impact on the volatility of the Chinese stock market. The adjusted R² is above 68% and increases in the four extended models, indicating a better overall fit.

Notes to Table 2: This table provides the parameter estimation results for the four realized volatility models for the whole sample period from April 18, 2011 through June 22, 2021. The numbers in parentheses are the t-statistics. The asterisks *, ** and *** denote rejections of the null hypothesis at the 10%, 5% and 1% significance levels, respectively.

Table 3. Estimation results of extended realized volatility models for the whole sample
Notes: This table provides the parameter estimation results for the four extended realized volatility models for the whole sample period from April 18, 2011 through June 22, 2021. The numbers in parentheses are the t-statistics. The asterisks *, ** and *** denote rejections of the null hypothesis at the 10%, 5% and 1% significance levels, respectively.

Out-of-sample forecasting performance
The out-of-sample forecasting results are more important than the in-sample estimation results. Therefore, we generate out-of-sample forecasts of Chinese realized volatility based on a rolling estimation window. More specifically, our entire sample is divided into an in-sample portion composed of the first 1000 observations and an out-of-sample portion composed of the remaining 1392 observations. After each out-of-sample forecast is obtained, we roll the estimation sample forward by adding one new observation and dropping the first observation of the previous estimation window.
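The rolling-window scheme described above can be sketched as follows. The sketch assumes that row i of a design matrix `X` holds the predictors known on a given day and `y[i]` holds the next day's RV, so that every target inside the window is already observed when the next forecast is made; `rolling_forecasts` is a hypothetical helper name and the data are synthetic.

```python
import numpy as np

def rolling_forecasts(X, y, window):
    """One-step-ahead forecasts from OLS re-estimated on a rolling window."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    forecasts = []
    for end in range(window, len(y)):
        # Estimate on the most recent `window` rows only (oldest row dropped,
        # newest row added at each step).
        X_win, y_win = X[end - window : end], y[end - window : end]
        beta, *_ = np.linalg.lstsq(X_win, y_win, rcond=None)
        # Forecast the target of row `end` from its predictors.
        forecasts.append(X[end] @ beta)
    return np.array(forecasts)

# Synthetic example: an exactly linear relation is recovered without error.
x = np.arange(20.0)
X_demo = np.column_stack([np.ones(20), x])
y_demo = 2.0 + 3.0 * x
f_demo = rolling_forecasts(X_demo, y_demo, window=5)
```

With 2392 rows and a window of 1000, this loop produces 1392 forecasts, matching the split used in this paper.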
To quantitatively compare the out-of-sample performance of the forecasting models used in this paper, we apply two popular loss functions, QLIKE and MSE. In particular, Patton (2011) demonstrates that, in terms of the ranking of competing volatility forecasts, QLIKE and MSE are robust to the presence of noise in the volatility proxy [18]. MSE is the mean squared error of realized variance forecasts. Formally, QLIKE and MSE can be expressed as follows:

$$\mathrm{QLIKE} = \frac{1}{q}\sum_{t=m+1}^{m+q}\left(\ln \widehat{RV}_t + \frac{RV_t}{\widehat{RV}_t}\right),$$

$$\mathrm{MSE} = \frac{1}{q}\sum_{t=m+1}^{m+q}\left(RV_t - \widehat{RV}_t\right)^2,$$

where $RV_t$ is the actual RV on day t, $\widehat{RV}_t$ is the RV forecast based on one of the forecasting models, and m and q are the lengths of the in-sample estimation period and the out-of-sample evaluation period, respectively.
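A minimal sketch of the two loss functions is given below. It assumes one commonly used form of QLIKE, the average of ln(forecast) + actual/forecast over the evaluation period; other equivalent-ranking variants exist, and the function names are hypothetical.

```python
import math

def mse(actual, forecast):
    """Mean squared error over the out-of-sample period."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def qlike(actual, forecast):
    """QLIKE loss, assumed form: mean of ln(forecast) + actual / forecast.

    Forecasts must be strictly positive for the logarithm to be defined.
    """
    return sum(math.log(f) + a / f for a, f in zip(actual, forecast)) / len(actual)

# Illustrative values only (not results from this paper).
actual_rv = [1.0, 2.0]
forecast_rv = [1.0, 4.0]
loss_mse = mse(actual_rv, forecast_rv)
loss_qlike = qlike(actual_rv, forecast_rv)
```

For a given actual value, this QLIKE form is minimized when the forecast equals the actual value, and it penalizes under-prediction more heavily than over-prediction of the same size.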
We assess the statistical significance of differences in forecasting losses using the model confidence set (MCS) test. Following [12][13][19], among others, we use a confidence level of 90%. This allows us to exclude a model with a p-value smaller than 0.1 from the MCS. In other words, the forecasts of such a model are significantly less accurate than those of the models in the MCS. It is evident that a model with a larger MCS p-value shows stronger predictive ability. Table 4 reports the out-of-sample forecasting performance of all the models, including the mean value of the loss functions and the MCS p-values for the paired comparison of the HAR-RV type models and the extended models with the ETF volatility index. An impressive finding is that only two conventional models consistently appear in the MCS at the 90% confidence level. Furthermore, the extended models always deliver the largest MCS p-value of 1. This evidence suggests that adding the volatility index yields significantly better out-of-sample forecasting ability than the conventional forecasting models in the prediction of Chinese stock market volatility. This implies that the Chinese ETF volatility index is a rather efficient variable for improving the forecast accuracy of Chinese stock market volatility.

[22][23]. Therefore, the forecasting window size plays a crucial role in out-of-sample evaluation. Considering this, we additionally consider two other window sizes, where the initial in-sample estimation windows contain 800 and 1200 observations, so that the corresponding out-of-sample lengths are 1592 and 1192, respectively. It should be noted that both forecasting windows considered here offer a desirable trade-off between an initial in-sample estimation period that has enough observations to precisely estimate parameters and an out-of-sample period that is long enough for forecast evaluation.
Table 5 reports the MCS p-values for the alternative forecasting windows, in which the length of the in-sample period is 800 and 1200. Analogously, this evidence suggests that the extended HAR-RV models are more likely to be the best models for the Chinese stock market under alternative window sizes. The extended models always deliver the largest MCS p-value of 1. Although the classic HAR-RV type models perform well, adding the volatility index yields significantly better out-of-sample forecasting ability than the conventional forecasting models. In conclusion, the out-of-sample results are robust to various sample periods.

Table 5. Out-of-sample performance for different in-sample evaluation periods

Alternative volatility estimator
In view of the unobservable actual volatility, we further consider another prevailing volatility estimator, the realized kernel (RK) proposed by Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008), to re-examine the out-of-sample performance of the eight above-mentioned forecasting models [24]. An appealing property of the realized kernel is that this volatility estimator is robust to market microstructure noise (Barndorff-Nielsen et al., 2008) [24]. Mathematically, the realized kernel on trading day t can be defined as

$$RK_t = \sum_{h=-H}^{H} k\left(\frac{|h|}{H+1}\right)\gamma_h, \qquad \gamma_h = \sum_{j=|h|+1}^{M} r_{t,j}\, r_{t,j-|h|},$$

and $k(x)$ is the Parzen kernel function, which is given by

$$k(x) = \begin{cases} 1 - 6x^2 + 6x^3, & 0 \le x \le 1/2, \\ 2(1-x)^3, & 1/2 < x \le 1, \\ 0, & x > 1. \end{cases}$$

It is necessary for H to increase with the sample size in order to consistently estimate the increments of quadratic variation in the presence of noise. We follow precisely the bandwidth choice of H spelt out in Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009), to which we refer the reader for more details [25]. The realized kernel data are also available from the Oxford-Man Institute's Realized Library.
We replace the realized volatility with the realized kernel in all the models and then re-run the regressions to generate the realized kernel forecasts. Table 6 reports the MCS p-values when we use the realized kernel to estimate, forecast, and evaluate Chinese stock market volatility. We observe a robust result: the Chinese ETF volatility index continues to show very powerful predictive ability. With one exception, the extended models deliver the largest MCS p-value of 1 under both the QLIKE and MSE loss functions. In summary, the results of the MCS test are robust to alternative volatility estimators.
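To make the realized kernel concrete, here is a small sketch of a Parzen-weighted sum of return autocovariances. It is a simplified illustration under stated assumptions: the bandwidth `H` is passed in by hand rather than chosen by the data-driven rule of Barndorff-Nielsen et al. (2009), no end-point treatment is applied, and all function names are hypothetical.

```python
def parzen(x):
    """Parzen kernel weight, evaluated at |x|."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def autocovariance(returns, h):
    """Sum of products of intraday returns that are |h| intervals apart."""
    h = abs(h)
    return sum(returns[j] * returns[j - h] for j in range(h, len(returns)))

def realized_kernel(returns, H):
    """Parzen-weighted sum of autocovariances for lags -H .. H."""
    return sum(
        parzen(h / (H + 1)) * autocovariance(returns, h)
        for h in range(-H, H + 1)
    )

# Illustrative intraday returns (not real data); H = 2 is an arbitrary choice.
rk_day_t = realized_kernel([0.5, -1.0, 2.0, -0.5], H=2)
```

With `H = 0` only the lag-zero term survives, so the estimator collapses to the plain sum of squared returns.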

Alternative evaluation methods
Although the model confidence set (MCS) proposed by Hansen, Lunde, and Nason (2011) is very suitable for this study, we need to employ other methods to corroborate our results [13].
Following Degiannakis and Filis (2017), we employ the Direction-of-Change (DoC) as an additional out-of-sample evaluation criterion [14]. Degiannakis and Filis (2017) state that the DoC is central to the trading strategies of market timing and asset allocation [14]. We let $D_t$ be a dummy variable that takes the value of one if a model correctly predicts the direction of the volatility movement on trading day t, and zero otherwise. Consequently, this dummy variable is given by

$$D_t = \begin{cases} 1, & \text{if } RV_t > RV_{t-1} \text{ and } \widehat{RV}_t > RV_{t-1}, \\ 1, & \text{if } RV_t < RV_{t-1} \text{ and } \widehat{RV}_t < RV_{t-1}, \\ 0, & \text{otherwise.} \end{cases}$$

Further, we define the DoC rate as the proportion of forecasts that correctly predict the direction of the volatility movement. Formally, the DoC rate is equal to $\frac{1}{q}\sum_{t=m+1}^{m+q} D_t$. To explore the statistical significance of the directional accuracy, we use a nonparametric test proposed by Pesaran and Timmermann (1992) to test the null hypothesis that the DoC rate of a forecasting model of interest is less than or equal to the DoC rate of a random walk against the alternative hypothesis that it is larger than the DoC rate of a random walk [15]. Table 7 reports the DoC results for all the forecasting models. First, we reject the null hypothesis of no directional accuracy at the 1% significance level for all the forecasting models, suggesting the success of the HAR-RV model as well as its extended models in directional prediction. Second and more importantly, the two extended models (HAR-RV-RS-I-VI and HAR-RV-SJ-I-VI) yield larger DoC rates than the corresponding original HAR-RV models. Also, the confidence levels for the higher DoC rates of the extended models are greater than those of the conventional HAR-RV models. On average, the HAR-RV-RS-I-VI model yields the largest DoC rate of 0.6334. In conclusion, the DoC results are basically consistent with the MCS results.
The extended models commonly exhibit better predictive ability than the original models from a directional accuracy perspective.

Table 7. Direction-of-Change rates
Notes: We consider all the forecasting models used in this paper. Bold and underlined figures highlight instances in which the DoC rate is larger than that of the corresponding conventional model. Statistical significance for the DoC rate is based on the p-values of the Pesaran and Timmermann (1992) (PT) statistic. Note that the null hypothesis of no directional accuracy is rejected at the 1% level for all of the reported DoC rates. The entire sample period, containing 2392 observations, spans from April 18, 2011, to June 22, 2021, while the length of the in-sample period is 1000. *** indicates significance at the 1% level.
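The DoC rate discussed in this subsection can be sketched as below. The sketch assumes that a forecast counts as directionally correct when the forecast and the realized value lie on the same side of the previous day's realized volatility; the function name and the numbers are hypothetical, and the Pesaran and Timmermann (1992) test is not implemented.

```python
def doc_rate(actual, forecast, previous):
    """Share of days on which the forecast gets the direction of change right.

    `previous[i]` is the realized volatility on the day before `actual[i]`.
    """
    hits = 0
    for rv, rv_hat, rv_prev in zip(actual, forecast, previous):
        up_correct = rv > rv_prev and rv_hat > rv_prev
        down_correct = rv < rv_prev and rv_hat < rv_prev
        if up_correct or down_correct:
            hits += 1
    return hits / len(actual)

# Illustrative values only: two correct calls, one wrong call, one tie.
rate = doc_rate(
    actual=[2.0, 1.0, 3.0, 3.0],
    forecast=[1.5, 0.5, 1.0, 4.0],
    previous=[1.0, 2.0, 2.0, 3.0],
)
```

Days on which realized volatility does not change are counted as misses in this sketch, which is one possible convention for handling ties.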

Conclusions
In this paper, we predict the realized volatility of the Chinese stock market (SSEC) based on the HAR-RV framework, which includes four popular HAR-RV-type models and four extended models that add the Chinese ETF volatility index. The in-sample results suggest that the new variable has a significantly positive impact on the future volatility of the Chinese stock market.
Using the MCS test to evaluate out-of-sample forecasting performance, we find that the four extended models exhibit significantly better out-of-sample forecasting performance than the four conventional HAR-RV type models. In addition, our results are robust to various sample periods, alternative volatility estimators (i.e., realized variance and realized kernel), and alternative evaluation methods. In terms of directional accuracy, the HAR-RV-RS-I-VI and HAR-RV-SJ-I-VI models generate higher DoC rates, suggesting superior predictive ability relative to the original models. In conclusion, the extended models perform better than the classic HAR-RV volatility forecasting models.