Bayesian analysis of hydrological time series based on MCMC algorithm

In this paper we consider Bayesian analysis of the possible changes in hydrological time series by Markov chain Monte Carlo (MCMC) algorithm. We consider multiple change-points and various possible situations. The approach of Bayesian stochastic search selection is used for detecting and estimating the number and positions of possible change-point in a piecewise constant model. MCMC algorithm is used to estimate the posterior distributions of parameters. The result of the analysis is applied to the hydrological data sets of the major river net area of Shunde in China and the data set of Nile River. In order to further investigate the trends in each segment of the hydrological data sets, we consider the analysis of change-point regression model via MCMC algorithm.


Introduction
Following the published studies on climate changes, a number of hydrologists have used models, which describe certain types of changes, to represent hydrological time series.When natural surroundings changes abruptly, the hydrological time series may exhibit some trends or jumps.In this case, the statistical characteristics of the sequence may be different before and after some time points known as the change-points.Thus, change-point of hydrological time series reflects certain environmental changes.In most of the previous papers, a given type of change occurring with certainty is assumed, and focus has been put on the characterization of the change-point [1][2].However, the problem of detecting multiple change-points is one of the most challenging problems, since both the number of the change-points and their locations are unknown.Besides the positions of the change-points, the trends of the sequence in different segments are of interest.Regression model can be used to deal with this problem.
In this paper we consider the problem of analyzing the possible changes in hydrological time series by MCMC methods.In section 2, we adopt a piecewise constant multiple change-points Bayesian procedure to analyze the real data sets of the major river net area of Shunde in China and the data set of the Nile River from [3].In order to further investigate the trends in different segments of the time series, we consider the analysis of change-points regression model by MCMC algorithm in section 3. Finally, in section 4, we present some conclusions.

Piecewise constant change-points model
Let { , 1} t y t  be a real stochastic process which is constant between two change-points such that where   is assumed to be a sequence of independent Gaussian random variables such that ( It is convenient for detecting change-points to introduce a random vector The estimation of the change-points instants reduces to the estimation of the set , then the likelihood function is given by In Bayesian inference, the choice of priors is important.We suppose  is a sequence of independent and identically distributed Bernoulli random variables with parameter  .Thus, the prior density of  is 1 ( , ) On the other hand, for a given  , the prior of m is chosen as independent Gaussian distributions be the set of hyper-parameters which are known constants.Using standard distribution theory we obtain the conditional posterior distribution of m from equation (6), where . It means that conditional on ( , ) y  , the ' k m s remains independent and Gaussian.According to Bayes theorem, the joint posterior distribution of the parameter ( , ) Consequently, the parameters ' k m s can be eliminated by integrating out k m from equation (8).In a special case, 2 2 , 1,..., , The conditional posterior distribution of  can be obtained as where ( ; ) is referred to as the energy function, in which The unknown parameter vector  can be estimated from the posterior distribution ( | ; ) f r y  .One of the standard Bayesian estimates is the (marginal) maximum a posteriori (MAP) estimator obtained by maximizing the posterior distribution defined as . Unfortunately, a closed-form expression of the MAP estimator of  cannot be obtained and MCMC algorithm could be used to obtain it.

MCMC methods
The main idea of this algorithm is to generate a Markov chain  

{ , 0}
i i   using Metropolis-Hastings (M-H) algorithm with the invariant distribution ( | ; )   f y   .The M-H algorithm is an iterative procedure.At iteration i, we carry out the following two steps: (1) An admissible new value  is drawn from a proposal kernel with the acceptance probability: , if the proposal kernel is symmetric such that then the acceptance step (2) reduces to: where ln R   , R is drawn from the uniform distribution U(0,1).After a sufficiently long burn-in, the estimator of  is determined by computing the time average of the Markov chain output samples.
The MAP estimator of  is determined by using a simulated annealing (SA) algorithm, which defines a non-homogeneous Markov chain.A decreasing temperature schedule should be introduced in the SA algorithm which can modify the acceptance probability.The proposal kernels are defined as in [4]: (a) The candidate  is drawn independently of the current state ( ) i  from an instrumental distribution defined by       .In this paper, q is chosen as a Bernoulli distribution with parameter  .
(b) A random permutation of {1,..., } n is uniformly drawn.According to this permutation, each component is flipped from 0 to 1 or from 1 to 0.
(c) An actual change-point is randomly selected and a neighborhood of this instant is defined.The change-point instant is moved in its neighborhood and accepted according to the acceptance probability.
The acceptance probability is given by equation (10).The schedule for lowering the temperature is defined by 1 0.99 , where 0 T is greater than a numerical constant depending on the energy The implementation of a MCMC algorithm as described above assumes that the set of hyper-parameters  is known.We estimate  in a maximum likelihood framework by using the Stochastic Approximation version of the EM (SAEM) algorithm given by [5].The algorithm describes as follows: (a) Choose an initial guess (0)   and an initial configuration of change-points instants (0)   .(b) At step j of the iteration: (1) perform one iteration of MCMC using the current value of ( 1)   j   to simulate ( ) j  from ( 1) ( ( ) ), where{ } j  is the step size sequence which decreases to 0.

Data analysis with hydrological data
In this subsection, we apply the MCMC method to analyze fourteen data sets, in which thirteen time series are consisted of the annual maximum values in the major hydrological stations of Shunde in China.The other data set is the annual volume of the Nile River.
The algorithm used in this paper draws vectors ( ) i  according to the distribution ( | ; ) f y   with the MCMC algorithm proposed by Gibbs Sampling.For each vector ( ) i  , the estimated number of segment is The MAP estimator of   is obtained after enough iteration (150000).is displayed in Fig. 2, while Fig. 3 depicts the time series of the hydrological data and the estimate of the vector m.The results are summarized in Table 1 and we summarize some conclusion as follows: (1) Only one change-point was detected for each series.
(2) MCMC method detects one change-point at 1899 for the Nile river series, which agrees with previous studied by [6].
(3) Da Zhou, Fu Zhou He, Huan Ma Yong, Ma Kou, Rong Qi, San Hong Qi, Xiao Bu, Xin Yong have the same change time of 1993.Most of them are located in a developed area.It seems that these eight downstream sections are affected by human activities more obviously.
(4) The change time of Ban Sha Wei,Gan Zhu, Nan Sha, San Shui and Wan Qing Sha are more earlier than those detected by [6] using a grey relational method.
Fig. 1 The posterior distribution of  estimated with 150 000 iterations Fig. 2 The posterior distribution of number K  of segments estimated with 150 000 iterations Fig. 3 The time series of the data sets of Shunde and Nile River, along with the estimate m 

Change-points regression model
In this section, we consider the segmented regression model with k change-points for fitting the observed data given by 1 , with the convention 0 0   , where { , 1} t t   is a sequence of random variables with normal distributions defined by equation ( 2).This model is also called segmented-line regression model.Let and K  be defined as in section 2. Let 2 2 2 1 ( ,..., ) ..., . Then, the likelihood function is given by: Suppose the prior density ( ; )    of  is defined by eq. ( 5) and let , 1,2,...,   be independently distributed as the multivariate normal distribution 0 ( , ) N   , where 0  and  are hyper-parameters.Thus for a given configuration  , the prior density of is defined by: For a given sequence of segment points{ , 1} where 1,..., , and The marginal posterior distribution of  is ) In the special case where 2 2 , 1,..., , the marginal posterior distribution (18) reduces to The estimation of  could be obtained by the MCMC method discussed in section 2. For a given configuration  , the estimation of regression parameters 1 ( ,..., ) can be obtained from the above standard distributions.

The analysis of hydrological data
In this subsection, we use the change-points regression model via MCMC method to analyze the data sets of the annual maximum values in the data set as in section 2. In the MCMC procedure, the hyper-parameter 0  is chosen as the least squares estimator from the regression model y t The hyper-parametric matrix  is chosen as an identity matrix 2 (1,1) I diag  .The sample-based parameter estimates and the posterior probabilities of the fitted models are presented in Table 1.The change-point instant with maximum posterior probability is the same as detected in section 2 for each data set, but the maximum posterior probability may be different from that under the piecewise constant model in section 2. The trends in each segment of the fourteen hydrological data sets can be clearly seen from the estimates of the regression coefficients in Table 1.

Conclusions
This paper studies the problem of change-point analysis of hydrological time series by Bayesian method.We consider various possible situations that may occur.The probabilistic model makes use of a non-observed sequence  , and an M-H algorithm is used for estimating the posterior distribution of  .The advantage of this parameterization is that the dimension of the sequence  is fixed and the hyper-parameters of the model are estimated by the SAEM algorithm.
The method of the paper is applied to analyze the hydrological data sets of the major river net area of Shunde in China and the data set of Nile River.These results agree with those obtained in the previous literatures.The MCMC method is used to estimate also the posterior distribution of the mean sequence.For further investigate the trends in each segment of the time series we use a segmented change-points regression model via MCMC approach.The change-point instant detected with maximum posterior probability is the same as that in the piecewise constant model for each data set.The trends in each segment can be clearly seen from the estimates of the segmented regression models.

Finally, m is 1 .
drawn with the conditional posterior distribution( | , ; )   The posterior distribution of K  , i.e. the probabilities{(