Motor Imagery EEG Classification using Wavelet Common Spatial Boosting Pattern

Feature extraction is one of the most important steps in brain-computer interface (BCI) systems. In particular, common spatial patterns (CSP) is one of the most successful solutions and has been widely used in motor imagery BCIs (MI-BCIs). However, studies have reported that the performance of CSP depends heavily on its channel configuration. To the best of our knowledge, the active channels related to the brain activity of stroke patients cannot be determined in advance. Hence, when applying CSP to stroke patients, one usually either uses a relatively broad set of channels or tries to select subject-specific channels. In this paper, we present a novel approach that employs the wavelet transform and a boosting algorithm to improve the accuracy and robustness of conventional CSP. In the proposed approach, the candidate channel configurations are first modeled as multiple preconditions. Then, informative features of the predefined channels are obtained using the wavelet common spatial pattern (W-CSP) algorithm, which provides high temporal-spectral resolution. Finally, we train weak classifiers on the obtained features and combine them into a weighted combinational model using a boosting strategy. Extensive experiments were performed on datasets from BCI competitions III and IV. The results demonstrate the superior performance of the proposed approach.


Introduction
Brain-computer interface (BCI) systems aim to provide a direct communication pathway between the human brain and external devices [1]. Among the various brain signals, the electroencephalogram (EEG), which is recorded non-invasively and at low cost, is the most widely exploited in BCI studies. Among the methods developed for EEG signals, the common spatial pattern (CSP) has proven to be one of the most effective algorithms [2]. CSP, which finds spatial filters that maximize the variance of one class while minimizing the variance of the other, was first used to classify two classes of EEG signals [3]. However, CSP is known to be very sensitive to its channel configuration [4]. To the best of our knowledge, the active channels related to the brain activity of stroke patients cannot be determined in advance. Hence, when applying CSP to stroke patients, one usually either uses a relatively broad set of channels or tries to select subject-specific channels.
To address this problem, several approaches have been proposed. Yang et al. [5] proposed a method that selects effective EEG channels based on the inconsistencies among multiple classifiers. Chin et al. [6] presented the discriminative channel addition (DCA) and discriminative channel reduction (DCR) approaches, which select subject-specific discriminative channels by iteratively adding or removing channels based on cross-validation classification accuracy. Moreover, several further approaches have been proposed, namely the common spatio-spectral pattern (CSSP) [4], the common sparse spectral spatial pattern (CSSSP) [7], the filter bank common spatial pattern (FBCSP) [8], and the common spatial-spectral boosting pattern (CSSBP) [9,10]. These methods simultaneously optimize a spatial filter and a spectral filter to enhance the discriminability of multichannel EEG.
Although these approaches improve on CSP, extracting optimal features from EEG signals remains a challenging and open issue. Because the non-stationary nature of EEG makes it difficult to extract robust time-domain features from the raw signal, the joint time-frequency resolution offered by the wavelet transform makes it a more reliable basis for extracting potential features from EEG. In this paper, we present an adaptive wavelet common spatial boosting pattern (WCSBP) algorithm, which models the channel configurations as preconditions and applies CSP in the wavelet domain instead of to the original EEG signal. Our algorithm produces a set of the most contributive channel groups, which can serve as effective inputs to CSP. We evaluate the performance of our algorithm on datasets from BCI competitions III and IV.
The remainder of this paper is organized as follows. A detailed formulation of WCSBP is presented in Section II. Section III briefly describes the experimental arrangement and data acquisition. Section IV details the comparison with several state-of-the-art algorithms. A brief conclusion is drawn in Section V.

Proposed Approach
In this section, we present a detailed description of the WCSBP algorithm, including spatial channel selection, feature extraction using the WCSBP algorithm, and classification using a combinational model.
We first introduce some notation and the key points of the proposed approach. We denote $E_{train} = \{(X_n, y_n)\}_{n=1}^{N}$ as the training dataset and $(X_n, y_n)$ as the nth sample with EEG signal matrix $X_n$ and label $y_n$. The purpose of WCSBP is, given a universal set $\mathcal{U}$ composed of all possible channel subsets, to find a subset $\mathcal{W} \subset \mathcal{U}$ that yields a combinational model $F$ by combining all sub-models learned under the conditions $W_k \in \mathcal{W}$, and to optimize this classifier on the training data [9,10].

Common Spatial Pattern
Our proposed approach builds on the classic CSP algorithm [11]. CSP seeks to optimally discriminate between two classes of EEG data based on the simultaneous diagonalization of two covariance matrices [14]. We briefly describe it here. As mentioned above, the training data of the nth trial are denoted as $(X_n, y_n)$, where $X_n$ is the EEG signal matrix of size $N_c \times T$ (channels by samples) and $y_n \in \{1, 2\}$ is the corresponding label. We use $X^{(1)}$ and $X^{(2)}$ to denote preprocessed EEG matrices of the two classes.
First, the normalized spatial covariance of the EEG is calculated as $R^{(i)} = X^{(i)} X^{(i)\top} / \mathrm{trace}(X^{(i)} X^{(i)\top})$, $i \in \{1, 2\}$, where $X^{\top}$ is the transpose of $X$ and $\mathrm{trace}(A)$ is the sum of the diagonal elements of $A$. The averaged normalized covariances $\bar{R}^{(1)}$ and $\bar{R}^{(2)}$ are computed by averaging over all the trials of each class. The composite spatial covariance can be factorized as $R_c = \bar{R}^{(1)} + \bar{R}^{(2)} = U_c \Sigma U_c^{\top}$, where $U_c$ is the matrix of eigenvectors and $\Sigma$ is the diagonal matrix of eigenvalues. Hence, the whitening transformation matrix is $P = \Sigma^{-1/2} U_c^{\top}$. The averaged covariance matrices are then whitened as $S^{(1)} = P \bar{R}^{(1)} P^{\top}$ and $S^{(2)} = P \bar{R}^{(2)} P^{\top}$, where $S^{(1)}$ and $S^{(2)}$ share common eigenvectors and the corresponding eigenvalues of the two matrices always sum to one: $S^{(1)} = B \Lambda^{(1)} B^{\top}$, $S^{(2)} = B \Lambda^{(2)} B^{\top}$, and $\Lambda^{(1)} + \Lambda^{(2)} = I$. That is, the eigenvectors with the largest eigenvalues for $S^{(1)}$ have the smallest eigenvalues for $S^{(2)}$, and vice versa. To obtain the optimal solution for separating the variance of the two signal matrices, we only need to project the whitened EEG onto the eigenvectors corresponding to the largest eigenvalues in $\Lambda^{(1)}$ and $\Lambda^{(2)}$. The projection matrix is $W = \tilde{B}^{\top} P$, where $\tilde{B}$ is composed only of the $m$ eigenvectors corresponding to the largest and the $m$ eigenvectors corresponding to the smallest eigenvalues of $S^{(1)}$. The rows of $W$ are the stationary spatial filters, and the columns of $W^{-1}$ (computed from the full matrix $B^{\top} P$) are the common spatial patterns, which are considered time-invariant EEG source distribution vectors. With the projection matrix $W$, $X^{(1)}$ and $X^{(2)}$ can be transformed into uncorrelated components $Z = W X$. The feature vectors are calculated as $f_i = \log\left( \mathrm{var}(Z_i) / \sum_{j=1}^{2m} \mathrm{var}(Z_j) \right)$, where $Z_i$ is the ith row of $Z$, $\mathrm{var}(Z_i)$ is the variance of $Z_i$, and $i = 1, \dots, 2m$. The log transformation makes the distribution of the features approximately normal.
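The CSP steps described above (normalized covariance, whitening, joint diagonalization, and log-variance features) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the parameter m (number of filters kept from each end of the eigenvalue spectrum) are hypothetical, and trials are assumed to be arrays of shape channels by samples.

```python
import numpy as np

def normalized_cov(trial):
    """Normalized spatial covariance X X^T / trace(X X^T) of one trial."""
    c = trial @ trial.T
    return c / np.trace(c)

def csp_filters(trials_1, trials_2, m=1):
    """Return 2*m spatial filters (as rows) from two lists of trials."""
    # Class-wise average of the normalized covariances.
    c1 = np.mean([normalized_cov(x) for x in trials_1], axis=0)
    c2 = np.mean([normalized_cov(x) for x in trials_2], axis=0)
    # Eigendecomposition of the composite covariance gives the whitening matrix P.
    evals, evecs = np.linalg.eigh(c1 + c2)
    p = np.diag(evals ** -0.5) @ evecs.T
    # Whitened class-1 covariance; class 2 shares the same eigenvectors.
    lam, b = np.linalg.eigh(p @ c1 @ p.T)  # eigenvalues in ascending order
    w = b.T @ p                            # full projection matrix
    # Keep the m filters with the smallest and the m with the largest eigenvalues.
    keep = np.r_[np.arange(m), np.arange(len(lam) - m, len(lam))]
    return w[keep]

def csp_features(w, trial):
    """Log of the normalized variance of each projected component."""
    z = w @ trial
    v = z.var(axis=1)
    return np.log(v / v.sum())
```

Here csp_filters would be called once on the training trials of the two classes, and csp_features would then be applied to every trial to obtain the inputs of a classifier.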

Wavelet-CSP
The non-stationary nature of the EEG signal makes it difficult to extract robust time-domain features from the original EEG. The joint time-frequency resolution of the wavelet transform, which cannot be achieved by either the Fast Fourier Transform (FFT) or the Short-Time Fourier Transform (STFT), makes it a more reliable basis for extracting potential features from EEG [15]. In the proposed approach, we employ the Mallat algorithm [16] to obtain the multi-resolution wavelet decomposition of the EEG.
After the wavelet decomposition, we perform CSP directly on the wavelet coefficients corresponding to the $\mu$ rhythm and the $\beta$ rhythm to extract features related to motor imagery.
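As an illustration of the Mallat pyramid scheme, the following sketch decomposes each channel with the Haar wavelet and keeps two detail bands. Several choices here are assumptions made only for the example: the Haar basis (the paper selects its wavelet basis on training data and does not name it), a sampling rate of 100 Hz, and a three-level decomposition, for which the level-3 and level-2 details roughly cover 6.25 to 12.5 Hz and 12.5 to 25 Hz. The function names are hypothetical.

```python
import math

def haar_step(signal):
    """One Mallat analysis step: Haar low/high-pass filtering, then downsampling by 2.

    A trailing sample is dropped when the length is odd.
    """
    s = 1.0 / math.sqrt(2.0)
    starts = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in starts]
    detail = [(signal[i] - signal[i + 1]) * s for i in starts]
    return approx, detail

def haar_wavedec(signal, levels):
    """Multi-level decomposition, returned coarsest first: [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def rhythm_coefficients(trial):
    """Split a trial (list of channel signals) into two coefficient matrices.

    Assumes 100 Hz sampling, so cD_3 and cD_2 roughly match the mu and beta bands.
    """
    mu, beta = [], []
    for channel in trial:
        _, d3, d2, _ = haar_wavedec(channel, 3)
        mu.append(d3)
        beta.append(d2)
    return mu, beta
```

Each returned matrix has one row per channel, so it can take the place of the raw trial as input to CSP.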

Channel Selection and Classification
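We use $C$ to denote the full set of channels; each possible channel subset $U$ in the universal set $\mathcal{U}$ then satisfies $|U| \le |C|$, where $|\cdot|$ denotes the size of the corresponding set. Our task is therefore to find an optimal group of channel subsets $\mathcal{W} \subset \mathcal{U}$, which select the training data of the base weak classifiers.
Given random channel subsets $W_k \in \mathcal{U}$, $k \ge 1$, and the initial training data pool, we model them as different base weak classifiers. We select EEG data from the training dataset according to $W_k$. Then, CSP is used to extract features from the wavelet coefficients corresponding to the $\mu$ rhythm and the $\beta$ rhythm of the selected channels, and a base weak classifier $f_k(X; W_k)$ is trained on the extracted features. In this way, we build a one-to-one relationship between each precondition $W_k$ and its corresponding base weak classifier $f_k$. The combinational model is the weighted sum $F(X) = \sum_{k=1}^{K} \alpha_k f_k(X; W_k)$, and training minimizes a loss $L$ over the training set $\{(X_n, y_n)\}_{n=1}^{N}$:
$$\{\alpha_k, W_k\}_{k=1}^{K} = \arg\min \sum_{n=1}^{N} L\left( y_n, \sum_{k=1}^{K} \alpha_k f_k(X_n; W_k) \right). \quad (1)$$
To solve Equation 1, we employ a greedy algorithm. Equation 1 can be rewritten stage by stage, which gives a simple recursive formula:
$$F^{(k)}(X) = F^{(k-1)}(X) + \alpha_k f_k(X; W_k), \quad k = 1, \dots, K. \quad (2)$$
Assuming that $F^{(k-1)}$ has been determined, Equation 2 can be transformed to
$$(\alpha_k, W_k) = \arg\min_{\alpha, W} \sum_{n=1}^{N} L\left( y_n, F^{(k-1)}(X_n) + \alpha f(X_n; W) \right).$$
Under this strategy, each locally optimal classifier $f_k$ ($k = 1, 2, \cdots, K$) splits the original training dataset into two parts: the correctly classified trials $\{(X_n, y_n) : f_k(X_n) = y_n\}$ and the misclassified trials $\{(X_n, y_n) : f_k(X_n) \ne y_n\}$. To obtain the next channel subset, we select the misclassified trials and insert duplicates of these signals into the current training data pool to generate a new training data pool. From this pool we obtain the kth training data and train the kth base weak classifier. Finally, the combinational model $F$ is obtained.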
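The greedy procedure can be sketched as a small boosting loop over candidate channel subsets. This is a hedged illustration, not the authors' algorithm: the weak learner (a threshold on the mean feature of the selected channels), the AdaBoost-style weight alpha, the 0/1 error used for selection, and all function names are assumptions chosen to keep the example self-contained. Each sample is assumed to be a list holding one scalar feature per channel, with labels +1 and -1.

```python
import math

def train_stump(samples, labels, subset):
    """Toy weak classifier: threshold the mean feature over a channel subset."""
    def score(x):
        return sum(x[c] for c in subset) / len(subset)
    pos = [score(x) for x, y in zip(samples, labels) if y == 1]
    neg = [score(x) for x, y in zip(samples, labels) if y == -1]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2.0
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: sign if score(x) >= threshold else -sign

def boost(samples, labels, candidate_subsets, rounds=3):
    """Greedily add one (weight, subset, classifier) triple per round."""
    pool = list(range(len(samples)))  # indices into the original training set
    model = []
    for _ in range(rounds):
        xs = [samples[i] for i in pool]
        ys = [labels[i] for i in pool]
        best = None
        for subset in candidate_subsets:
            clf = train_stump(xs, ys, subset)
            err = sum(clf(x) != y for x, y in zip(xs, ys)) / len(pool)
            if best is None or err < best[0]:
                best = (err, subset, clf)
        err, subset, clf = best
        err = min(max(err, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, subset, clf))
        # Duplicate the misclassified original trials into the training pool.
        pool += [i for i in range(len(samples)) if clf(samples[i]) != labels[i]]
    return model

def predict(model, x):
    """Sign of the weighted vote of all weak classifiers."""
    total = sum(alpha * clf(x) for alpha, _, clf in model)
    return 1 if total >= 0 else -1
```

In the setting of the paper, the toy stump would be replaced by a classifier trained on W-CSP features of the channels in each subset, and the candidate subsets would be drawn at random.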

Data Acquisition
Dataset I is dataset IVa from BCI competition III, which was recorded from five healthy subjects (labeled 'aa', 'al', 'av', 'aw', and 'ay') using BrainAmp amplifiers and a 128-channel Ag/AgCl electrode cap from ECI [17]. Following visual cues shown for 3.5 s, the subjects performed three motor imagery tasks: (L) left hand, (R) right hand, and (F) right foot. However, only 280 trials per subject, from the right-hand and foot classes, were made available for the competition. Of the 128 channels, 118 EEG channels were measured at positions of the extended international 10/20 system. The EEG signals were band-pass filtered between 0.5 and 200 Hz and down-sampled to 100 Hz.
Dataset II is dataset IIa from BCI competition IV, which was provided by C. Brunner, R. Leeb, G. R. Müller-Putz, G. Pfurtscheller, and A. Schlögl from Graz (Austria) and recorded from nine subjects (labeled A01 to A09) using 22 Ag/AgCl electrodes [11]. The dataset consists of four classes of motor imagery EEG measurements, namely (L) left hand, (R) right hand, (F) feet, and (T) tongue. Two sessions were recorded for each subject on different days, one for training and the other for evaluation. Each session consisted of 6 runs separated by short breaks. Each run comprised 48 trials, yielding a total of 288 trials per session. The signals were sampled at 250 Hz and band-pass filtered between 0.5 Hz and 100 Hz.

Results
Table 1 presents the performance of all the approaches in terms of kappa values. Note that the optimal wavelet basis and the optimal feature dimensionality of each approach were determined based on the training performance. Our algorithm outperforms the other approaches and obtains the highest kappa value. On closer inspection, the proposed algorithm achieves a large improvement even for subjects with poor CSP accuracy, e.g., A05 and A06. Owing to different levels of noise, the classification accuracies of all the methods differ significantly across subjects. Hence, there is greater potential for improvement for subject A05 than for A03 and A08.
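The text does not state how the kappa values were computed. As a point of reference, Cohen's kappa, a common agreement measure for this kind of evaluation, is $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed accuracy and $p_e$ is the agreement expected by chance. A minimal sketch, with a hypothetical function name and assuming plain lists of true and predicted labels:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    n = len(y_true)
    # Observed agreement (plain accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)
```

For four balanced classes with balanced predictions, chance agreement is 0.25, so kappa reduces to (accuracy - 0.25) / 0.75.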

Conclusions
This paper proposed a method, called WCSBP, that improves the efficiency and robustness of classic CSP by combining the wavelet transform with a boosting algorithm. In the proposed approach, we used a stochastic boosting strategy to select channel subsets and trained the base weak classifiers on the wavelet coefficients of the EEG data from the selected channels. The most discriminative groups of channel sets related to brain activity were selected and can be treated as effective guidance for CSP. We then obtained an optimal combinational classifier by combining these base weak classifiers. The main advantage of WCSBP is that it performs the classic CSP algorithm directly on the wavelet coefficients instead of on the original EEG, which yields a more robust CSP projection matrix thanks to the time-frequency properties of the wavelet transform. We compared the performance of our approach with conventional CSP and other CSP-based algorithms on datasets from BCI competitions III and IV. The results demonstrated its superior classification accuracy and robustness.