A method of electricity consumption behaviour clustering and pricing packages based on data mining

Against the background of electricity retail-side reform, this paper proposes a data-mining-based method for clustering electricity consumption behavior and recommending pricing packages. First, a distributed clustering framework combining the DTW k-medoids algorithm and the CFSFDP algorithm is proposed. Second, typical load curves are extracted from the local data under this framework to construct local models. Then, a secondary clustering of the local model results yields the global typical load curves, from which the global model is constructed. Finally, the most suitable electricity sales plan is recommended to the target users. The experimental results show that segmenting electricity consumption behavior enables effective personalized package recommendation for users, improves the quality of the power supply service offered by power companies, and provides technical support for improving operational efficiency.


Introduction
More than 500 million smart meters have been installed in China, and data collected every 15 minutes can add up to a terabyte a day. This surge of consumption data from massive numbers of residential users poses challenges for data storage, communication and analysis. On the one hand, electricity consumption data is collected and stored in a distributed way, at the substations to which the users belong, and transferring the whole data set from each distributed site to a central site is very expensive and time-consuming. On the other hand, analyzing and clustering the large data sets collected from all distributed sites requires significant time and memory. Many parallel clustering methods for big data applications have been studied [1][2]. These algorithms require the entire data set to reside in one data center before it is assigned to different clients, which does not match how power consumption data is actually collected and stored. To address this, some fully distributed clustering algorithms have been proposed that aggregate the information of the local data and send it to a central site for analysis.
There are many studies on power load clustering for users, but few of them address application scenarios in which value-added services are provided to power companies. At home and abroad, research on the retail side mainly focuses on the reform of the electricity market, the construction of retail and wholesale electricity markets, and the bidding system. The works in [3][4] used psychological methods to model how users choose an electricity retailer, and studied how electricity prices can guide users to actively adjust their choices so that both users and retailers benefit. Mining the massive data of users [5][6][7] to understand their electricity consumption behavior is a necessary step for power companies to realize package recommendation [8].
In summary, against the background of retail-side reform, this paper proposes a data-mining-based method for electricity retailers to cluster residential users and recommend packages. Different electricity price packages are recommended to different types of users to reduce their electricity cost, so that electricity retailers can offer genuinely differentiated, fine-grained and targeted services.

DDKC
The algorithm consists of three main steps. First, the local data scattered across different regions are clustered locally and the local models are constructed. Second, the local clustering results are clustered again and merged to obtain the global clustering result. Finally, the assignment of the local data is adjusted according to the global clustering result.

2.1. Local clustering based on DTW k-medoids
DTW is a technique for finding the best alignment between two time-dependent sequences. To compare two sequences $X = (x_1, \dots, x_N)$ and $Y = (y_1, \dots, y_N)$ of length $N$, the total cost $c_p(X, Y)$ of a warping path $p = (p_1, \dots, p_T)$ between $X$ and $Y$, with $p_t = (n_t, m_t)$, is defined as

$$c_p(X, Y) = \sum_{t=1}^{T} c(x_{n_t}, y_{m_t}),$$

where $c(x_{n_t}, y_{m_t})$ is the local distance between the two aligned points. The best warping path $p^*$ is the path with the lowest cost among all possible paths, and the DTW distance between $X$ and $Y$ is defined as its total cost:

$$\mathrm{DTW}(X, Y) = c_{p^*}(X, Y) = \min_{p} c_p(X, Y).$$

Given a set of household load curves $L$ and the number of clusters $K$, the goal of the DTW-based clustering algorithm is to find an assignment $C_k$ for each cluster $k$ and a cluster prototype matrix $\mu_K$ that minimize the within-cluster cost $WC$:

$$WC = \sum_{k=1}^{K} \sum_{X \in C_k} \mathrm{DTW}(X, \mu_k).$$

In the assignment step, each load curve is assigned to the nearest cluster according to $\mathrm{DTW}(X, \mu_k)$. Once all members of each cluster have been found, the cluster prototypes are updated: $\mu_k^*$ is the member of $C_k$ that minimizes the total $\mathrm{DTW}(X, \mu_k)$ over $X \in C_k$. The algorithm is therefore equivalent to k-medoids clustering with DTW as the distance measure.
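The assignment and update steps described above can be sketched in Python as follows. This is a minimal illustration, not the MATLAB-based implementation used in the experiments: the absolute difference as local cost, the random initialization, and the function names `dtw_distance`, `pairwise_dtw` and `dtw_k_medoids` are all assumptions made for the example.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance by dynamic programming (local cost assumed to be |x_i - y_j|)."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

def pairwise_dtw(curves):
    """Symmetric matrix of DTW distances between all pairs of curves."""
    k = len(curves)
    dist = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            dist[a, b] = dist[b, a] = dtw_distance(curves[a], curves[b])
    return dist

def dtw_k_medoids(curves, k, n_iter=50, seed=0):
    """k-medoids with DTW as the distance; returns medoid indices and labels."""
    rng = np.random.default_rng(seed)
    dist = pairwise_dtw(curves)
    medoids = rng.choice(len(curves), size=k, replace=False)  # assumed random init
    for _ in range(n_iter):
        # Assignment step: each curve goes to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the member with the smallest total DTW to the others.
            within = dist[np.ix_(members, members)].sum(axis=0)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

Precomputing the full distance matrix keeps the sketch short but costs quadratic time and memory in the number of curves, which is one reason for clustering each local data set separately.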

2.2. Global clustering based on modified CFSFDP
CFSFDP is a density-based clustering algorithm that became widely used within a short time because of its simple parameter setting and elegant form. Its core idea is to use the relative distance $\delta_i$ and the local density $\rho_i$ of each object to identify the density centers, and then to assign the remaining clustering objects to these centers. In the original CFSFDP algorithm, every clustering object carries equal weight. To take the typicality of each load curve fully into account when calculating the density, a weight $\omega_j$ is introduced, where $\omega_j$ is the weight of each cluster generated in $M_i$, that is, the weight of load curve $j$, which is equal to the load of its class in the local data center.
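A weighted density-peak clustering along these lines might look like the sketch below. The exact weighted density formula is not reproduced in the text, so the form used here, $\rho_i = \sum_j \omega_j \, \mathbb{1}[d_{ij} < d_c]$ with the object's own weight included, is an assumption. Selecting centers by the largest $\gamma_i = \rho_i \delta_i$ instead of reading a decision graph, the cutoff `d_c`, and the function name `weighted_cfsfdp` are also illustrative choices.

```python
import numpy as np

def weighted_cfsfdp(dist, weights, d_c, n_centers):
    """Density-peak clustering with per-object weights (assumed density form)."""
    dist = np.asarray(dist, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = dist.shape[0]

    # Assumed weighted cutoff density: total weight of objects closer than d_c
    # (the object itself is included because its self-distance is 0).
    rho = ((dist < d_c) * w[None, :]).sum(axis=1)

    # Process objects from highest to lowest density; ties keep index order.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the global peak
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]       # distance to the nearest denser object
        nearest_higher[i] = j

    # Center selection heuristic: largest rho * delta; the global density
    # peak is always kept as a center.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_centers]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:
        if labels[i] == -1:
            # Inherit the label of the nearest denser object.
            labels[i] = labels[nearest_higher[i]]
    return centers, labels, rho, delta
```

In the distributed setting, `dist` would hold the DTW distances between the typical load curves reported by the local sites and `weights` the corresponding $\omega_j$, so that a typical curve representing a large class contributes more to the density than one representing a small class.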

Data set
The data set used for the case study in this paper comes from measurements of Irish smart electricity meters published by SEAI. The experiment uses the electricity consumption data of 1,443 household users over 22 days, with 48 data points collected per household per day.

Package selection
Because electricity retailers at home and abroad offer many kinds of electricity price packages, this paper selects two types of packages used at home and abroad for the experiments and designs four electricity price packages.

3.2. Data processing
During data cleaning, each original daily load curve in each local data set is verified and normalized; in particular, any original daily load curve that contains a blank load collection point is deleted. After data processing, 31,202 load curves were obtained.
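The cleaning step can be illustrated with the short sketch below. The text does not state which normalization is used, so scaling each curve by its daily peak is an assumption, as are dropping all-zero curves and the function name `clean_and_normalize`.

```python
import numpy as np

def clean_and_normalize(daily_curves):
    """Drop incomplete daily curves, then scale each curve by its peak value."""
    curves = np.asarray(daily_curves, dtype=float)  # shape: (n_days, n_points)

    # Delete every daily curve that contains a blank (NaN) collection point.
    complete = ~np.isnan(curves).any(axis=1)
    curves = curves[complete]

    # Assumed normalization: divide by the daily peak; all-zero curves are
    # dropped because they cannot be scaled.
    peak = curves.max(axis=1)
    nonzero = peak > 0
    return curves[nonzero] / peak[nonzero, None]
```

With this scaling every remaining curve has a maximum of 1, so the subsequent clustering compares the shape of the daily profiles and not their absolute consumption level.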

3.3. Clustering results and analysis
The processed load curves are evenly divided into 10 local data sets, each containing about 3,000 load curves. In the local clustering stage, the existing DTW clustering routine in MATLAB is called and adapted. The results of this stage are shown in Figure 1. Based on the local clustering results, the weighted CFSFDP algorithm is used for global clustering; as shown in Figure 2, the final number of global clusters is M = 5. Figure 2 shows the typical electricity consumption patterns that were finally found, and it can be seen that the consumption patterns of residential users are diverse. In the following, pricing packages are selected for each user subclass based on the clustering results together with an analysis of the electricity packages.

Preliminary analysis of user patterns
This paper uses the concept of information entropy to measure the stability of user behavior. For a user $n$, assume that the load data $L$ in a certain period of time can take 5 possible values, with probabilities $P_1$, $P_2$, $P_3$, $P_4$ and $P_5$. The entropy $S_n$ of the load data $L$ in that period can then be defined as

$$S_n = -\sum_{i=1}^{5} P_i \log P_i.$$

The entropy reflects how strongly the curve fluctuates: the higher the entropy, the greater the fluctuation. The entropy histogram of all residential users is shown in Figure 3. The entropy differs between user types, so this paper uses it to analyze how much the consumption behavior of each user varies. A household whose entropy lies above the 65% quantile is classified as a variable household; otherwise it is classified as a stable household.
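The entropy-based split into stable and variable households can be sketched as follows. The text does not say how the 5 possible load values are obtained, so discretizing the load into 5 equal-width bins is an assumption, as are the natural logarithm and the function names `load_entropy` and `classify_households`.

```python
import numpy as np

def load_entropy(values, n_levels=5):
    """Shannon entropy of load values discretized into n_levels bins."""
    # Assumed discretization: equal-width bins over the observed range.
    counts, _ = np.histogram(values, bins=n_levels)
    p = counts[counts > 0] / counts.sum()  # empty bins contribute nothing
    return float(-(p * np.log(p)).sum())

def classify_households(entropies, quantile=0.65):
    """Label households above the entropy quantile as variable, others as stable."""
    entropies = np.asarray(entropies, dtype=float)
    threshold = np.quantile(entropies, quantile)
    labels = np.where(entropies > threshold, "variable", "stable")
    return labels, threshold
```

A household whose load always falls in the same bin has entropy 0, while a household whose load is spread evenly over all five bins reaches the maximum of $\log 5$.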

Recommendation of package
According to the monthly cumulative consumption of the various users, each user's monthly cumulative consumption is matched to the corresponding price tier in packages 1-3. This paper therefore extracts the monthly cumulative consumption $Q_{month}$ of users as the feature for the second clustering of users of the variable consumption type.
The number of classified users is greatly reduced compared with the initial data set, and $Q_{month}$ is the only feature. Because the consumption behavior of type 1 and type 2 users shows no obvious peaks and troughs, their package is recommended according to the cumulative monthly consumption $Q_{month}$. For the other types, the package recommended as most suitable for the subclass is the rate type whose peak-price period overlaps least with the peak of the typical load curve. The final recommendation results are shown in the table. As the table shows, users of all subclasses are recommended time-of-use packages whose high-price period is staggered from their own peak consumption period. Using the correspondence between a user's total monthly consumption and the packages, the package with the lowest calculated electricity cost can then be chosen.
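The least-overlap rule for time-of-use packages can be illustrated with the sketch below. Defining a user's peak period as the highest-load quarter of the 48 half-hourly slots is an assumption, and the package names and peak-price masks in the usage example are hypothetical; they are not the four packages designed in this paper.

```python
import numpy as np

def peak_slots(curve, top_fraction=0.25):
    """Boolean mask of the slots with the highest load (assumed peak definition)."""
    curve = np.asarray(curve, dtype=float)
    n_peak = max(1, int(round(top_fraction * curve.size)))
    mask = np.zeros(curve.size, dtype=bool)
    mask[np.argsort(-curve, kind="stable")[:n_peak]] = True
    return mask

def recommend_tou_package(typical_curve, packages, top_fraction=0.25):
    """Pick the package whose peak-price slots overlap least with the user's peak."""
    user_peak = peak_slots(typical_curve, top_fraction)
    overlaps = {
        name: int(np.sum(user_peak & np.asarray(price_peak, dtype=bool)))
        for name, price_peak in packages.items()
    }
    best = min(overlaps, key=overlaps.get)  # ties go to the first package listed
    return best, overlaps

# Hypothetical example: a typical curve with an evening peak and two
# made-up packages that differ only in when the high price applies.
curve = np.full(48, 0.2)
curve[36:44] = 1.0                       # high load from 18:00 to 22:00
evening = np.zeros(48, dtype=bool)
evening[34:44] = True                    # high price from 17:00 to 22:00
morning = np.zeros(48, dtype=bool)
morning[14:20] = True                    # high price from 07:00 to 10:00
best, overlaps = recommend_tou_package(
    curve, {"evening_peak": evening, "morning_peak": morning}
)
```

For this evening-peaked curve the rule selects the hypothetical `morning_peak` package, because its high-price slots do not coincide with the user's own peak slots.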

Conclusion
In this paper, against the background of electricity retail reform, a behavior analysis model for residential users in the reformed electricity market is built by mining massive electricity consumption data. A distributed clustering algorithm, DDKC, is proposed; on this basis the electricity consumption behavior of users is segmented, and a package recommendation method for residential users is given according to the classification of their consumption patterns. Analyzing the consumption behavior of users in order to recommend personalized packages helps to optimize the energy structure of residents and to ensure the stability of the power system load.