An Efficient Distributed Database Clustering Algorithm for Big Data Processing
DOI: 10.23977/iccsc.2017.1012
Author(s)
Qiao SUN, Lan-mei FU, Bu-qiao Deng, Xu-bin Pei, Jia-song SUN
Corresponding Author
Qiao SUN
ABSTRACT
This paper proposes a distributed data clustering technique based on a deep neural
network. First, each record in the distributed database is taken as an input vector, and its
features are extracted and fed to the input layer of the deep neural network. The
connection weights are trained with the backpropagation (BP) algorithm, and the network
output is tuned by adjusting these weights. Finally, the clustering results are
determined according to the similarity of the output vectors corresponding to the input records.
Experimental results on small-scale distributed systems show that this method achieves
better test-set accuracy than the traditional k-means clustering method and is better suited to
large-scale data clustering in distributed environments.
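The final step described above, grouping records by the similarity of their output vectors, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the grouping rule, so the cosine similarity, the greedy assignment, the `threshold` value, and the function names below are all assumptions made for illustration, not the authors' implementation. The output vectors are hypothetical stand-ins for what a trained network might produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_by_similarity(outputs, threshold=0.9):
    """Greedy grouping (assumed rule): a vector joins the first cluster whose
    representative (its first member) is at least `threshold` similar;
    otherwise it starts a new cluster."""
    representatives = []
    labels = []
    for vec in outputs:
        for cluster_id, rep in enumerate(representatives):
            if cosine_similarity(vec, rep) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No existing cluster was similar enough: open a new one.
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels

# Hypothetical network outputs for four database records.
outputs = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.95], [0.2, 0.9]]
labels = group_by_similarity(outputs)
print(labels)
```

In this sketch the result depends on the order of the records and on the chosen threshold; a real system would likely need a more robust assignment rule.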
KEYWORDS
Distributed big data processing, Distributed database, Data clustering, Deep
neural network, K-means.