Education, Science, Technology, Innovation and Life
Open Access

An Efficient Distributed Database Clustering Algorithm for Big Data Processing

DOI: 10.23977/iccsc.2017.1012

Author(s)

Qiao SUN, Lan-mei FU, Bu-qiao DENG, Xu-bin PEI, Jia-song SUN

Corresponding Author

Qiao SUN

ABSTRACT

This paper proposes a distributed data clustering technique based on a deep neural network. First, each record in the distributed database is treated as an input vector; its features are extracted and fed to the input layer of the deep neural network. The connection weights are then trained with the backpropagation (BP) algorithm, so that the network output is fitted by adjusting these weights. Finally, the clustering results are determined according to the similarity of the output vectors corresponding to the input records. Experimental results on a small-scale distributed system show that this method achieves higher test-set accuracy than the traditional k-means clustering method, and is better suited to large-scale data clustering in distributed environments.
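
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no network size, loss function, or similarity measure, so the single hidden layer, the sigmoid units, the squared-error loss, the cosine similarity, the one-hot cluster prototypes, and every function name below are assumptions made for illustration only. The distributed database is also replaced by a small in-memory list of records.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    # One row per output unit; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def layer_forward(weights, inputs):
    # Append a constant 1.0 so the bias is handled like any other weight.
    return [sigmoid(sum(w * v for w, v in zip(row, inputs + [1.0])))
            for row in weights]

def forward(net, x):
    hidden = layer_forward(net[0], x)
    output = layer_forward(net[1], hidden)
    return hidden, output

def train_bp(net, samples, targets, lr=1.0, epochs=3000):
    # Plain per-sample backpropagation for squared error (assumed loss).
    w_hidden, w_out = net
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, out = forward(net, x)
            # Output-layer error terms.
            d_out = [(o - ti) * o * (1.0 - o) for o, ti in zip(out, t)]
            # Hidden-layer error terms, propagated back through w_out
            # before w_out is updated.
            d_hid = [h * (1.0 - h) *
                     sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
                     for j, h in enumerate(hidden)]
            for k, row in enumerate(w_out):
                for j, v in enumerate(hidden + [1.0]):
                    row[j] -= lr * d_out[k] * v
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= lr * d_hid[j] * v

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def assign_cluster(net, x, prototypes):
    # Pick the prototype whose direction is most similar to the output vector.
    _, out = forward(net, x)
    sims = [cosine(out, p) for p in prototypes]
    return sims.index(max(sims))

# Hypothetical records: two well-separated groups of 2-D feature vectors.
rng = random.Random(0)
records = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
targets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]

net = [init_layer(2, 3, rng), init_layer(3, 2, rng)]
train_bp(net, records, targets)
labels = [assign_cluster(net, r, prototypes) for r in records]
print(labels)
```

In this sketch the labelled targets stand in for whatever supervision the paper uses to measure test-set accuracy. In a distributed setting each node would presumably run `assign_cluster` on its local records, but the abstract does not say how training or assignment is partitioned across nodes.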

KEYWORDS

Distributed big data processing, Distributed database, Data clustering, Deep neural network, K-means.

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.