Education, Science, Technology, Innovation and Life
Open Access

Research on Campus Hot Topic Detection Based on LDA Topic Model

DOI: 10.23977/csic.2018.0940

Author(s)

Xiujuan Yi, Weidong Zhu

Corresponding Author

Xiujuan Yi

ABSTRACT

Hot topics on college and university networks are detected by analyzing the content of students' Internet activity. This content varies in length, covers scattered topics, and contains distorted information. The traditional VSM (vector space model) builds feature-weight vectors from word-frequency statistics and ignores the implicit content of the text. Therefore, the LDA (Latent Dirichlet Allocation) topic model, which can identify the hidden topics in a document, is used to rebuild the text model and detect topics. During preprocessing, the text is segmented into words and filtered based on TF-IDF, and the filtered text is then modeled with LDA. After LDA clustering, a k-means algorithm based on the JS (Jensen-Shannon) distance function clusters the documents a second time according to their topic probability distributions, yielding the topic distribution of the students' Internet content.
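
The second clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an earlier LDA run has already produced one topic probability distribution per document, and the sample distributions, the function names (`js_distance`, `farthest_first`, `js_kmeans`), the farthest-first initialisation, and the use of the arithmetic mean as the cluster centroid are all illustrative assumptions. Only the Python standard library is used.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, so it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture of the two distributions
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding errors


def farthest_first(docs, k):
    """Deterministic initialisation (an assumption): start from the first document,
    then repeatedly add the document farthest from the centroids chosen so far."""
    centroids = [list(docs[0])]
    while len(centroids) < k:
        nxt = max(docs, key=lambda d: min(js_distance(d, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def js_kmeans(docs, k, iterations=50):
    """k-means over topic distributions, assigning documents by JS distance."""
    centroids = farthest_first(docs, k)
    labels = [-1] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: js_distance(d, centroids[c])) for d in docs
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: the arithmetic mean of distributions is again a distribution.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical document-topic distributions (rows sum to 1), standing in for LDA output.
docs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.10, 0.10, 0.80],
]
labels, centroids = js_kmeans(docs, k=3)
print(labels)
```

On this toy input, documents dominated by the same topic end up in the same cluster. The arithmetic mean is used as the centroid for simplicity; it is a common heuristic rather than the exact minimiser of the summed JS distance.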

KEYWORDS

Text Modeling, LDA Topic Model, VSM Model, TF-IDF, JS, K-Means

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.