Research on Campus Hot Topic Detection Based on LDA Topic Model
DOI: 10.23977/csic.2018.0940
Author(s)
Xiujuan Yi, Weidong Zhu
Corresponding Author
Xiujuan Yi
ABSTRACT
Hot topics on college and university networks are detected by analyzing the content that students access and post online. This content varies widely in length, covers scattered topics, and contains distorted information. The traditional VSM (Vector Space Model) builds feature weight vectors from word-frequency statistics alone and therefore ignores the implicit content of the text. For this reason, the LDA (Latent Dirichlet Allocation) topic model, which can identify the hidden topics in a document, is used to remodel the text and detect topics. During preprocessing, the text is segmented into words and then filtered based on TF-IDF, after which it is modeled with LDA. After LDA clustering, a k-means algorithm that uses the JS (Jensen-Shannon) distance function clusters the documents a second time according to their topic probability distributions, yielding the topic distribution of the students' Internet content.
KEYWORDS
Text Modeling, LDA Topic Model, VSM Model, TF-IDF, JS, K-Means
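
The second clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each document is already represented by its LDA topic probability vector (one row of a matrix called `theta` here) and runs k-means with the Jensen-Shannon divergence in place of Euclidean distance. The function names, the parameters `k`, `n_iter`, and `seed`, the smoothing constant `eps`, and the choice to update each center as the arithmetic mean of its members are all assumptions made for illustration; the abstract does not specify these details, and the mean is a common heuristic rather than the exact JS centroid.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors (natural log)."""
    # Adding eps creates new arrays, so the caller's inputs are not modified.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return 0.5 * (kl_pm + kl_qm)


def kmeans_js(theta, k, n_iter=50, seed=0):
    """Cluster document-topic distributions with k-means under JS divergence.

    theta: array of shape (n_docs, n_topics), each row summing to 1.
    Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    # Assumption: initialise centers from k distinct, randomly chosen documents.
    centers = theta[rng.choice(len(theta), size=k, replace=False)].copy()
    labels = np.full(len(theta), -1)
    for _ in range(n_iter):
        # Assignment step: nearest center by JS divergence.
        dist = np.array([[js_divergence(doc, c) for c in centers] for doc in theta])
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # Update step (heuristic): the mean of distributions is still a distribution.
        for j in range(k):
            members = theta[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers


# Toy document-topic matrix: two documents dominated by topic 0, two by topic 2.
theta = np.array([
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.05, 0.05, 0.90],
    [0.10, 0.10, 0.80],
])
labels, centers = kmeans_js(theta, k=2)
```

On this toy input the two topic-0 documents should share one label and the two topic-2 documents the other. In practice `theta` would come from a fitted LDA model applied to the TF-IDF-filtered corpus.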