Open Access

Research on New Words Discovery of Weibo Based on SVM and Word Features


DOI: 10.23977/AICT2020010


Yuanfang Xu

Corresponding Author


To identify new words in Weibo corpora effectively, this paper proposes a new-word discovery method based on a support vector machine (SVM) and word features, tailored to the distinctive text characteristics of Weibo. The method relies on the SVM's classification ability. First, positive and negative samples are extracted from the Weibo corpus and from a part-of-speech-tagged training corpus. The samples are then vectorized using several word features computed from the training corpus, and an SVM is trained on these vectors to obtain the support vectors for classifying new Weibo words. For testing, word segmentation and part-of-speech tagging are applied to a test corpus containing simulated new words, and candidate new words are selected using the proposed constraints and relaxation variables. Each candidate is vectorized with its own word features and passed to the trained SVM classifier, and the classifier output is compared with a threshold: a candidate whose output is less than the threshold is judged to be a new Weibo word. The most suitable SVM kernel function is selected by comparing experimental results.
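The classification step described above can be illustrated with a minimal sketch. It is not the paper's implementation. The two features (`cohesion` and `boundary_freedom`), all numeric values, the hyperparameters, and the function names are hypothetical. The abstract does not state how classes are encoded, so the sketch assumes new words are labelled -1, which makes a score below the threshold indicate a new word, as in the abstract. Only a linear kernel is shown, and the kernel comparison is not reproduced.

```python
# Sketch: linear SVM trained by subgradient descent on the hinge loss.
# Feature names, values, and hyperparameters are illustrative assumptions,
# not taken from the paper.

def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=500):
    """Return (weights, bias) for labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Samples inside the margin contribute a hinge-loss subgradient.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def decision_score(x, w, b):
    """Signed score of a candidate's feature vector."""
    return dot(w, x) + b

def is_new_word(x, w, b, threshold=0.0):
    """Assumed convention: a score below the threshold marks a new word."""
    return decision_score(x, w, b) < threshold

# Toy training set: [cohesion, boundary_freedom] per sample (made-up values).
new_word_samples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.8]]   # label -1 (assumed)
non_word_samples = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]   # label +1 (assumed)
train_x = new_word_samples + non_word_samples
train_y = [-1] * len(new_word_samples) + [1] * len(non_word_samples)

weights, bias = train_linear_svm(train_x, train_y)

# Hypothetical candidates that passed the selection step.
candidates = {"candidate_a": [0.95, 0.95], "candidate_b": [0.05, 0.05]}
for name, features in candidates.items():
    score = decision_score(features, weights, bias)
    print(name, round(score, 3), is_new_word(features, weights, bias))
```

In practice the threshold would be tuned on held-out data, and a library SVM with selectable kernels would replace the hand-written trainer so that kernel choices can be compared.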


Weibo neologisms; SVM; word features

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.