Open Access

An Industry Classification Model of Small and Medium-sized Enterprises based on TF-IDF Characteristics


DOI: 10.23977/icamei.2019.047

Author(s)

Chen Jiahao, Zhang Jiayi

Corresponding Author

Chen Jiahao

ABSTRACT

This paper uses data from the national SME Information Disclosure System and builds a learning framework with TensorFlow in Python to classify enterprises by industry according to their business scope. First, the Jieba word segmentation library for Python is used to segment each enterprise's business scope and remove extraneous words. Second, a naive Bayes text classification model is applied, with the chi-square statistic as the basis for feature selection: multi-dimensional features are selected for each category of business scope and re-weighted. A vector space model (VSM) is then constructed for each business scope, which is classified according to probability. Next, all words are one-hot encoded and passed to XGBoost, a tree-based model chosen for its ability to handle tabular data, and categories below the threshold are pruned. Finally, a convolutional neural network is used to encode the vocabulary, part-of-speech tags are added to the segmented words, word vectors are trained with Gensim, and cosine similarity is computed to obtain the final classification results.
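The vector space model and cosine similarity steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the business scopes have already been segmented into tokens (for example by Jieba), uses a smoothed TF-IDF weighting chosen here for illustration, and replaces the paper's classifiers with a simple nearest-neighbour rule. The toy training data, the labels, and the function names fit_idf, tfidf, cosine and classify are all hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (a dict) for one tokenized doc; unseen terms are dropped."""
    counts = Counter(term for term in doc if term in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: (count / total) * idf[term] for term, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(doc, labeled_vectors, idf):
    """Return the label of the training vector most similar to the tokenized doc."""
    query = tfidf(doc, idf)
    best_label, _ = max(labeled_vectors, key=lambda item: cosine(query, item[1]))
    return best_label


# Hypothetical, already-segmented business scopes with made-up industry labels.
train = [
    (["software", "development", "technical", "consulting"], "IT"),
    (["computer", "software", "sales", "network", "services"], "IT"),
    (["garment", "manufacturing", "textile", "sales"], "Manufacturing"),
    (["metal", "parts", "manufacturing", "processing"], "Manufacturing"),
]

idf = fit_idf([tokens for tokens, _ in train])
vectors = [(label, tfidf(tokens, idf)) for tokens, label in train]
predicted = classify(["network", "software", "development"], vectors, idf)
print(predicted)
```

In this sketch, terms shared across many business scopes (such as "sales") receive a lower weight than terms specific to one industry, which is the motivation for TF-IDF features in this kind of classification.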

KEYWORDS

Text classification model, Bayesian model, VSM model, XGBoost, convolutional neural network

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.