Education, Science, Technology, Innovation and Life
Open Access

Text quantization model based on TF-IDF and NLTK toolbox


DOI: 10.23977/ESAC2020041


Yidong Hu, Haoran Hua, Jinzhao Song, Huan Zhang, Chuheng Sun

Corresponding Author

Yidong Hu


With the development of Internet technology, more and more people shop online, producing a large number of online product reviews that convey a wealth of information. Our task is to help Sunshine analyze the online reviews and star ratings left by customers of its products, and to derive measurement methods and customer preferences suited to different products and requirements. First, we performed correlation analysis on reviews, helpfulness ratings, and star ratings, and cleaned the data based on the strength of the correlation between products, which reduced the sample size. To quantify customer reviews, we then used TF-IDF to mine high-frequency keywords from the reviews of each product. After manual screening and expansion, the keywords were grouped into topics, and we used the NLTK toolbox to assign scores and weights to these topics. From these scores and weights we obtained a quantitative score for each customer review. We verified the reliability of our scoring criteria by analyzing the correlation between the quantitative review scores and the star ratings. Second, we fitted the functional relationship between time and the quantitative text score for each product. By further analyzing these fitted functions in MATLAB, we obtained criteria for measuring potential success and failure.
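
The keyword mining and topic scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names `tfidf_keywords` and `review_score`, the example topics, and the weighted-average scoring rule are all assumptions chosen for illustration, and the sketch uses only the standard library rather than NLTK.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_keywords(reviews, top_k=3):
    """Return the top_k terms of each review ranked by TF-IDF.

    TF is the term count divided by the review length; IDF is
    log(N / df), where df is the number of reviews containing the term.
    """
    docs = [tokenize(r) for r in reviews]
    n_docs = len(docs)

    # Document frequency: in how many reviews does each term occur?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    results = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results


def review_score(text, topics):
    """Weighted average of the scores of all topics matched by a review.

    `topics` maps a topic name to (keywords, score, weight). This scoring
    rule is a hypothetical stand-in for the paper's topic scores and weights.
    """
    tokens = set(tokenize(text))
    weighted_sum = 0.0
    weight_total = 0.0
    for keywords, score, weight in topics.values():
        if tokens & keywords:
            weighted_sum += weight * score
            weight_total += weight
    # A review that matches no topic gets a neutral score of 0.
    return weighted_sum / weight_total if weight_total else 0.0


# Hypothetical reviews and topics, for illustration only.
reviews = [
    "the battery died fast",
    "the screen is bright",
    "the battery lasts long",
]
topics = {
    "breakage": ({"broke", "died"}, -1.0, 2.0),
    "appearance": ({"bright", "great"}, 1.0, 1.0),
}

keywords = tfidf_keywords(reviews, top_k=2)
scores = [review_score(r, topics) for r in reviews]
```

In this sketch a word such as "the", which appears in every review, receives an IDF of zero and therefore never ranks as a keyword. The per-review scores could then be compared with star ratings, or fitted against time, as the abstract describes.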


TF-IDF; NLTK; polynomial fitting

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.