Research on Diabetes Prediction Model Based on XGBoost Algorithm
DOI: 10.23977/icamcs2019.47
Corresponding Author
Wu Hao
ABSTRACT
To explore the role of the XGBoost algorithm in predicting the risk of diabetes mellitus, the Pima Indian Diabetes data set from the UCI machine learning repository was selected, and 70% of the samples were randomly selected for model building. Eight factors were taken as independent variables: number of pregnancies, plasma glucose concentration at 2 hours in an oral glucose tolerance test, diastolic blood pressure (mm Hg), triceps skinfold thickness (mm), 2-hour serum insulin (μU/ml), body mass index (kg/m²), diabetes pedigree function, and age. With diabetes as the dependent variable, prediction models were established based on Logistic regression and XGBoost, respectively. Each model was then applied to the remaining 30% of the samples, and its predictive performance was evaluated by the correct rate (accuracy). The correct rates of the Logistic regression model and the XGBoost model were 77% and 83%, respectively. On this data set, XGBoost achieved better prediction accuracy than traditional Logistic regression.
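The evaluation protocol described in the abstract (a random 70/30 split, then the correct rate on the held-out 30%) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `split_indices` and `correct_rate`, the seed, and the rounding rule for the training-set size are all assumptions, and the sample count of 768 is the commonly cited size of the Pima data set rather than a figure stated in the abstract.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into (train, test) lists.

    Hypothetical helper: the abstract only says 70% of samples were
    randomly selected; the seed and rounding rule are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def correct_rate(y_true, y_pred):
    """Fraction of predictions that match the true labels (accuracy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute the correct rate of empty input")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Assumed sample count (768); the abstract does not state it.
train_idx, test_idx = split_indices(768)
print(len(train_idx), len(test_idx))
```

In this sketch, a fitted model (Logistic regression or XGBoost) would be trained on the rows in `train_idx`, and `correct_rate` would compare its predictions with the true labels for the rows in `test_idx`.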
KEYWORDS
Diabetes, XGBoost, Forecast