Concrete Slump Prediction Based on Hybrid Optimization XGBoost Algorithm

Abstract: In this study, a hybrid optimization XGBoost model was used to predict the slump of concrete. The optimization combines grid search with the particle swarm optimization (PSO) algorithm. Grid search determines the maximum depth and the number of trees in XGBoost, while PSO tunes the remaining floating-point hyperparameters within specified ranges to improve the predictive accuracy of the model. The factors influencing the slump of concrete are water, cement, fine aggregate, coarse aggregate, slag, fly ash, and water reducer, which are represented by seven parameters. The model performs well on both the training and testing sets, with a coefficient of determination (R2) exceeding 0.97. In conclusion, this study demonstrates that an XGBoost model optimized by the hybrid of grid search and particle swarm optimization can accurately predict the slump of concrete, which is of significant importance for controlling and optimizing the concrete production process.


Introduction
Concrete slump prediction and mixture ratio optimization are crucial steps in the concrete preparation process, as they directly impact the quality and performance of concrete. Therefore, the development of effective prediction models and optimization strategies to enhance prediction accuracy and batching efficiency has garnered significant attention. By training such prediction models on existing datasets, the slump can be predicted for different concrete mixture ratios, enabling the production of concrete that is better suited to specific requirements.
Numerous scholars have proposed methods for this task. Ji Tao et al. proposed an artificial neural network (ANN)-based model for predicting concrete strength and slump; calculation models for average paste thickness and equivalent water-cement ratio can be obtained by reverse extrapolation from the two prediction models [1]. Yeh et al. simulated the slump of self-consolidating concrete (SCC) using an artificial neural network and validated the developed model through response tracking plots. Their study explored the complex nonlinear relationship between concrete components and slump behavior, concluding that response tracking plots can be used for this purpose [2]. Moayedi et al. utilized the ant lion optimizer (ALO) to fine-tune neural networks for concrete slump prediction, and their model performed well in approximating concrete slump [3]. Hamed Safayenikoo et al. employed the vortex search algorithm (VSA), the multi-verse optimizer (MVO), and the shuffled complex evolution algorithm (SCE) to optimize the configuration of a multi-layer perceptron (MLP) neural network, achieving a 33% reduction in prediction error [4].
To address this problem, we selected multiple models, including XGBoost and random forest, and compared their predictive performance using evaluation metrics such as the coefficient of determination and root mean square error. These machine learning models have been widely applied to prediction problems and have demonstrated strong predictive capabilities in many practical applications. However, their application to mixture ratio optimization can still be improved. After conducting multiple experiments, we found that XGBoost outperformed the random forest model in terms of accuracy, although its accuracy could be raised further. To further enhance the predictive accuracy of the model, we introduced a hybrid algorithm combining grid search and particle swarm optimization.
This study considers the key factors in concrete mixture ratios, including water, cement, fine aggregate, coarse aggregate, slag, fly ash, and water reducer, which have a significant influence on the accuracy and practicality of the prediction model. Comparative experiments revealed that the combination of the XGBoost model and the hybrid optimization algorithm exhibited significant advantages over other methods in terms of both prediction accuracy and mixture ratio optimization.
The aim of this paper is to compare multiple models through experimental analysis and identify the most effective one, which the experiments show to be XGBoost. By then combining XGBoost with the hybrid algorithm, we achieve precise prediction of concrete slump and optimization of the mixture ratio to enhance the quality and performance of concrete. The goal is to provide a more accurate and optimized strategy for concrete preparation, with the hope of widespread application in engineering.

Data Collection
The data for this experiment were obtained from reference [5], which contains over 1000 samples. Each sample includes the quantities of 7 mixture components and the corresponding slump value of the concrete produced with that mixture. The 7 components are water, cement, fine aggregate, coarse aggregate, slag, fly ash, and water reducer. However, a significant portion of the samples have no recorded slump value. These samples were discarded, and the remaining 295 samples with non-empty slump values were retained. The sample data is shown in Table 1.
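The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the field names and the sample values are hypothetical, and the real dataset from reference [5] may encode missing slump values differently.

```python
# Hypothetical sample records; field names and values are illustrative only.
rows = [
    {"water": 180.0, "cement": 350.0, "slump": 120.0},
    {"water": 175.0, "cement": 320.0, "slump": None},
    {"water": 190.0, "cement": 300.0, "slump": ""},
    {"water": 200.0, "cement": 280.0, "slump": 180.0},
]


def has_slump(row):
    """Return True when the slump field holds a usable numeric value."""
    value = row.get("slump")
    if value is None or value == "":
        return False
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    # NaN is the only float that is not equal to itself.
    return number == number


def drop_missing_slump(records):
    """Keep only the records whose slump value is present."""
    return [record for record in records if has_slump(record)]


clean = drop_missing_slump(rows)
print(len(rows), "->", len(clean))
```

Applied to the full dataset, the same filter would reduce the original samples to the 295 with a recorded slump value.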

Statistical Analysis of Data
The 295 samples were subjected to statistical analysis to calculate the mean, standard deviation, minimum value, and maximum value of each variable, as shown in Table 1. Additionally, the Pearson correlation coefficients were calculated to explore the linear relationships between variables, as shown in Table 2 and Figure 1.
There is a significant negative correlation between water and water reducer: mixtures with more water tend to use less water reducer, and vice versa. The correlation coefficient between coarse aggregate and fine aggregate is -0.702177, which also indicates a significant negative correlation. Increasing the proportion of coarse aggregate leads to a decrease in the proportion of fine aggregate, and vice versa. This is expected since the total aggregate proportion is fixed. The correlations between the other variables are relatively low, which may indicate weak associations or non-linear relationships that the Pearson correlation coefficient does not capture.
There is a noticeable positive correlation between slump and water reducer: a higher dosage of water reducer is associated with a higher slump. This could be attributed to the fact that water reducers help improve the workability of the concrete [6].
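For reference, the Pearson coefficient used in this analysis can be computed as in the sketch below. The two short series are made-up values chosen only to show a negative relationship between coarse and fine aggregate; they are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (kg per cubic metre), not from the paper's data.
coarse = [1000.0, 950.0, 900.0, 850.0]
fine = [700.0, 760.0, 790.0, 850.0]
r = pearson(coarse, fine)
print(f"r = {r:.3f}")
```

A value close to -1 indicates a strong negative linear relationship, a value close to +1 a strong positive one, and a value near 0 little or no linear relationship.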

Overview of XGBoost Model
XGBoost is an efficient machine learning algorithm based on gradient boosting that is designed for large-scale, high-performance machine learning problems. It was developed by Tianqi Chen and his team at the University of Washington in 2014 and has since become a widely used and highly regarded open-source project. XGBoost offers the following key features: excellent predictive performance, parallel processing capability, support for various model forms, built-in model validation and early stopping mechanisms, powerful regularization, and the ability to handle sparse data and missing values [7].
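XGBoost is an additive model composed of k base models. Assuming that the tree model to be trained in the t-th iteration is denoted as $f_t(x_i)$, we have:
$$\hat{y}_i^{(t)} = \sum_{j=1}^{t} f_j(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$
where $\hat{y}_i^{(t)}$ is the predicted result of sample i after the t-th iteration, and $\hat{y}_i^{(t-1)}$ is the predicted result of the previous t-1 trees. The objective function of XGBoost can be formulated as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{j=1}^{k} \Omega(f_j), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
with $\gamma$ as the penalty coefficient for leaf nodes, $T$ as the number of leaf nodes, $\omega$ as the leaf weights, and $\lambda$ as the weight penalty coefficient.
The prediction of the t-th model for the i-th sample, $x_i$, is given by:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
where $\hat{y}_i^{(t-1)}$ represents the predicted value given by the (t-1)-th step model and is a known constant, and $f_t(x_i)$ is the prediction of the new model to be added in this step. The objective function can be written as:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C$$
where C is a constant. Next, we need to find an $f_t(x_i)$ that minimizes the value of the objective function. To this end, we perform a second-order Taylor expansion of the loss, which gives the approximation:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + C$$
where $g_i$ and $h_i$ are the first and second derivatives of the loss $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$, and $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is a constant that does not affect the optimization of the function. Therefore, we can remove all constant terms, resulting in the objective function:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$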
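As a hedged continuation of this derivation, the simplified objective can be minimized in closed form. The steps below follow the standard XGBoost formulation rather than text recovered from this paper. Here $g_i$ and $h_i$ denote the first and second derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$, the regularization term is assumed to be $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j} \omega_j^2$, and $I_j$ (the set of samples falling into leaf $j$), $G_j$, and $H_j$ are notation introduced for this sketch.

```latex
% Every sample in leaf j receives the same weight: f_t(x_i) = \omega_j for i \in I_j.
G_j = \sum_{i \in I_j} g_i , \qquad H_j = \sum_{i \in I_j} h_i

% Grouping the per-sample sum by leaf:
Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j \omega_j + \frac{1}{2} (H_j + \lambda) \omega_j^2 \right] + \gamma T

% Setting the derivative with respect to \omega_j to zero:
\omega_j^{*} = - \frac{G_j}{H_j + \lambda}

% Substituting back gives the score of a fixed tree structure:
Obj^{*} = - \frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
```

For the squared-error loss $l = \frac{1}{2}(y_i - \hat{y}_i)^2$, this gives $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$, so each leaf weight is a shrunken negative mean of the residual gradients in that leaf.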

Grid Search Method
In this study, the grid search method was used to optimize the hyperparameters of the XGBoost model. Specifically, it was employed to determine the maximum depth of the decision trees and the number of trees in the XGBoost model. The grid search method systematically explores multiple combinations of parameters and identifies the optimal combination through cross-validation. In this process, a range and step size are specified for each parameter, and all possible parameter combinations are generated [8]. Since the maximum depth of the decision trees and the number of trees are integers, it is relatively easy to find the optimal solution within the specified parameter space using grid search. However, because the search space for the floating-point parameters of the XGBoost model is potentially large, a particle swarm optimization algorithm is used to optimize them in subsequent steps. This hybrid of grid search and particle swarm optimization balances model performance against computational cost.
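The exhaustive enumeration that grid search performs can be sketched in a few lines. In the sketch below, `toy_score` is a hypothetical stand-in for the cross-validated R2 of an XGBoost model, with an arbitrary peak that is not the paper's result; in practice the scoring function would train and validate a model for each combination. The ranges and step sizes are likewise assumptions.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate every combination in `grid` and return the best one.

    `grid` maps each parameter name to an iterable of candidate values;
    `score_fn` takes a dict of parameters and returns a score to maximize.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score surface with an arbitrary peak at (4, 100)."""
    depth_penalty = 0.01 * (params["max_depth"] - 4) ** 2
    trees_penalty = 1e-5 * (params["n_estimators"] - 100) ** 2
    return 0.9 - depth_penalty - trees_penalty


# Assumed integer ranges and step sizes, for illustration only.
grid = {"max_depth": range(2, 11), "n_estimators": range(20, 201, 20)}
best_params, best_score = grid_search(toy_score, grid)
print(best_params, round(best_score, 4))
```

The cost grows with the product of the number of candidates per parameter (here 9 x 10 = 90 evaluations), which is why the method suits a small number of integer parameters better than several continuous ones.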

Particle Swarm Optimization Algorithm
Particle Swarm Optimization (PSO) is an evolutionary computation technique that simulates the foraging behavior of a flock of birds. In the search space, each "bird" (referred to as a "particle" or an "individual") has a fitness value determined by a fitness function. Each particle knows its own best position (i.e., the position with the highest fitness it has found) and the globally best position. In each iteration, the particles update their velocities and positions to move towards their own best position and the globally best position [9]. The optimization process of the particle swarm is illustrated in Figure 2.
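The velocity and position updates can be sketched as below. This is a minimal, generic PSO under stated assumptions, not the configuration used in the paper: the inertia weight, the acceleration coefficients, the swarm size, the bounds, and the quadratic `toy_loss` are all hypothetical. The sketch minimizes a loss, which is equivalent to maximizing fitness when the loss is taken as, for example, 1 - R2.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in positions]
    pbest_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the global best.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                # Move, then clamp the coordinate to its allowed range.
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(hi, max(lo, moved))
            value = objective(positions[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = positions[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = positions[i][:], value
    return gbest, gbest_val


# Hypothetical bounds for learning_rate, gamma, and subsample.
bounds = [(0.01, 0.3), (0.0, 1.0), (0.5, 1.0)]
target = (0.1, 0.3, 0.8)  # arbitrary optimum of the toy loss


def toy_loss(position):
    """Quadratic bowl standing in for a validation loss such as 1 - R2."""
    return sum((x - t) ** 2 for x, t in zip(position, target))


best_pos, best_val = pso_minimize(toy_loss, bounds)
print([round(x, 3) for x in best_pos], best_val)
```

Replacing `toy_loss` with a function that trains a model and returns its validation error would turn the sketch into a hyperparameter tuner for continuous parameters.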

Evaluation Metrics
In the experimental process, the concrete dataset was first read and processed. The dataset includes influencing factors such as water, cement, fine aggregate, coarse aggregate, and water reducer as model inputs, and the slump value of the concrete as the model output. Then, the dataset was divided into a training set and a test set. Given the limited amount of data in this study (295 samples), the training set was set to 80% of the total, with 236 samples, and the test set was set to 20% of the total, with 59 samples.
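The 80/20 partition can be reproduced with a simple shuffled index split, sketched below. The random seed and the use of a uniform shuffle are assumptions; the paper does not state how the samples were assigned to the two sets.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle the sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(295)
print(len(train_idx), len(test_idx))  # 236 59
```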
Three evaluation metrics were used to assess the model: coefficient of determination (R2), mean squared error (MSE), and mean absolute error (MAE). These metrics were chosen because they measure different aspects of the model's prediction performance. R2 measures how much of the variance in the measured slump the model's predictions explain, while MSE and MAE measure the magnitude of the prediction errors. Together, these three metrics provide a comprehensive evaluation of the model.
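The calculation formulas for these evaluation metrics are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
where $y_i$ is the measured slump of sample i, $\hat{y}_i$ is the predicted slump, $\bar{y}$ is the mean of the measured values, and n is the number of samples.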
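The three metrics can be computed directly from their standard definitions, as in the following sketch. The slump values are invented for illustration and are not results from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative slump values in mm, not taken from the dataset.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2_score(y_true, y_pred))
```

In this toy case every prediction is off by 10 mm, so MSE is 100 and MAE is 10, while R2 remains high because the errors are small relative to the spread of the measured values.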

Hyperparameter Optimization of the XGBoost Algorithm Based on the Grid Search Method
Grid search is used to determine the values of two integer hyperparameters of the XGBoost algorithm, namely the maximum tree depth (max_depth) and the number of trees (n_estimators), with the coefficient of determination (R2) as the evaluation metric. The XGBoost model for slump prediction is built using the training dataset. As shown in Figure 3 and Figure 4, the optimal values of max_depth and n_estimators are those at which the R2 value is maximized during the training process. Figure 3 shows that the model achieves its highest R2 value of 0.944 when max_depth is set to 6. For larger values of max_depth, the R2 values remain below 0.944 and fluctuate around 0.935. Additionally, the R2 value starts to stabilize once n_estimators reaches 100, and the maximum R2 value of 0.9441 is achieved when n_estimators is set to 123. The grid search method therefore gives optimal values of 6 for max_depth and 123 for n_estimators in the XGBoost model.
Similarly, the grid search method is employed to determine the value ranges for three floating-point hyperparameters in the XGBoost model: learning rate (learning_rate), the minimum loss reduction required to make a further partition on a leaf node (gamma), and the subsample ratio of the training instances (subsample).The specified value ranges for these parameters are presented

Conclusion
In this study, a hybrid optimized XGBoost model was developed to predict the slump of concrete, combining grid search and particle swarm optimization (PSO) algorithms. The influence of seven factors, including water, cement, fine aggregate, coarse aggregate, and admixture, was investigated. The experimental results demonstrated that the proposed model performed well on both the training and test sets, with a coefficient of determination (R2) exceeding 0.97.
Compared to other commonly used prediction models such as standard XGBoost, Random Forest (RF), LightGBM, and Gradient Boosting Decision Trees (GBDT), the hybrid optimized XGBoost model achieved higher prediction accuracy on the test set, with a higher R2 score and significantly reduced mean squared error (MSE) and mean absolute error (MAE). The following advantages of the hybrid optimized XGBoost model can be observed:
Integration of multiple optimization methods: The hybrid optimized XGBoost model combines grid search and particle swarm optimization algorithms. Grid search systematically explores the hyperparameter combinations of the algorithm to find the best model configuration, while particle swarm optimization adaptively adjusts the floating-point hyperparameters to improve performance. By integrating multiple optimization methods, the hybrid optimized XGBoost model can leverage the advantages of each method and enhance prediction performance.
Higher prediction accuracy: As shown in Table 4, the hybrid optimized XGBoost model achieved the best performance in terms of R2, MSE, and MAE. It fits the data better and captures complex relationships within the data, and therefore predicts concrete slump more accurately than the other algorithms.
Efficient feature learning and ensemble capability: XGBoost, as the base model, is an improved version of the gradient boosting algorithm with powerful feature learning and ensemble capabilities. It can automatically learn the importance of features and perform feature selection to extract the most informative ones. By ensembling predictions from multiple base models, XGBoost can reduce model variance and improve generalization ability.
In summary, the hybrid optimized XGBoost model, combining grid search and particle swarm optimization algorithms, provides a powerful and accurate tool for high-precision prediction of concrete slump. The findings of this study are of practical significance for the control and optimization of concrete production processes and also provide support and inspiration for related research in the field.

Figure 1: Correlation Coefficients between Variables and Slump

The objective function consists of two components: the loss function and the regularization term. The loss function, $\sum_{i=1}^{n} l(y_i, \hat{y}_i)$, measures the discrepancy between the true value $y_i$ and the predicted value $\hat{y}_i$ for each sample, where n represents the number of samples. The regularization term, $\sum_{j=1}^{k} \Omega(f_j)$, sums the complexities across all k trees and serves to prevent overfitting.

Figure 3: Plot of the relationship between R2 and the max_depth parameter.

Figure 4: Plot of the relationship between R2 and the n_estimators parameter.

Table 1: Construction of a dataset for predicting porosity models.

Table 2: Statistical analysis of the data.

Table 4: Predictive Performance of Five Machine Learning Methods on the Test Set.