Automated Pricing and Replenishment Decisions for Supermarket Fresh Vegetables

Abstract: Fresh vegetables sold in supermarkets have a short shelf life, so supermarkets usually replenish them daily based on the historical sales and demand of each item. This paper therefore studies automatic pricing and replenishment decisions for vegetable items using measured data from a superstore. First, the sales trends of the different categories across seasons are plotted. Then, linear regression in Python is used to fit the functional relationship between sales volume and cost-plus pricing, and an optimization model is constructed with the total daily replenishment as the decision variable and the superstore's revenue as the objective function, yielding a predicted sales volume table and a pricing strategy table for each category. Finally, the gray prediction model is used to forecast the sales volume of individual items, so as to maximize the superstore's revenue while meeting the market demand for each category of vegetables as far as possible. The model developed in the paper can help superstores predict demand more accurately, make replenishment plans, adjust pricing strategies, and improve market competitiveness.


Introduction
In today's fresh food superstores, vegetable commodities stay fresh for only a short time, and their quality deteriorates the longer they are on sale; items that are not sold on the day generally cannot be sold the next day. Therefore, superstores usually make pricing and replenishment decisions based on the sales and demand of each commodity [1].
Given practical factors such as the variety of vegetables, their origin, and trading hours, merchants have to make pricing and replenishment decisions under uncertainty. Vegetables are priced using the "cost-plus pricing" method, and superstores usually sell goods with shipping losses or poor quality at a discount. Reliable market demand analysis is important for replenishment and pricing decisions. On the demand side, the sales volume of vegetables is often correlated with time; on the supply side, the supply of vegetables is more plentiful in certain months, and the superstore's limited sales space makes a reasonable sales mix extremely important.
To address these problems, this paper takes the measured data of a superstore as an example. First, it identifies the distribution of sales volume for each vegetable category and single product, together with their interrelationships. Then, it analyzes the relationship between the total sales volume of each vegetable category and cost-plus pricing, and gives the total daily replenishment volume and pricing strategy of each category for the coming week so as to maximize the revenue of the superstore. Finally, because the sales space for vegetable items is limited, a replenishment plan for individual items is formulated to maximize the superstore's revenue while satisfying the market demand for each category as far as possible [2].

Sales volume distribution pattern and correlation analysis
Descriptive statistics of sales by category were analyzed as shown in Table 1. The foliage category has the largest sales volume and the aquatic rhizome category the smallest, so the foliage category serves as the main basis for the individual-product analysis below.
Starting from spring, the overall trend is that sales of aquatic rhizomes increase continuously while the other categories tend to decline. In summer, with warmer temperatures, the sales of vegetables increased and fruits decreased. In winter, with the cold weather, sales of aquatic rhizomes continued to increase, while some of the other categories were affected by the weather and declined. These data show that seasonal factors have a significant impact on the sales of the different categories and therefore on the market as a whole.
The Pearson correlation coefficient is used when both variables are continuous, approximately normally distributed, and linearly related [3]. Commonly denoted \(r\), it measures the strength of the linear association between two variables and takes values between -1 and 1. For paired observations \((x_i, y_i)\), \(i = 1, \dots, n\), with sample means \(\bar{x}\) and \(\bar{y}\), Pearson's correlation coefficient is calculated as:
\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]
The closer the coefficient is to 1 or -1, i.e., the larger its absolute value, the stronger the correlation; the closer it is to 0, the weaker the correlation.
This correlation analysis gives the results shown in Table 1, which can be summarized as follows. Foliage is negatively correlated with cauliflower, aquatic rhizomes, eggplant, pepper and edible mushrooms, i.e., an abundance of foliage may reduce the abundance of the other categories. This may be due to competition between plants or limited resources in the same season or under the same growing conditions. Cauliflower is positively correlated with aquatic rhizomes, eggplant, pepper and edible mushrooms, and negatively correlated with the flower and leaf (foliage) category. This may be caused by similar growing conditions or by competition for resources among them. Eggplant, pepper and edible mushrooms are strongly correlated with one another; in practice this may reflect the influence of artificial selection among them.
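The correlation calculation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `pearson_r` and `correlation_matrix` and the monthly sales figures are hypothetical and are not taken from the paper's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centered cross-product and centered sums of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """Pairwise Pearson coefficients for a dict {category name: sales list}."""
    names = list(series)
    return {(a, b): pearson_r(series[a], series[b]) for a in names for b in names}

# Hypothetical monthly sales (kg) for three categories; NOT the paper's data.
sales = {
    "foliage": [120.0, 135.0, 150.0, 110.0, 95.0, 105.0],
    "cauliflower": [60.0, 52.0, 45.0, 66.0, 72.0, 70.0],
    "edible_mushroom": [80.0, 70.0, 62.0, 85.0, 98.0, 90.0],
}
matrix = correlation_matrix(sales)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```

The dictionary returned by `correlation_matrix` holds the same numbers a heat map would display; a plotting library could render it directly, but that step is left out here to keep the sketch dependency-free.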
The correlation heat map of the vegetable categories is shown in Figure 1. In the heat map, label 1 denotes the flower and leaf (foliage) category, 2 the cauliflower category, 3 aquatic rhizomes, 4 the eggplant category, 5 the pepper category, and 6 edible mushrooms.

Figure 1: Heat map of correlation of various types of vegetables
The flower and leaf category is known to have the highest share of sales, so we select sample items from this category. Using Appendix I and Appendix II, we counted the ten items in the flower and leaf category with the largest number of data samples. In descending order they are Yunnan oleander, Chinese cabbage, yellow cabbage, Yunnan lettuce, spinach, milk cabbage, sweet potato tips, choy sum, corns, and baby vegetables. These items are taken as indicators to roughly infer the distribution pattern and characteristics of single-item sales in the other categories, as shown in Figure 2. The figure shows that, apart from higher sales in July and August, sales of Yunnan oil wheat cabbage are relatively stable over the year. Analyzing cabbage, yellow cabbage and milk cabbage as a group, cabbage sales are higher in autumn and winter, where they first rise and then fall, and lower but relatively stable in spring and summer. Yellow cabbage shows exactly the opposite pattern, which indicates seasonal variation and substitution between the two. Milk cabbage sales are relatively stable throughout the year. Overall, the demand for single vegetable items is seasonal, with relatively high demand in summer and winter and relatively low demand in spring.
For the correlation analysis of individual products, we use the same correlation formula and correlation strength scale as for the categories, which leads to the results shown in Figure 3. Looking at the correlations between individual products of the same category from an overall perspective, negative correlation is dominant: an increase in one of two items tends to coincide with a decrease in the other, which in practical terms means that most items act as substitutes for each other. The smaller number of variables with strong correlations indicates that they are interdependent, i.e., they act as substitutes for each other.

Individual product replenishment program development
The relationship between sales volume and cost-plus pricing is first fitted using the available data; regression analysis is chosen for fitting and prediction. Regression analysis is a statistical method for studying the correlation between random variables. By analyzing the actual observations of the variables, it establishes a quantitative relationship between one variable and another, i.e., a regression equation. This subsection uses the pricing of each category (\(y\)) as the dependent variable and the sales volume (\(x\)) as the independent variable and builds a simple (univariate) linear regression model:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
where \(\varepsilon\) is the random error, which obeys the normal distribution \(N(0, \sigma^2)\), and \(\beta_0\) and \(\beta_1\) are the regression coefficients. The specific steps of the regression analysis are:
Step 1: Determine the estimates \(\hat{\beta}_0\), \(\hat{\beta}_1\) of the regression coefficients \(\beta_0\), \(\beta_1\) from the observations using the least squares method.
Step 2: Test the regression equation for significance, with an F-test for the linear relationship and a t-test for the regression coefficients.
Step 3: Make predictions using the regression equation.
A simple linear regression was performed in Python, and the regression equation was calculated taking aquatic rhizomes as an example. The regression equation is then tested for significance. The first test is the test of the linear relationship, which checks whether the linear relationship between the independent variable and the dependent variable is significant. Its test statistic is:
\[ F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)} \]
where \(MSR\) is called the mean square regression and \(MSE\) the mean square residual. The result is \(F = 2703.557\) with significance \(P = 0.000\), so the linear relationship between the independent variable and the dependent variable is considered significant and the linear relationship test is passed.
Next, the significance of the regression coefficients is tested, i.e., whether the effect of the independent variable on the dependent variable is significant. The test statistic is:
\[ t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} \]
where \(s_{\hat{\beta}_1}\) is the estimated standard deviation of \(\hat{\beta}_1\). The results are shown in Table 2. According to the table, the effect of the independent variable on the dependent variable is considered significant, and the regression coefficient test is passed.
Finally, the coefficient of determination \(R^2\) is calculated to assess the goodness of fit of the regression line. It depends on the magnitudes of SSR and SSE and is calculated as:
\[ R^2 = \frac{SSR}{SSR + SSE} \]
The closer \(R^2\) is to 1, the better the fit; here \(R^2 = 0.998\), indicating an excellent fit. In summary, the regression equation is significant.
The above series of tests shows that the simple linear regression is reasonable. The regression coefficients are all negative, which is consistent with our prior expectations and with the actual situation: higher sales volume is associated with lower pricing, i.e., sales volume and pricing are negatively related. All fitted functions are decreasing, but their slopes are small in magnitude. It can therefore be concluded that the total sales of each vegetable category is negatively correlated with cost-plus pricing, and that although the relationship is linear in nature, its practical impact is not large.
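The three regression steps and the associated statistics can be illustrated with a short pure-Python sketch. The helper name `simple_ols` and the sample sales and price values are hypothetical; the paper's own fit was presumably produced with a statistics package, and p-values are omitted here because they require the F and t distributions, which the standard library does not provide.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = beta0 + beta1 * x with F, t and R^2 statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                      # Step 1: least-squares estimates
    beta0 = mean_y - beta1 * mean_x
    fitted = [beta0 + beta1 * xi for xi in x]
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # regression sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    if sse == 0:
        raise ValueError("perfect fit: F and t statistics are undefined")
    mse = sse / (n - 2)
    f_stat = (ssr / 1) / mse               # Step 2: F = MSR / MSE
    t_stat = beta1 / math.sqrt(mse / sxx)  # t = beta1_hat / s(beta1_hat)
    r2 = ssr / (ssr + sse)                 # coefficient of determination
    return {"beta0": beta0, "beta1": beta1, "f_stat": f_stat,
            "t_stat": t_stat, "r2": r2}

def predict(fit, x_new):
    """Step 3: prediction from the fitted regression equation."""
    return fit["beta0"] + fit["beta1"] * x_new

# Hypothetical daily sales volume (kg) and cost-plus price (per kg); NOT the paper's data.
sales_volume = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0]
price = [9.1, 8.8, 8.2, 7.9, 7.4, 6.9]
fit = simple_ols(sales_volume, price)
print(fit)
print(predict(fit, 16.0))
```

For a regression with a single explanatory variable the F statistic equals the square of the t statistic of the slope, which is a convenient consistency check on any implementation.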
To solve for the total daily replenishment and pricing strategy of each vegetable category for the coming week, we take the cauliflower category as the worked example and report only the final results for the remaining categories.
Let the decision variable be \(x_i\), \(i = 1, 2, \dots, 7\), denoting the total sales volume of the cauliflower category on day \(i\).
Pricing follows the cost-plus method, a strategy commonly used in manufacturing and service industries to determine the selling price of a product or service. Its core idea is to combine the cost with the required profit to determine the final selling price. With the daily markup rate denoted \(r_i\), the final price \(p_i\) can be expressed as \(p_i = c_i (1 + r_i)\),
where \(c_i\) denotes the purchase cost of the cauliflower category on day \(i\). Denoting the profit of the cauliflower category on day \(i\) by \(W_i\), the total profit of the superstore over the week can be expressed as \(W = \sum_{i=1}^{7} W_i\). The above analysis yields the predicted sales volume shown in Table 3 and the pricing strategy shown in Table 4.
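To make the optimization concrete, the sketch below works through one possible form of the daily problem. It rests on assumptions that are not confirmed by the text: the daily profit is taken to be (price minus cost) times sales volume, and the price is tied to the sales volume by a fitted line price = beta0 + beta1 * volume with beta1 < 0. Under these assumptions the daily profit is a concave quadratic in the volume, with its maximum at volume = (beta0 - cost) / (-2 * beta1). The function name `best_daily_plan`, the coefficients and the costs are all hypothetical.

```python
def best_daily_plan(beta0, beta1, unit_cost):
    """Profit-maximizing volume, price and markup for one day.

    Assumes profit = (price - cost) * volume and price = beta0 + beta1 * volume
    (both are illustrative assumptions, not the paper's stated model).
    """
    if beta1 >= 0:
        raise ValueError("beta1 must be negative for a concave profit function")
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    # Stationary point of (beta0 + beta1 * x - cost) * x, clipped at zero
    volume = max(0.0, (beta0 - unit_cost) / (-2.0 * beta1))
    price = beta0 + beta1 * volume
    markup = price / unit_cost - 1.0       # cost-plus rate r in p = c * (1 + r)
    profit = (price - unit_cost) * volume
    return {"volume": volume, "price": price, "markup": markup, "profit": profit}

# Hypothetical regression coefficients and purchase costs for seven days.
beta0, beta1 = 12.0, -0.05
daily_costs = [6.0, 6.2, 5.8, 6.1, 6.0, 5.9, 6.3]
week = [best_daily_plan(beta0, beta1, c) for c in daily_costs]
weekly_profit = sum(day["profit"] for day in week)
for i, day in enumerate(week, start=1):
    print(f"day {i}: volume={day['volume']:.1f}, price={day['price']:.2f}, "
          f"markup={day['markup']:.1%}, profit={day['profit']:.2f}")
print(f"weekly profit: {weekly_profit:.2f}")
```

A closed-form maximizer is available only because the assumed demand line is linear; with forecast costs or additional constraints, the same objective would be handed to a numerical optimizer instead.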

Individual product replenishment strategy development
The gray prediction model GM(1,1) is a forecasting method for gray systems, i.e., systems that contain both known and uncertain information [5][6]. The method identifies the degree of dissimilarity between the development trends of the system factors (correlation analysis) and processes the raw data by generation to find the law of system change, producing data sequences with strong regularity. A corresponding differential equation model is then established to predict the future development trend [5].
Establishment of the gray prediction model: after accumulation, the sequence shows a monotonically increasing, approximately exponential pattern. Since the differential equation \(y' = ay\) has a solution of exponential form, \(y = e^{ax}\), a first-order gray equation model is proposed, i.e., the GM(1,1) model, where the first 1 indicates a first-order differential equation and the second 1 indicates that the gray model contains only one variable.
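For reference, the usual GM(1,1) construction can be written out as follows. The notation is an assumption based on the common textbook formulation rather than the authors' own derivation: \(x^{(0)}\) is the original series, \(x^{(1)}\) its accumulated series, \(z^{(1)}\) the mean series, \(a\) the development coefficient and \(b\) the gray input. In this convention the whitening equation carries \(+a\,x^{(1)}\) on the left-hand side, so a growing series corresponds to a negative \(a\).

```latex
\begin{align}
x^{(1)}(k) &= \sum_{i=1}^{k} x^{(0)}(i), & k &= 1, \dots, n \\
z^{(1)}(k) &= \tfrac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right), & k &= 2, \dots, n \\
x^{(0)}(k) + a\, z^{(1)}(k) &= b, & k &= 2, \dots, n \\
\frac{\mathrm{d}x^{(1)}}{\mathrm{d}t} + a\, x^{(1)} &= b \\
\hat{x}^{(1)}(k+1) &= \left(x^{(0)}(1) - \frac{b}{a}\right) e^{-ak} + \frac{b}{a}, & k &= 0, 1, \dots \\
\hat{x}^{(0)}(k+1) &= \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), & k &= 1, 2, \dots
\end{align}
```

The coefficients \(a\) and \(b\) are estimated by least squares from the third line, and the last line converts the accumulated forecast back to the scale of the original series.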
The GM(1,1) model prediction steps are as follows:
(1) Testing and processing of data
To ensure the feasibility of the modeling method, the known data series must first be tested. Let the reference series be \(x^{(0)} = (x^{(0)}(1), x^{(0)}(2), \cdots, x^{(0)}(n))\). Calculate the grade ratio of the reference series:
\[ \lambda(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \dots, n \]
If all grade ratios \(\lambda(k)\) fall within the tolerable coverage \(\Theta = \left(e^{-\frac{2}{n+1}}, e^{\frac{2}{n+1}}\right)\), the series \(x^{(0)}\) can be used as data for the GM(1,1) model for gray prediction. Otherwise, the series \(x^{(0)}\) must be transformed so that the grade ratios fall within the tolerable coverage, i.e., take an appropriate constant \(c\) and apply the translation \(y^{(0)}(k) = x^{(0)}(k) + c\), \(k = 1, 2, \dots, n\), such that the grade ratios of the series \(y^{(0)} = (y^{(0)}(1), y^{(0)}(2), \cdots, y^{(0)}(n))\),
\[ \lambda_y(k) = \frac{y^{(0)}(k-1)}{y^{(0)}(k)} \in \Theta, \quad k = 2, 3, \dots, n \]
meet the requirement.

(3) Error checking
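The following two tests can be used:
a. Relative error test. Calculate the relative error:
\[ \delta(k) = \frac{\left| x^{(0)}(k) - \hat{x}^{(0)}(k) \right|}{x^{(0)}(k)}, \quad k = 1, 2, \dots, n \]
where \(\hat{x}^{(0)}(1) = x^{(0)}(1)\). If \(\delta(k) < 0.2\), the general requirement is considered to be met; if \(\delta(k) < 0.1\), the higher requirement is considered to be met.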
b. Grade ratio deviation test. The grade ratio \(\lambda(k)\) is first calculated from the reference series, and the development coefficient \(\hat{a}\) is then used to find the corresponding grade ratio deviation:
\[ \rho(k) = 1 - \left( \frac{1 - 0.5\hat{a}}{1 + 0.5\hat{a}} \right) \lambda(k) \]
If \(\rho(k) < 0.2\), the general requirement is considered to be met; if \(\rho(k) < 0.1\), the higher requirement is considered to be met.

(4) Predictive forecasting
The GM(1,1) model gives the predicted values at the specified points, and the corresponding forecast is then made according to the needs of the actual problem.
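The steps above can be sketched as a compact pure-Python implementation. It assumes the common textbook form of GM(1,1), with whitening equation dx1/dt + a*x1 = b and a least-squares estimate of a and b; the function names and the daily sales series are hypothetical and are not taken from the paper. The deviation check uses the absolute value of the grade ratio deviation, which is a slightly stricter reading than the condition stated in the text.

```python
import math

def grade_ratios(x0):
    """Grade ratios lambda(k) = x0(k-1) / x0(k) for k = 2..n."""
    return [x0[k - 1] / x0[k] for k in range(1, len(x0))]

def within_coverage(x0):
    """Check that every grade ratio lies in (exp(-2/(n+1)), exp(2/(n+1)))."""
    n = len(x0)
    low, high = math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))
    return all(low < r < high for r in grade_ratios(x0))

def fit_gm11(x0):
    """Least-squares estimate of development coefficient a and gray input b."""
    x1, total = [], 0.0
    for v in x0:                      # accumulated series x1
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # mean series
    y = x0[1:]
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # fit y = slope * z + b
    return -slope, (sy - slope * sz) / m               # x0(k) + a * z(k) = b

def predict_gm11(x0, a, b, horizon=0):
    """Fitted values for the sample followed by `horizon` future values."""
    def x1_hat(k):                    # k = 0 corresponds to the first observation
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1)
                      for k in range(1, len(x0) + horizon)]

def relative_errors(x0, fitted):
    return [abs(obs - est) / obs for obs, est in zip(x0, fitted)]

def ratio_deviations(x0, a):
    factor = (1 - 0.5 * a) / (1 + 0.5 * a)
    return [1 - factor * lam for lam in grade_ratios(x0)]

# Hypothetical daily sales (kg) of one item over a week; NOT the paper's data.
history = [21.4, 22.3, 23.9, 24.6, 26.1, 27.5, 28.4]
assert within_coverage(history)       # otherwise shift the series by a constant c
a, b = fit_gm11(history)
forecast = predict_gm11(history, a, b, horizon=7)
delta = relative_errors(history, forecast)
rho = ratio_deviations(history, a)
print("a =", round(a, 5), "b =", round(b, 3))
print("max relative error:", round(max(delta), 4))
print("next 7 days:", [round(v, 2) for v in forecast[len(history):]])
```

If the coverage check fails, the series would first be shifted by a constant as described in step (1), and the same constant subtracted from the forecasts afterwards. The sketch also assumes the fitted `a` is non-zero, since the time response divides by it.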
Linear programming (LP) is an important branch of operations research that originated in decision-making problems of industrial production organization and management. It seeks the optimal value of a linear objective while the decision variables satisfy a set of linear constraints. A linear program consists of three elements: decision variables, an objective function, and constraints. Generally speaking, the decision variables are the quantities that the decision maker controls in order to achieve the predetermined goal, and solving the problem means finding the final values of the decision variables; the objective function is the index that the decision maker wants to optimize, a linear function of the decision variables that describes the relationship between the decision variables and the predetermined goal; the constraints are the conditions that the decision variables must satisfy for the solution to be meaningful.
The general form of linear programming is:
\[ \max\ (\text{or } \min)\quad z = c^{T} x \tag{1} \]
\[ \text{s.t.}\quad Ax \le (=, \ge)\ b, \quad x \ge 0 \tag{2} \]
In the above expression, (1) is called the objective function, (2) is called the constraints, \(c\) is called the value vector, and \(x\) is the decision vector. Applying this principle to the replenishment problem gives the corresponding model. The final total profit obtained was $670.64.
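As a small illustration of the three elements, consider a hypothetical replenishment LP that is not the paper's model: maximize the total profit over the items, subject to one shelf-space constraint and an upper bound on each item given by its forecast demand. For this special structure (a single capacity constraint plus bounds on each variable) the optimum is obtained by filling the shelf in decreasing order of profit per unit of space, so no general solver is needed. All names and numbers below are assumptions made for the example.

```python
def solve_shelf_lp(unit_profit, unit_space, max_demand, capacity):
    """Maximize sum(c_j * x_j) s.t. sum(s_j * x_j) <= capacity, 0 <= x_j <= d_j.

    Greedy by profit per unit of space, which is optimal for an LP with one
    capacity constraint and simple bounds (the continuous knapsack problem).
    """
    n = len(unit_profit)
    if not (len(unit_space) == len(max_demand) == n):
        raise ValueError("all input lists must have the same length")
    if any(s <= 0 for s in unit_space):
        raise ValueError("unit_space entries must be positive")
    order = sorted(range(n), key=lambda j: unit_profit[j] / unit_space[j],
                   reverse=True)
    plan = [0.0] * n
    remaining = capacity
    for j in order:
        if unit_profit[j] <= 0 or remaining <= 0:
            continue  # unprofitable item or no space left
        qty = min(max_demand[j], remaining / unit_space[j])
        plan[j] = qty
        remaining -= qty * unit_space[j]
    total_profit = sum(c * x for c, x in zip(unit_profit, plan))
    return plan, total_profit

# Hypothetical items: profit per kg, shelf space per kg, forecast demand in kg.
unit_profit = [3.0, 2.0, 4.0]
unit_space = [1.0, 0.5, 2.0]
max_demand = [20.0, 30.0, 10.0]
plan, total_profit = solve_shelf_lp(unit_profit, unit_space, max_demand, capacity=40.0)
print(plan, total_profit)
```

Here `unit_profit` plays the role of the value vector and `plan` the decision vector. Once a model has several coupled constraints, for example minimum display quantities or limits on the number of items, the greedy rule no longer applies and a general LP solver such as `scipy.optimize.linprog` would be the natural tool.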

Conclusion
Based on the measured data of a superstore, this paper studies automatic pricing and replenishment decisions for vegetable items. The sales trends of the different categories across seasons are plotted, and the correlations between the sales volumes of different categories and of individual items are analyzed using line graphs, histograms, and Pearson correlation coefficients. The model established in the paper can help superstores predict demand more accurately, make replenishment plans, adjust pricing strategies, and improve market competitiveness.

Figure 2: Distribution of sales by individual product

Figure 3: Relevance heat map of each individual product

Table 1: Correlation coefficients for various types of vegetables

Table 2: Regression results

Table 3: Forecasted sales volume