Design of an AI Health Risk Assessment System for Dietary Hygiene of Key Groups Based on IoT Wearable Devices

Abstract: Population spatio-temporal big data mining and analysis techniques have been applied to risk assessment of disease transmission, where they can describe transmission pathways and high-risk areas in fine detail. Based on spatial statistical analysis and artificial intelligence technology, this study seeks to move beyond the earlier small-data risk warning model, which relied on a single data source from medical institutions, and designs an AI health risk assessment system for the dietary hygiene of key populations. The system is designed to collect multi-source spatio-temporal big data consisting of urban population positioning, sanitary inspections of restaurant premises, foodborne disease cases in medical institutions, and environmental monitoring. Spatial location attributes are assigned to the monitoring data, and food-related and other multi-source data are fused across domains. Using Internet of Things (IoT) technology, the system incorporates an IoT subsystem consisting of sensors for automatic monitoring and wearable devices for real-time warning. Based on the spatial and artificial intelligence models, the system generates personalized, real-time early warning information for key populations to prevent dietary health risks and provides a scientific basis and support for public health departments to prevent foodborne diseases.


Introduction
A healthy and hygienic diet is an essential element of good health. Foodborne diseases are infections or poisonings caused by causative agents that enter the body through food [1]. Bacteria and viruses cause most infectious foodborne diseases. Many factors influence the distribution, outbreak, or epidemic spread of these diseases. The most apparent influencing factors are traditional, local, or behavioral, such as the kitchen environment, food procurement, food processing, and eating habits. Few studies have examined the connection between these diseases and the external objective environment, such as geospatial or spatio-temporal factors. The big data era has produced large volumes of spatial data with time and location markers that can describe individual behavior, such as mobile phone data, taxi data, and social media data. These data provide a new way to understand the socio-economic environment quantitatively. In recent years, scholars in computer science, geography, and complexity science have conducted many investigations based on different data sources, trying to find the spatio-temporal dynamic patterns of massive data and to establish reasonable ways of interpreting them. Studies have shown that urban population lifestyles influence disease transmission. These spatially meaningful data with positioning characteristics provide valuable clues for epidemiological studies of health and the environment, and they are a prerequisite for establishing a risk assessment system for foodborne diseases. The system is designed to extract the characteristics of environmental surveillance objects automatically. It classifies the monitored objects with artificial neural network models and calculates the risk value of foodborne diseases in each street area. It then derives low-, medium-, and high-risk areas for foodborne diseases in the monitoring area through machine learning correlation analysis, generates risk assessment maps, and provides early warning to users in high-risk areas through wearable IoT devices [2].

Data Collection Layer
The framework of the system design is shown in Figure 1. The system first collects and monitors the city's population location big data, restaurant health inspection big data, foodborne disease case data from medical institutions, and environmental monitoring big data (chemical fertilizer and pesticide pollution, water pollution, etc.), which collectively make up the multi-source spatio-temporal big data. The acquisition layer collects these data automatically through IoT devices. The city's population location data are obtained through the location and heat-map interfaces of mobile operators and map service providers; the restaurant hygiene inspection data are obtained from the "Bright Kitchen" project automatic monitoring equipment installed in restaurants [3]; the foodborne disease case data are received automatically from the information systems of the monitored medical institutions; and the environmental monitoring and pollution data are obtained from IoT sensors installed at the monitoring points.

Multi-Source Data Aggregation Layer
Working within a unified spatio-temporal framework and combined with 3S (GIS, RS, GPS) technology [4], the multi-source data aggregation layer takes the multi-source spatio-temporal big data obtained from the data collection layer and qualitatively classifies each monitored indicator as low, medium, or high, with the street of the monitored city as the smallest spatial unit.
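The qualitative low/medium/high grading per street can be illustrated with a minimal sketch. The source does not state how the thresholds are chosen, so the tertile (rank-based) rule below is purely an assumption, and `classify_levels` and the street identifiers are hypothetical names.

```python
from typing import Dict


def classify_levels(values: Dict[str, float]) -> Dict[str, str]:
    """Label each street 'low', 'medium', or 'high' by the tertile of its rank.

    Assumption: tertile thresholds; the system's real grading rule is not
    given in the text.
    """
    # Street identifiers sorted by ascending indicator value.
    ranked = sorted(values, key=lambda street: values[street])
    n = len(ranked)
    labels: Dict[str, str] = {}
    for rank, street in enumerate(ranked):
        if rank < n / 3:
            labels[street] = "low"
        elif rank < 2 * n / 3:
            labels[street] = "medium"
        else:
            labels[street] = "high"
    return labels


# Hypothetical per-street indicator values (e.g. failed inspections per month).
street_values = {"A": 1, "B": 5, "C": 9, "D": 2, "E": 7, "F": 12}
street_levels = classify_levels(street_values)
```

A rank-based rule keeps the three classes balanced; fixed regulatory thresholds would be an equally plausible choice.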

Data Mining and AI Model Analysis Layer
The data mining and AI model analysis layer uses a spatial autocorrelation model (Moran's I), machine learning methods, and deep learning methods to model and analyze the multi-source big data on a unified spatial and temporal scale. This includes applying spatial autocorrelation analysis to derive maps of population distribution and dietary disease characteristics. The unsupervised machine learning model trained on the big data can automatically derive risk levels for foodborne diseases at different spatial and temporal scales.

End-Application Warning Layer
The end-application warning layer issues warnings for the predicted low-, medium-, and high-risk spatio-temporal areas of dietary hygiene. Wearable IoT devices (bracelets, etc.) read GPS location information in real time, and the system warns a user inside a risk area of the corresponding dietary risk level.

Key technologies for foodborne disease assessment models
Spatial statistical models and artificial intelligence models [5] are used for intelligent analysis of multi-source collected data, including key technologies such as spatial autocorrelation analysis algorithms, unsupervised machine learning clustering analysis algorithms, and deep learning convolutional neural network models.

Spatial Autocorrelation Analysis Method
Spatial autocorrelation analysis measures the degree of correlation between an attribute value in one geospatial region and the same attribute value in its neighboring regions. The spatial autocorrelation coefficient is usually used as the primary metric to test whether a particular attribute value in a unit region shows high-high adjacency, low-low adjacency, or high-low adjacency. Spatial autocorrelation analysis is mainly divided into global and local spatial autocorrelation analysis [6]. Commonly used methods are Moran's I, Geary's C, the Getis-Ord G statistic, and the Moran scatter plot.
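Global spatial autocorrelation analysis examines whether the attribute variable is clustered over the whole study area. The global Moran's I is computed as follows [7].

$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}$$

Here n denotes the number of regions in the study space; X_i denotes the value of the attribute variable within the ith region (e.g., disease incidence) and X_j the value within the jth region; \bar{X} is the mean value of the attribute variable over the regions under study; and W_{ij} denotes the spatial weight matrix, determined as follows.

$$W_{ij} = \begin{cases} 1, & \text{when region } i \text{ is adjacent to region } j \\ 0, & \text{otherwise} \end{cases}$$

Under the null hypothesis of no spatial autocorrelation, the expected value of Moran's I is as follows.

$$E(I) = -\frac{1}{n-1}$$

Under the assumption of normal distribution of spatial objects, the variance of Moran's I is as follows.

$$\mathrm{Var}_N(I) = \frac{n^2 S_1 - n S_2 + 3 S_0^2}{S_0^2\,(n^2 - 1)} - E(I)^2$$

Under the assumption of random distribution of spatial objects, the variance of Moran's I is as follows.

$$\mathrm{Var}_R(I) = \frac{n\left[(n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2\right] - k\left[(n^2 - n) S_1 - 2n S_2 + 6 S_0^2\right]}{(n-1)(n-2)(n-3)\,S_0^2} - E(I)^2$$

where

$$S_1 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(W_{ij} + W_{ji})^2, \qquad S_2 = \sum_{i=1}^{n}\Big(\sum_{j=1}^{n} W_{ij} + \sum_{j=1}^{n} W_{ji}\Big)^2, \qquad k = \frac{n^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^4}{\left[n^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^2}$$

The Z-score statistic for Moran's I is determined as follows.

$$Z = \frac{I - E(I)}{\sqrt{\mathrm{Var}(I)}}$$

If |Z| > 1.96 (P < 0.05), the null hypothesis is rejected: the global spatial autocorrelation coefficient is not zero, and the attribute variable is considered spatially autocorrelated. When there is a positive spatial correlation, the value is greater than 0, and the closer it is to 1, the stronger the spatial clustering of similar values. When there is a negative spatial correlation, the value is less than 0, and the closer it is to -1, the stronger the negative correlation, that is, the greater the spatial variability of the object of study. When the distribution is random, the value is close to 0, that is, there is no autocorrelation.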
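The global Moran's I test described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the system's implementation: it uses a binary adjacency matrix, the variance under the normality assumption, and a hypothetical function name `morans_i`; the four-region chain used as input is invented for the example.

```python
import math
from typing import List, Tuple


def morans_i(x: List[float], w: List[List[float]]) -> Tuple[float, float, float]:
    """Return (I, E[I], Z) for values x and spatial weight matrix w.

    Assumption: Z uses the variance under the normality assumption.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]

    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * (cross / sum(d * d for d in dev))

    expected = -1.0 / (n - 1)

    # S1 and S2 are the usual weight-matrix summaries used in the variance.
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum(
        (sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n)
    )
    var_norm = (n * n * s1 - n * s2 + 3 * s0 * s0) / (
        s0 * s0 * (n * n - 1)
    ) - expected ** 2

    z = (i_stat - expected) / math.sqrt(var_norm)
    return i_stat, expected, z


# Hypothetical example: four regions in a chain (0-1, 1-2, 2-3 are adjacent).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend_i, trend_e, trend_z = morans_i([1.0, 2.0, 3.0, 4.0], chain)
alt_i, alt_e, alt_z = morans_i([1.0, 4.0, 1.0, 4.0], chain)
```

A smoothly increasing pattern gives a positive I, while a strictly alternating pattern gives a negative one. With only four regions neither result is statistically meaningful; the example only shows the direction of the statistic.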
Local spatial autocorrelation analysis examines whether the attribute variable is spatially clustered around a specific local area within the overall geospatial scope. The results can be used to detect and explain "hot spots" or "cold spots" in the spatial aggregation of the attribute variable. A local Moran's I > 0 indicates a positive spatial correlation between a local spatial unit and its neighboring units, expressed as "high-high" or "low-low" aggregation. When the local Moran's I < 0, the correlation between the local unit and its neighbors is negative, manifesting as "low-high" or "high-low" aggregation.

The results of the spatial autocorrelation simulation analysis, constructed at the street scale with a city as the research target, are shown in Figure 2. The simulation showed that four locations presented a significant High-Low aggregation, that is, a high incidence of foodborne illness in the street itself and a low incidence in the surrounding communities. This suggests that these streets and their surroundings lie at the intersection of hot and cold spots of foodborne illness and merit further research.
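The High-High, Low-Low, High-Low, and Low-High labels discussed above can be derived per region with a short sketch. The source does not give the local formula, so the code assumes the commonly used local Moran's I with row-standardized weights; `local_moran` and the five-region star layout are hypothetical, and no significance test is included.

```python
from typing import List, Tuple


def local_moran(x: List[float], w: List[List[float]]) -> List[Tuple[float, str]]:
    """Return (local I, quadrant label) for each region.

    Assumption: row-standardized weights, so the spatial lag is the mean
    deviation of a region's neighbours.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    m2 = sum(d * d for d in dev) / n  # second moment of the deviations

    results: List[Tuple[float, str]] = []
    for i in range(n):
        row_sum = sum(w[i])
        lag = sum(w[i][j] * dev[j] for j in range(n)) / row_sum if row_sum else 0.0
        local_i = dev[i] * lag / m2
        own = "High" if dev[i] > 0 else "Low"
        neighbours = "High" if lag > 0 else "Low"
        results.append((local_i, f"{own}-{neighbours}"))
    return results


# Hypothetical example: region 0 is adjacent to regions 1..4 (star layout)
# and has a much higher incidence than its neighbours.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
lisa = local_moran([10.0, 1.0, 1.0, 1.0, 1.0], star)
```

Here region 0 comes out as High-Low and its neighbours as Low-High, both with a negative local I. In practice a permutation test would be needed before calling any of these clusters significant.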

Unsupervised Machine Learning Clustering Method
The cluster analysis method [8] is an unsupervised machine learning method and has many applications in biological and medical classification problems. A large number of observations can be grouped into several classes. Here, a class is defined as a group of several observations, and the similarity of observations within a group is higher than the similarity between groups. The k-means algorithm commonly used in cluster analysis methods is a typical method of dividing clusters and belongs to unsupervised machine learning. The idea is to obtain the classification of each sample point by minimizing the sum of squares of errors within the group given the number of clusters k.
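The k-means criterion mentioned above can be written explicitly. The symbols are introduced here for illustration and do not appear in the source: C_c is the set of observations assigned to cluster c, and \mu_c is the mean of that cluster.

$$J = \sum_{c=1}^{k} \sum_{x_i \in C_c} \lVert x_i - \mu_c \rVert^2, \qquad \mu_c = \frac{1}{|C_c|} \sum_{x_i \in C_c} x_i$$

The algorithm alternates between assigning each observation to its nearest cluster mean and recomputing the means, which never increases J; the number of clusters k has to be fixed beforehand.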
The system design applies the AP (affinity propagation) cluster analysis method [9], which is also an unsupervised machine learning clustering algorithm. Its advantage over k-means clustering is that the number of clusters does not need to be given in advance: the AP algorithm determines the optimal number of clusters automatically. The basic idea of the AP algorithm is to treat all samples as nodes of a network and to determine the cluster center of each sample through messages passed along the edges of the network. Two kinds of messages are passed among the nodes during clustering: responsibility and availability. By passing these messages repeatedly, the algorithm finally selects representative samples (exemplars) and completes the clustering. The AP algorithm updates each point's responsibility and availability values iteratively until m high-quality exemplars are produced, and the remaining data points are assigned to the corresponding clusters. The iterative process mainly updates two matrices, the responsibility matrix R = r(i,k) and the availability matrix A = a(i,k), which are determined as follows.

$$r(i,k) \leftarrow s(i,k) - \max_{k' \neq k}\left\{a(i,k') + s(i,k')\right\}$$

$$a(i,k) \leftarrow \min\Big\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\}\Big\}, \quad i \neq k$$

$$a(k,k) \leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\}$$

Here s(i,k) denotes the similarity between sample i and candidate exemplar k.

A heat map shows the degree of similarity of each district relative to the other districts, with colors ranging from light yellow to dark red: darker colors show greater differences between the district and the other districts, and no color indicates minimal difference. The results in Figure 3 demonstrate that the optimal number of clusters derived from the AP analysis is 5: red (zone 21, Tanz), blue (zone 2, zone 5, zone 18), yellow (zone 4, zone 7, zone 9, zone 11, zone 20), purple (zone 1, zone 3, zone 8, zone 10, zone 13, zone 17, zone 23), and green (zone 6, zone 12, zone 14, zone 15, zone 16, zone 19, zone 22, zone 24).
The unsupervised clustering results show that zone 21 forms a class of its own, demonstrating that it differs markedly from the other zones; it is therefore classified separately and warrants further attention.
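The responsibility and availability message passing of the AP algorithm can be sketched in plain Python. This is an illustrative sketch under several assumptions that the source does not state: similarity is the negative squared distance, the self-similarity (preference) is the median of the pairwise similarities, and the damping factor and iteration count are arbitrary. The function name `affinity_propagation` and the six one-dimensional sample values are hypothetical.

```python
import statistics
from typing import List


def affinity_propagation(
    s: List[List[float]], damping: float = 0.7, iterations: int = 300
) -> List[int]:
    """Return, for each sample, the index of its exemplar.

    s is the similarity matrix; s[k][k] holds the preference of sample k.
    """
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with the best rival.
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[kk] for kk in range(n) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        # Availability: accumulated support for k acting as an exemplar.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # excludes k's own responsibility
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, r[k][k] + support - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fallback: take the single best-scoring candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    return [
        i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
        for i in range(n)
    ]


# Hypothetical one-dimensional risk scores forming two obvious groups.
points = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
sim = [[-((p - q) ** 2) for q in points] for p in points]
pairwise = [sim[i][j] for i in range(6) for j in range(6) if i != j]
preference = statistics.median(pairwise)  # assumption: median preference
for i in range(6):
    sim[i][i] = preference
labels = affinity_propagation(sim)
```

The number of clusters is never passed in; it emerges from the preference value, with lower preferences yielding fewer exemplars.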

Early warning based on wearable IoT devices
Traditional food safety risk warning methods are limited: the warnings are aimed mainly at professionals in food supervision departments and require dedicated personnel to operate, so the general public can hardly use them in daily life. The terminal warning on the wearable IoT device designed in this study delivers personalized, real-time foodborne disease warning information to key groups of people. Based on the geographic location of the individual user, it can scan for and flag possible dietary hygiene problems within the user's daily surroundings in real time.
The wearable IoT device is built as a smart bracelet, consisting of a main ring on which a data transmission module, a 5G SIM card, a GPS module, a microprocessor, and an alarm light are mounted. The AI health risk assessment system sends dietary hygiene information from the server, including the spatial coordinates of areas with a high risk of food contamination, and the smart bracelet receives it through the data transmission module and the 5G SIM card. When the distance between the smart bracelet and the coordinates of a food contamination source sent by the server is less than a preset distance, the microprocessor drives the alarm light to emit the specified alarm signal, reminding key groups of people, such as students, the elderly, and wild plant pickers, to check the food safety warning information and avoid the problematic food in time.
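The distance check that triggers the alarm light can be sketched as follows. The source does not say how the distance is computed, so the haversine great-circle formula, the mean Earth radius, the function names, the coordinates, and the 200 m threshold are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def should_alert(
    position: Tuple[float, float],
    risk_points: Iterable[Tuple[float, float]],
    threshold_m: float,
) -> bool:
    """True if the wearer is within threshold_m of any high-risk coordinate."""
    lat, lon = position
    return any(
        haversine_m(lat, lon, r_lat, r_lon) < threshold_m
        for r_lat, r_lon in risk_points
    )


# Hypothetical bracelet position and one high-risk coordinate sent by the server.
wearer = (30.0, 120.0)
high_risk = [(30.0, 120.001)]  # roughly 96 m east of the wearer
alert_near = should_alert(wearer, high_risk, threshold_m=200.0)
alert_far = should_alert(wearer, high_risk, threshold_m=50.0)
one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)
```

On a real bracelet this check would run on the microprocessor each time the GPS module reports a new fix, with the risk coordinates refreshed from the server.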