Design of Vehicle Defect Risk Assessment System Based on Multi-source Information Fusion

Abstract: Key quality and safety factors for automobile defect risk assessment are extracted from multi-source vehicle quality and safety data, a map of the associations among them is established, and a vehicle defect evaluation index system is systematically constructed. On the basis of correlating and optimizing the key quality and safety factors and the index system, a multi-dimensional cluster of defective-vehicle quality and safety information is established using big data technology. Applying a clustering method to historical automobile defect data has formed more than 4000 typical automobile defect cases. Association analysis is conducted on the multi-source quality and safety data, and a vehicle defect risk assessment system is designed and developed, which provides data and technical support for rapid warning of vehicle defect risk.


Introduction
China has been the world's largest automobile producer and market for consecutive years. Because automobiles are complicated in structure and function, defects are nearly inevitable during design and manufacturing. In recent years, people have paid increasing attention to the quality and safety problems caused by vehicle defects. Over 10 million vehicles were recalled due to defects in 2016, and over 20 million in 2017. Recalling defective vehicles has become one of the important means of protecting consumers' personal and property safety. Accurate judgment of vehicle defect risk is an important basis for product defect early warning, control and product quality improvement, and extraction of automobile quality and safety information is the premise of defect risk assessment. At present, automobile defect risk assessment in China relies mainly on experts' qualitative analysis, whose objectivity, scientific rigor and accuracy require further improvement [1].
With the rapid development of the Internet and big data technology, how to use data mining and similar technologies to analyze automobile defect information and improve vehicle defect risk assessment has attracted growing attention from the industry. For instance, in 2010, Bae et al. studied the role of defective vehicle recalls in reducing serious road traffic accidents by analyzing the relationship between historical recall data and road traffic accident data. In 2012, A. S. Abrahams proposed using internet crawlers and semantic analysis to obtain clues about defective automobile products from the internet, which provides multi-dimensional data support for vehicle defect early warning. In 2014, the NHTSA studied how the integration of multi-dimensional data can improve the identification of potential defects in automobile products [2]. In 2014, Mahtab et al. filtered complaints involving injuries and deaths from the NHTSA database and mined potential safety problems from the mass of data. Recall management of defective automobile products started late in China, and the research is still at an initial stage with relatively limited literature. In 2013, for example, Lian Lan-xiang et al. suggested using data mining to conduct regression analysis on historical vehicle defect complaints and recall data, and built an automobile defect risk assessment model. In 2015, Yang Shuang-long et al. proposed a risk assessment and early warning method for defective automobile recalls based on Internet complaint data [3]. However, owing to data integrity issues and other problems, parts of this research still need to be improved and optimized before they can be applied to the recall management of defective automobile products in China. At present, researchers in China concentrate more on comparative studies of recall systems, consumer behavior and other fields, while foreign studies mostly analyze the forms of recall, the effects of recall and the methods of recall decision-making from the perspectives of economics, behavioral science and similar disciplines. To sum up, studies on automobile defect risk assessment are limited, and expert consultation remains the most widely used assessment method. This paper proposes the construction of a multi-dimensional quality and safety information cluster for defective automobile products, in order to mine and quantify the key quality and safety factors for automobile product defect identification.

Structured Data Processing
Automobile defect risk assessment includes an automobile defect information analysis stage and an experimental verification stage. Defect information analysis draws on multiple quality and safety information sources, including defect information reports submitted by consumers, producers' complaint analysis reports, producers' product record data, producers' technical service announcements, domestic and international historical automobile recall information, internet public opinion information and so forth. Relying on experts' experience, it comprehensively evaluates the likelihood of a vehicle failure mode and its risk grade, and provides data support for the experimental verification of automobile defects. However, the analysis consumes enormous time and manpower, because the multi-source quality and safety information is poorly interconnected and interoperable and suffers from severe data fragmentation and data islands, which often lead to loss of data. To meet the needs of automobile defect information analysis, this paper proposes a knowledge map of vehicle defect information for use in vehicle defect risk assessment. Constructing a large-scale knowledge map usually involves a large number of entities and relationships, which must be extracted from the original data and stored in graph form. The original automobile defect data reside in a multi-source, heterogeneous environment, so information and knowledge must be extracted from the original data and integrated to form structured knowledge data.
This paper adopts a dictionary- and rule-based method that extracts keywords from the original vehicle defect data and recommends tags. The dictionaries and rules used for keyword extraction cover vehicle assemblies, fault labels, fault severity grades and tag keywords. The system uses the forward maximum matching algorithm to segment the original vehicle defect text, matches the segments against the tag keywords, and recommends tags according to the matching results. Figure 1 shows the recommended tags for an automobile defect complaint: the system matches the complaint text against the 1500 preinstalled entries of the automobile defect mode dictionary, highlights the matches and uses them to recommend vehicle fault tags, so that engineers can select the final defect tag from the recommended labels. Through this label analysis, unstructured data are transformed into structured data, including the producer's name, brand series, assembly and fault label, which can be used to construct the automobile defect information knowledge map. A knowledge map uses computers to understand text and transform it into structured knowledge; through entity linking, knowledge integration, knowledge computing and related technologies, it fuses fragmented knowledge into a map [4] [5]. The automobile defect information map constructed in this paper is shown in Figure 2.
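The tag recommendation step described above can be illustrated with a minimal sketch of forward maximum matching. The function names, the English sample text and the three dictionary entries are hypothetical placeholders; the actual 1500-entry dictionary and matching rules of the system are not given in this paper.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by taking, at each position, the longest dictionary entry."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted so the scan keeps moving.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def recommend_tags(text, tag_keywords):
    """Return the tags of all matched keywords, in order of first appearance."""
    tags = []
    for token in forward_max_match(text, set(tag_keywords)):
        tag = tag_keywords.get(token)
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags


# Hypothetical mini-dictionary: keyword -> fault tag (illustrative only).
TAG_KEYWORDS = {
    "brake failure": "braking system / loss of braking",
    "brake": "braking system",
    "oil leak": "engine / oil leakage",
}

if __name__ == "__main__":
    complaint = "sudden brake failure and oil leak"
    print(recommend_tags(complaint, TAG_KEYWORDS))
```

Because the longest entry wins at each position, "brake failure" is matched as a whole and the shorter keyword "brake" does not produce a second, less specific tag for the same span.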

Automobile Defect Information Knowledge Map Construction
In the Automobile Defect Information Knowledge Map in Fig. 2, the first entity group is the enterprise producer, the second is the brand series, the third is owner complaints, the fourth is producer technical service announcements, the fifth is internet public opinions, and the seventh is domestic and international recalls. The entity relations and attributes are distributed as follows. An enterprise producer has a number of brand series, and the relationship between them is one of subordination. The most important attribute of a brand series is its sales volume. A brand series may exhibit a variety of failure modes, and owners may report and complain about one or all of them. Complaint information includes attributes such as vehicle mileage, service time, working environment and traffic accidents, and traffic accidents in turn have attributes such as casualties. Enterprise producers may publish technical service announcements for one or more failure modes. There may be public opinion on the internet concerning a particular failure mode, and its influence can be calculated from the internet public opinion data. After a defect has been determined, domestic and international enterprise producers may issue recall information concerning the failure mode. The system establishes the knowledge map of vehicle defect information according to the attributes and labels of the different entities. The body of the knowledge map contains hierarchical information, and because "time" is a critical attribute of an event, the system updates the knowledge map as time progresses. Knowledge maps were originally used in search engines to enhance query understanding and search quality. Search engines use the knowledge map to recognize the entities and attributes in the keywords entered by users, infer the associated entities of interest and recommend related knowledge to users. In addition, owing to its graph structure, a knowledge map can be used in expert systems, recommendation systems, knowledge-map-based question answering and visual analysis. By analyzing the semantics of a user's query, a query model based on the knowledge map is constructed to gather answers for the user. Based on the constructed knowledge map of automobile defect information, this paper designs and develops an automobile defect information retrieval system that realizes associated retrieval of multi-source quality and safety information. Using "Sagitar" as the search keyword, for example, the system rapidly returns the information related to "Sagitar", including vehicle configuration, domestic and foreign recalls, technical service announcements, owner complaints, internet public opinions, typical cases and so forth. The retrieval results are shown in Figure 3.
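The association retrieval described above can be sketched with a small in-memory graph. This is only an illustration under stated assumptions: the class `DefectKnowledgeGraph`, the relation names and all attribute values are hypothetical placeholders, not real data, and the system in this paper stores its map with big data tooling rather than Python dictionaries.

```python
from collections import defaultdict


class DefectKnowledgeGraph:
    """Tiny in-memory graph: typed nodes with attributes and labelled edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"type": ..., attributes}
        self.edges = defaultdict(list)  # node id -> [(relation, other node id)]

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, source, relation, target):
        # Store both directions so retrieval can start from any entity.
        self.edges[source].append((relation, target))
        self.edges[target].append((relation, source))

    def related(self, node_id):
        """Group the direct neighbours of node_id by entity type."""
        grouped = defaultdict(list)
        for _relation, other in self.edges.get(node_id, []):
            grouped[self.nodes[other]["type"]].append(other)
        return dict(grouped)


# Placeholder entities and attribute values, for illustration only.
kg = DefectKnowledgeGraph()
kg.add_node("Producer A", "producer")
kg.add_node("Sagitar", "brand_series", sales=0)
kg.add_node("complaint-001", "owner_complaint", mileage_km=0, accident=False)
kg.add_node("recall-001", "recall")
kg.add_edge("Producer A", "has_series", "Sagitar")
kg.add_edge("Sagitar", "has_complaint", "complaint-001")
kg.add_edge("Sagitar", "has_recall", "recall-001")

if __name__ == "__main__":
    print(kg.related("Sagitar"))
```

Querying one brand series returns its producer, complaints and recalls in a single lookup, which is the kind of cross-source association the retrieval system is meant to provide.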

Automobile Defect Risk Assessment Index System
The Regulations for the Recall Management of Defective Automobile Products, which came into effect on January 1st, 2013, define the vehicle defect: a condition caused by design, manufacturing, identification, etc. that is generally present in vehicles of the same production batch, model or type and that does not conform to the national and industry standards for the protection of personal and property safety, or that poses other unreasonable risks to personal and property safety. Evaluating whether a typical vehicle fault is a defect therefore involves three critical elements: whether the fault is caused by design, manufacturing or identification; whether it is a batch problem; and whether it involves a safety problem [7]. This paper classifies the relationships between the multi-source quality and safety information and the three elements of automobile defects, sorts the influencing factors of each element, and proposes evaluation factors for the quantitative evaluation of the three elements.

Evaluation Factors for Whether the Fault Is Caused by Design, Manufacturing or Identification
The ratio of vehicles within the "Three Guarantees" period is the proportion of all complaint vehicles that are still within their three guarantee period, and is used to evaluate how new or old the complaint vehicles are. Vehicle working intensity is the vehicle mileage per unit time, and is used to evaluate whether the vehicle is overloaded. The ratio of regular vehicle maintenance is the proportion of all complaint vehicles that have received regular servicing, and is used to evaluate whether the complaint vehicles have been used reasonably. The working environment serves as a reference factor: if a vehicle is used for a long time in a special climate, such as an extremely cold, hot or damp environment, or is driven on very poor roads for a long time, the occurrence of failures may be accelerated, and in such cases the possibility that the vehicle has been used improperly is increased.
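The three quantitative factors above are simple ratios, and a minimal sketch makes their definitions concrete. The `Complaint` record, its field names, the choice of kilometres per month as the unit and the sample values are assumptions made for illustration; the paper does not specify the data schema or the unit of time.

```python
from dataclasses import dataclass


@dataclass
class Complaint:
    within_three_guarantees: bool  # vehicle still inside its guarantee period
    mileage_km: float              # mileage reported with the complaint
    months_in_service: float       # service time reported with the complaint
    regularly_maintained: bool     # vehicle received regular servicing


def _share(flags):
    """Proportion of True values; 0.0 for an empty collection."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def three_guarantees_ratio(complaints):
    return _share(c.within_three_guarantees for c in complaints)


def regular_maintenance_ratio(complaints):
    return _share(c.regularly_maintained for c in complaints)


def working_intensity(complaint):
    """Mileage per unit time (assumed here: km per month in service)."""
    if complaint.months_in_service <= 0:
        raise ValueError("months_in_service must be positive")
    return complaint.mileage_km / complaint.months_in_service


# Illustrative sample, not real complaint data.
sample = [
    Complaint(True, 24000, 12, True),
    Complaint(True, 9000, 6, False),
    Complaint(True, 60000, 20, True),
    Complaint(False, 90000, 60, False),
]

if __name__ == "__main__":
    print(three_guarantees_ratio(sample))     # share of newer vehicles
    print(regular_maintenance_ratio(sample))  # share of maintained vehicles
    print([working_intensity(c) for c in sample])
```

A high guarantee-period ratio combined with normal working intensity and regular maintenance points away from wear or misuse and toward a design, manufacturing or identification cause.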

Evaluation Factors for Whether It Involves a Batch Issue
The ratio of vehicle complaints to sales is the number of complaints as a percentage of the sales volume, and is used to assess how frequently the vehicle breaks down. The transmission influence of public opinion is identified from the producer's name, brand series, fault assembly and fault tag information; it covers the total number of news items in the whole network, the number of news items from designated network media, and the number of reads and reviews [6]. After normalization, each evaluation indicator $c(t)$ is calculated and the indicators are combined according to their weights to assist in evaluating the breakdown frequency of this brand of vehicle, as shown in Equations (1) and (2). Whether a technical service announcement has been issued for the failure mode, whether domestic or international recall information is available, and, through association analysis, whether other brands of vehicles show similar failure modes are also analyzed to assist in evaluating whether the vehicles involve a batch problem.
In Equation (1): $c(t)$ is the evaluating indicator; $t$ indexes the evaluating indicators, namely the total news quantity in the whole network, the number of news items from designated (authoritative) network media, and the number of reads and reviews; $n(t)$ is the number recorded for evaluating indicator $t$; $\theta_t$ is the adjustment coefficient, which is 0.005 for the total news quantity of the whole network, 0.01 for designated network media news, and 0.001 for reads and reviews. The transmission influence of internet public opinion on vehicle failures, $I(c)$, is calculated as the weighted sum of the evaluation indicators, as shown in Equation (2).
$$I(c) = \sum_{t} w(t)\, c(t) \tag{2}$$
In Equation (2): $I(c)$ is the transmission influence of internet public opinion on vehicle failures; $w(t)$ is the weight of evaluating indicator $t$, which is 0.35 for the total news quantity of the whole network, 0.45 for designated network media news, and 0.20 for reads and reviews.
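The weighted summation for the public opinion influence can be sketched as follows, using the adjustment coefficients and weights quoted above. The form of Equation (1) is not legible in this text, so the sketch assumes that each indicator is the count scaled by its adjustment coefficient and capped at 1, that is c(t) = min(1, theta_t * n(t)); this normalization is an assumption, as are the dictionary keys and the sample counts.

```python
# Adjustment coefficients and weights as quoted in the text.
THETA = {"total_news": 0.005, "media_news": 0.01, "reads_reviews": 0.001}
WEIGHT = {"total_news": 0.35, "media_news": 0.45, "reads_reviews": 0.20}


def indicator_score(indicator, count):
    """Normalized indicator c(t).

    ASSUMPTION: c(t) = min(1, theta_t * n(t)); the paper's Equation (1)
    is not recoverable from the text, so this form is illustrative only.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return min(1.0, THETA[indicator] * count)


def transmission_influence(counts):
    """Weighted sum I(c) of the normalized indicators c(t)."""
    return sum(WEIGHT[t] * indicator_score(t, counts[t]) for t in WEIGHT)


if __name__ == "__main__":
    # Illustrative counts for one failure mode of one brand series.
    counts = {"total_news": 100, "media_news": 20, "reads_reviews": 500}
    print(round(transmission_influence(counts), 3))
```

With the cap in place the influence stays between 0 and 1, since the three weights sum to 1; a different normalization in Equation (1) would change the scale but not the weighted-sum structure.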

Evaluation Factors for Whether It Involves a Safety Problem
The vehicle failure mode dictionary is preinstalled with a default severity level $S$ for each vehicle failure mode, set according to expert experience and historical data statistics [7] [8]. The preset severity levels are shown in Equation (3). The safety risk grade is obtained from the vehicle failure mode dictionary when the multi-source vehicle quality and safety information is processed and labeled, and is used to evaluate the safety risk level of a vehicle failure mode. The number of casualties is the count of injuries and deaths caused by this failure mode in this brand of vehicle, and is used to assist in evaluating the safety risk level of the failure mode for that vehicle brand.

$$
S = \begin{cases}
1, & \text{Low: the failure has no direct impact on vehicle safety} \\
2, & \text{Relatively low: the failure has some impact on vehicle performance or functions, but the vehicle is controllable} \\
3, & \text{Medium: the failure reduces vehicle performance or functions, but the vehicle is controllable} \\
4, & \text{Relatively high: the failure is sudden and reduces controllability, and may cause physical injury or property loss} \\
5, & \text{High: the failure is sudden and uncontrollable, and may cause severe physical injury or property loss}
\end{cases} \tag{3}
$$

To sum up, this paper extracts the three elements of automobile defect determination from the definition of automobile defects. According to the factors in the multi-source quality and safety information that affect each element, the evaluation factors for assessing the three elements are determined and used to construct the evaluation index system for automobile defect risk assessment [9], shown in Table 1.
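The dictionary lookup behind the safety element can be sketched as a mapping from fault tag to severity level. Several things here are assumptions: the numeric codes 1 to 4 and the name of the middle level are inferred from the five-level scale, the failure mode names are hypothetical, and taking the highest severity among a complaint's tags is an illustrative rule rather than one stated in the paper.

```python
from enum import IntEnum


class Severity(IntEnum):
    # ASSUMPTION: five levels coded 1..5 from low to high.
    LOW = 1
    RELATIVELY_LOW = 2
    MEDIUM = 3
    RELATIVELY_HIGH = 4
    HIGH = 5


# Hypothetical excerpt of a failure mode dictionary: fault tag -> severity.
FAILURE_MODE_SEVERITY = {
    "interior trim rattle": Severity.LOW,
    "weak air conditioning": Severity.RELATIVELY_LOW,
    "engine power loss": Severity.MEDIUM,
    "sudden steering assist loss": Severity.RELATIVELY_HIGH,
    "brake failure": Severity.HIGH,
}


def safety_risk(fault_tags, default=Severity.LOW):
    """Highest preset severity among the tags (illustrative aggregation)."""
    levels = [FAILURE_MODE_SEVERITY.get(tag, default) for tag in fault_tags]
    return max(levels, default=default)


def total_casualties(complaints):
    """Sum injuries and deaths over complaints given as (injuries, deaths)."""
    return sum(injuries + deaths for injuries, deaths in complaints)


if __name__ == "__main__":
    print(safety_risk(["weak air conditioning", "brake failure"]).name)
    print(total_casualties([(1, 0), (0, 0), (2, 1)]))
```

Keeping the severity as an ordered integer type lets the preset level and the casualty count be compared or combined when the safety risk of a failure mode is graded.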

System Structure of Automobile Defect Risk Assessment
The automobile defect risk assessment system designed and developed in this paper comprises a data source layer, a data warehouse layer, a business support layer and a data service interface. Drawing on data from the defect information collection system, the record information system, the recall report system, the call center and other business platforms, and using big data technologies including Hadoop, Hive and ElasticSearch, an automobile defect information database has been constructed that provides integrated storage of multi-source heterogeneous data. With the dictionary- and rule-based method, tag recommendation and standardized processing of the multi-source automobile quality and safety information have been realized. According to the correlations within the multi-source quality and safety information, cross-platform association and integration of multi-source heterogeneous data have been realized and the knowledge map of vehicle defect information has been formed. In accordance with the automobile defect risk assessment index system, a management system for graded evaluation of automobile defect risk has been designed and developed. The overall structure of the system is shown in Figure 4.

Application of the Automobile Defect Risk Assessment System
The automobile defect risk assessment system designed and developed in this paper connects the multi-source quality and safety information, avoiding data isolation between business systems, and forms the knowledge map of automobile defect information. Based on the index system of automobile defect risk assessment, an automobile defect information assessment system has also been developed and applied to information analysis in the recall management of defective automobile products. As of April 2018, over 4000 typical automobile failure cases had been formed, of which more than 1000 were recalled, providing data support for the recall management of defective automobile products in China. The system is shown in Figure 5.

Conclusion
Based on multi-source quality and safety data, the key quality and safety factors for automobile defect risk assessment are extracted, an automobile defect evaluation index system is systematically constructed and a knowledge map of quality and safety factors is introduced. On the basis of optimizing and correlating the quality and safety factor index system, this paper integrates the multi-source quality and safety information and develops the automobile defect risk assessment system. With the dictionary- and rule-based method, tag recommendation and standardized processing of the multi-source automobile quality and safety information have been realized. According to the knowledge map model of quality and safety factors, the automobile defect information knowledge map has been formed. Based on the automobile defect risk assessment index system, the automobile defect risk assessment system has been developed and applied in the recall management of defective automobile products. As of April 2018, based on historical defective automobile data, more than 4000 defective automobile cases had been formed, of which over 1000 were recalled. The system designed and developed here can provide effective data support for the recall management of defective automobile products in China.

Figure 1 Automobile Defect Information Tag Recommendation.

Figure 2 Automobile Defect Information Knowledge Map.

Figure 3 Automobile Defect Information Retrieval Based on the Knowledge Map.

Figure 4 System Structure of Automobile Defect Risk Assessment System.

Figure 5 Application of Automobile Defect Risk Assessment Evaluation System.

Table 1 Automobile Defect Risk Assessment Factors.