Research on Reliability Model of Electric Power Information System

Abstract: Information systems have developed rapidly in the power industry in recent years, and their reliable operation is the basic guarantee of business continuity: any system failure can cause great economic loss. The power industry therefore places high demands on the reliability of its information systems, yet there is no scientific and effective theoretical basis for assessing that reliability. In addition to the reliability of the software, hardware, storage and network of an information system, this paper considers the reliability of the interaction between hardware and software, establishes a reliability model for power information systems, and provides a reference for evaluating their reliability.


Introduction
With the rapid development of computer technology, computer systems are applied ever more widely in all fields of today's information society. In important industries such as telecommunications, finance and energy in particular, information technology directly affects an industry's competitive strength and level of development. Greater demands are therefore being placed on the performance, quality and reliability of computer systems. Scholars at home and abroad have done a great deal of research on system performance and quality over the whole system life cycle [1]. However, system reliability has not been well verified, because the means and basis for assessing it are lacking. In key industries such as the power industry, business usually requires uninterrupted 24-hour operation, so the reliable operation of information systems is the basic guarantee of business continuity, and any system failure can cause great economic loss.
It is therefore very important to establish a reliability model of information systems for the power industry. Such a model provides a basis for comprehensively measuring the reliability of these systems and helps to ensure their stable and orderly operation. This paper studies the reliability requirements of information systems in the power industry, establishes a reliability model for power information systems, and provides theoretical support for carrying out reliability tests on them.

The concept of reliability for power information system
The reliability of a power information system is defined as its ability to perform its functions for a given time. Reliability is usually measured by the mean time to failure (MTTF), the average time for which the information system runs normally before a failure occurs. The higher the reliability of the system, the longer the mean time to failure. Maintainability is measured by the mean time to repair (MTTR), the average time spent repairing the system and resuming normal operation after a failure. The better the maintainability of the system, the shorter the mean time to repair. The availability of an information system is defined as MTTF/(MTTF+MTTR)×100%, that is, the percentage of time for which the system is up.
Industry typically classifies the availability of computer systems by the number of nines ("9"s) in this percentage, as shown in Table 1.
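The availability formula and the count of nines can be illustrated with a short sketch. The function names and the 8760-hour year are assumptions made for illustration and do not come from Table 1; the number of nines is taken here as the integer part of -log10(1 - availability).

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumed non-leap year, for illustration only


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def count_nines(avail: float) -> int:
    """Number of leading nines in an availability fraction (0.999 -> 3)."""
    if not 0 <= avail < 1:
        raise ValueError("availability must be in [0, 1)")
    # The small epsilon guards against floating-point round-off at exact
    # boundaries such as 0.99 or 0.999.
    return math.floor(-math.log10(1 - avail) + 1e-9)


def annual_downtime_hours(avail: float) -> float:
    """Expected downtime per year implied by an availability fraction."""
    return (1 - avail) * HOURS_PER_YEAR
```

For example, a hypothetical system with an MTTF of 999 hours and an MTTR of 1 hour has an availability of 99.9%, which is "three nines" and corresponds to roughly 8.76 hours of downtime per year.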
An electric power information system is composed mainly of four parts: computer software, computer hardware, network and storage. The reliability model of a power information system is therefore studied mainly from the aspects of software reliability, hardware reliability, network reliability and storage reliability. In addition, because software and hardware are closely connected, hardware/software interaction reliability is included in the scope of the research.

Software reliability
In 1983, the IEEE Computer Society formally defined "software reliability" as follows: 1) the probability that software will not cause a system failure for a specified time under specified conditions; this probability is a function of the inputs to and use of the system, as well as a function of the existence of errors in the software, and the inputs to the system determine whether existing errors, if any, are encountered; 2) the ability of a program to perform its required functions under stated conditions for a stated period of time.
As computer software products grow in scale and complexity, the reliability of software systems plays an increasingly important role in the fields of engineering and computer engineering. Information system reliability is the probability that the system runs successfully, according to its design requirements, over a given time interval and under given environmental conditions. Successful operation means not only that the system runs correctly and satisfies its functional requirements, but also that it maintains a certain level of performance and service, resumes normal operation as soon as possible after a failure, and does not lose or corrupt data [2].

Hardware reliability
The definition of hardware reliability is similar to that of software reliability: it can also be defined as the probability that the product runs without failure during a specific time interval. However, hardware and software fail by different mechanisms: software fails because of design errors, whereas hardware almost always fails because of physical deterioration. For hardware, the probability of failure from wear and other physical causes is far greater than the probability of failure from undiscovered design problems, because hardware logic is relatively simple and design failures can therefore be kept at a low level [3].
In hardware reliability research, the quantitative indicators of interest are: product reliability, availability, MTTF, MTTFF, frequency of failure, MUT or MTBF, MTBR, MDT and MCT.
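To connect the probabilistic definition of hardware reliability with the MTTF indicator, the following minimal sketch assumes a constant failure rate (the exponential model), a common textbook simplification that this paper does not itself state. Under that assumption R(t) = exp(-t/MTTF); the function name is hypothetical.

```python
import math


def reliability_exponential(t_hours: float, mttf_hours: float) -> float:
    """Probability of running without failure for t_hours.

    Assumes a constant failure rate of 1/MTTF (exponential model); this is
    an illustrative assumption, not a model taken from the paper.
    """
    if mttf_hours <= 0 or t_hours < 0:
        raise ValueError("MTTF must be positive and t must be non-negative")
    return math.exp(-t_hours / mttf_hours)
```

Under this assumption, a device operated for a period equal to its MTTF survives without failure with probability exp(-1), about 37%, which shows that MTTF is an average and not a guaranteed lifetime.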

Hardware/Software interaction reliability
Although hardware reliability and software reliability are defined in similar terms, there are large differences between them: (1) hardware is a physical entity, while software is a logical expression; (2) hardware failures can arise from the production process, from use and from changes in materials, whereas software defects are introduced by design during development; (3) a hardware failure is usually handled by repairing or replacing the failed component, so hardware reliability can only be maintained, whereas software reliability can be continuously improved by removing defects; (4) a software product is not dangerous in itself, whereas a hardware product carries risk in itself.
Despite these differences, the methods of software reliability theory and hardware reliability theory are compatible, and software reliability and hardware reliability need to be managed together as attributes of an integrated system.

Network reliability
Network reliability is the ability of the network to work properly under specified conditions.

Storage Reliability
Storage reliability means that, after a hardware failure in the system, the affected data can be recovered by the storage system's own fault-tolerance mechanism [4].
How to evaluate the reliability level of an electric power information system quickly and effectively is the core of our research, and the most fundamental problem is how to establish a reasonable and usable reliability model. Since the 1960s, many outstanding scientists at home and abroad have carried out extensive research on software reliability and hardware reliability and have proposed hundreds of software and hardware reliability models. This paper, by contrast, takes the whole information system as a unit, considers the reliability of the software, the hardware, the hardware/software interaction, the network and the storage, and formulates a reliability quality model based on the overall reliability of the power information system.

Establishment of reliability model of power information system
An electric power information system consists mainly of computer software, computer hardware, network, data storage, and rules and regulations; it is an integrated human-machine system whose purpose is to process information flows. The reliability of an information system is the probability that it runs successfully, according to its design, over a given time interval and under given environmental conditions. Successful operation means, first, that the system runs normally and meets its functional requirements; second, that it runs efficiently and meets its performance requirements; and finally, that when an abnormal fault occurs, data integrity is preserved and the system recovers as soon as possible [6].

Software reliability
Software reliability engineering refers to applying engineering methods and software reliability techniques to ensure and improve software reliability during development and throughout the software life cycle. According to the Software Reliability Engineering outline published by AT&T Lab in 1992, the implementation of software reliability engineering can be divided into four basic stages: feasibility and requirements, design and development, system testing and field testing, and operation and maintenance [6].

Hardware reliability
Hardware reliability refers to the probability that the product runs effectively over a period of time. Hardware fails mainly through physical deterioration, and the probability of failure from wear and other physical causes is much greater than that of failure from an undiscovered design problem. Hardware reliability therefore mainly concerns the reliability of servers, storage equipment, network equipment and other hardware devices. In addition, following the "Computer Room Design Code" standard, the environmental factors of the machine room that keep the hardware operating effectively must be considered, so environmental reliability is also introduced.

Hardware/Software interaction reliability
Most current research considers software reliability and hardware reliability separately and does not consider the interaction between them, yet in an information system software and hardware influence each other. One possible situation is that a hardware failure causes the software to fail (or a software failure causes the hardware to fail), which in turn causes the information system to fail. In this case it is not enough to consider software reliability or hardware reliability alone: hardware and software should be treated as a whole when assessing system reliability. The factors considered mainly include stability, fault tolerance and recoverability, among others.
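One simple way to see why the interaction matters is to treat interaction-induced failure as a third failure mode in series with the hardware and software modes. The expression below is a sketch under an independence assumption that the paper does not state, and the symbols are introduced here only for illustration.

```latex
% Assumption (not from the paper): hardware, software and hardware/software
% interaction failures are independent failure modes acting in series.
R_{\mathrm{sys}}(t) = R_{\mathrm{HW}}(t)\, R_{\mathrm{SW}}(t)\, R_{\mathrm{HW/SW}}(t)
```

Because R_HW/SW(t) is at most 1, a model that ignores the interaction (in effect setting this factor to 1) can only overestimate the reliability of the system.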

System stability
Stability mainly refers to the ability of the hardware and software system to keep operating stably and efficiently under high pressure and high load; it mainly includes load bearing capacity and load balancing capacity.

System Fault Tolerance
Fault tolerance refers to the ability of a system to recover from errors that occur in the software or hardware on which it runs. For example, it covers the ability to continue operating normally when the system encounters illegal input data, misoperation, or defects or abnormal behavior in related software or hardware components. It includes misoperation handling and failover handling.

Easy recovery of the system
Recoverability refers to the ability of the system to re-establish the specified performance level and restore the affected data in the event of a malfunction or abnormality. This includes recovery capabilities, recovery metrics, emergency management, and disaster recovery levels.

Network reliability
Network reliability refers to the ability of the network to complete its prescribed functions within a specified time and under specified conditions [7]. The complexity of the network makes its "prescribed conditions" and "prescribed functions" complex as well, so that it is difficult to evaluate with traditional reliability theory and methods. Compared with traditional systems, network reliability has the following four distinguishing characteristics:
(1) Complex faults: the network has a hierarchical structure, and failures at different levels have different failure mechanisms and effects.
(2) Special topology: physical connections between components form a hard topology and business logic forms a soft topology; together these topologies form the network, which is difficult to describe with series-parallel relationships.
(3) Fault propagation: because information processing and communication are involved, failures affect one another and propagate.
(4) Usage dependence: usage has an important influence on reliability, and this influence is typically reflected in business and traffic.
These characteristics make the study of network reliability different from that of traditional systems: its analysis and evaluation need to be carried out from several aspects, since the fault types, the users concerned and even the stage of the network life cycle differ at each level. The three-layer network reliability model is shown in Figure 1. The topology/physical layer takes functional faults of network components as its core and mainly investigates the connectivity reliability of the network; the rule/configuration layer focuses on performance faults of network components and mainly investigates the performance reliability of the network; and the business/service layer takes process faults as its core and mainly investigates the business reliability of the network.

Fig. 1 Network reliability model
Fig. 2 Reliability model of information system

Topology/Physical Layer
Physical devices and connections form the visible hard topology of the network, which is the foundation of the network. For example, in a communication network this layer consists of routers, cables, communication base stations, servers and other physical facilities, which together form a topological structure.
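Connectivity reliability at this layer can be illustrated by checking whether the hard topology stays connected after some components fail. The sketch below uses a breadth-first search over a hypothetical four-node ring; the node names, the topology and the function name are assumptions for illustration and are not taken from the paper.

```python
from collections import deque


def is_connected(nodes, links, failed_nodes=frozenset(), failed_links=frozenset()):
    """Return True if all surviving nodes can still reach one another."""
    alive = [n for n in nodes if n not in failed_nodes]
    if not alive:
        return False
    # Build an undirected adjacency map from the surviving links.
    adjacency = {n: set() for n in alive}
    for a, b in links:
        if (a, b) in failed_links or (b, a) in failed_links:
            continue
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Breadth-first search from an arbitrary surviving node.
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(alive)


# Hypothetical ring topology used only for illustration.
NODES = ["router_a", "router_b", "server", "base_station"]
LINKS = [
    ("router_a", "router_b"),
    ("router_b", "server"),
    ("server", "base_station"),
    ("base_station", "router_a"),
]
```

In this ring, any single link failure leaves the network connected, while the simultaneous failure of two opposite links splits it into two parts. This is the kind of functional fault that the topology/physical layer is concerned with.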

Rules/Configuration Layer
The configuration of the devices forms the rules of network operation and provides the basic support capability of the network. For example, in a communication network this layer consists of routing configurations, switch buffer size settings and other settings that form the operating rules of the network.

Business/Service Layer
The topology/physical layer and the rule/configuration layer together form the infrastructure network. A service is a function that the infrastructure network provides externally, a business is a combination of services, and the end users of the network apply the network by using its services or businesses. In general, the object of a network reliability evaluation is the network application system, which contains the services and the businesses.

Storage Reliability
In addition to the metrics that measure the reliability of a storage system, namely availability and durability, vendors operating storage systems often use RPO and RTO to express the impact on users when a failure occurs:
1) Availability: the ability of the system to operate correctly at any given moment and perform its functions in response to user actions, measured by (failure-free time/total time).
2) Durability: the ability to keep data from being lost or destroyed, measured by (1 - probability of data loss).
3) RPO (Recovery Point Objective): the point in time, before a disaster occurred, to which the system can restore data. It indicates how much data the system will lose after a disaster. The shorter the RPO, the less data is lost. A 0-minute RPO means that no data may be lost: data is backed up, copied or recorded in time to prevent any loss.
4) RTO (Recovery Time Objective): the maximum time allowed to recover the data.
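The four storage metrics above can be expressed as simple calculations. The sketch below is illustrative only: the class, the function names and the field meanings are assumptions, and the RPO check treats the age of the last recoverable copy at the moment of failure as the data-loss window.

```python
from dataclasses import dataclass


@dataclass
class StorageIncident:
    """Hypothetical record of one storage failure (illustrative only)."""
    last_copy_age_min: float  # age of the last recoverable copy at failure time
    recovery_min: float       # time taken to restore the data


def storage_availability(failure_free_hours: float, total_hours: float) -> float:
    """Availability as failure-free time divided by total time."""
    if total_hours <= 0 or not 0 <= failure_free_hours <= total_hours:
        raise ValueError("need 0 <= failure_free_hours <= total_hours, total_hours > 0")
    return failure_free_hours / total_hours


def durability(probability_of_data_loss: float) -> float:
    """Durability as 1 minus the probability of data loss."""
    if not 0 <= probability_of_data_loss <= 1:
        raise ValueError("probability must be in [0, 1]")
    return 1 - probability_of_data_loss


def meets_objectives(incident: StorageIncident, rpo_min: float, rto_min: float) -> bool:
    """True if the incident stayed within both the RPO and the RTO."""
    return incident.last_copy_age_min <= rpo_min and incident.recovery_min <= rto_min
```

For instance, an incident in which the last copy was 5 minutes old and recovery took 30 minutes satisfies an RPO of 15 minutes and an RTO of 60 minutes, but it would violate a 0-minute RPO.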
Summing up the above, we can define the reliability model of the power industry information system. The model divides the reliability attributes of an information system into five categories (hardware reliability, software reliability, hardware/software interaction reliability, network reliability and storage reliability) and further divides each category into several indexes, as shown in Figure 2.
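The hierarchy of categories and indexes can be written down as a nested structure. The sketch below lists only the sub-items named in the text of this paper; it is not a transcription of Figure 2, the software indexes are left empty because the text does not enumerate them, and the variable and function names are hypothetical.

```python
# Illustrative encoding of the five reliability categories, using only the
# sub-items mentioned in the text (not a transcription of Figure 2).
RELIABILITY_MODEL = {
    "software reliability": [],  # sub-indexes are not enumerated in the text
    "hardware reliability": [
        "server",
        "storage equipment",
        "network equipment",
        "machine room environment",
    ],
    "hardware/software interaction reliability": {
        "stability": ["load bearing capacity", "load balancing capacity"],
        "fault tolerance": ["misoperation handling", "failover handling"],
        "recoverability": [
            "recovery capability",
            "recovery metrics",
            "emergency management",
            "disaster recovery level",
        ],
    },
    "network reliability": [
        "connectivity reliability",
        "performance reliability",
        "business reliability",
    ],
    "storage reliability": ["availability", "durability", "RPO", "RTO"],
}


def leaf_indexes(node, prefix=()):
    """Yield the full path (as a tuple) of every leaf index in the model."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from leaf_indexes(child, prefix + (name,))
    else:
        for leaf in node:
            yield prefix + (leaf,)
```

Enumerating the leaf paths in this way gives a checklist against which test cases for a concrete system could be organized.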

Characteristics of reliability model of power information system
An important cause of the August 14, 2003 ("8.14") blackout in North America was that the state estimation function went out of operation: the dispatchers lost the ability to perceive the real-time state of the power grid, failed to detect faults in time, and the fault eventually spread. On June 5, 2008, the Washington Post reported that a 48-hour emergency shutdown of a nuclear power plant in Georgia was caused by a network failure (a failed software update). Information technology now plays an important supporting role in the electricity market. Take financial management as an example: it is one of the core businesses of power enterprises and has always been a centerpiece of enterprise development. ERP is an information system that uses advanced information technology to support the fine-grained management and standardized operation of electric power enterprises, taking finance as its core and integrating logistics, capital flow and information flow. Because an ERP system is large, highly integrated, highly real-time and highly standardized, the requirements on its load bearing capacity, load balancing capacity, misoperation handling, failover and rapid recovery are more stringent.
Taking the ERP system of an electric power company as the subject, test engineers used the reliability model proposed in this paper to formulate a reliability measurement index system for the ERP system, covering software reliability, hardware reliability, hardware/software interaction reliability, network reliability and storage reliability. Test cases were designed according to the sub-characteristics of each part, and the test project was carried out. The project demonstrated the validity of the reliability model for power information systems and provided valuable technical experience for future reliability evaluations of such systems.

Conclusion
This paper presents a reliability model for power information systems based on software reliability, hardware reliability, hardware/software interaction reliability, storage reliability and network reliability. Traditional approaches treat software reliability and hardware reliability separately. However, software and hardware are closely interrelated: a failure may propagate or be amplified between them, and a system failure may result from a combination of software and hardware failures. The model in this paper therefore also includes hardware/software interaction reliability, which makes it more comprehensive. Testing the ERP system of a power company demonstrated the validity of the reliability model. However, the testing also showed that the reliability requirements of different information system functions differ slightly, so the model should be further adjusted and improved according to the characteristics of each power information system. The reliability of information systems for the power industry will remain a focus and a difficulty in the field for a long time, and it needs the joint efforts and contributions of industry peers.