Analysis of Meteorological Satellite Software Based on Principal Component Analysis

How to provide reasonable hardware resources and improve software efficiency is receiving more and more attention. This paper proposes a software classification method based on software operating characteristics. The method uses run-time resource consumption to describe how software behaves while running. Principal component analysis (PCA) is used to reduce the dimension of the running-feature data and to interpret the software characteristic information. Simulation results show that the proposed method can optimize the allocation of hardware resources to software and improve the efficiency of software operation.


Introduction
With the improvement of meteorological satellite observation capability and the growing variety of remote sensing products, meteorological satellite remote sensing products are more and more widely used. The Fengyun meteorological satellite ground application system of the National Satellite Meteorological Center processes a large number of satellite observations in real time every day. This places high requirements on the timeliness and reliability of data processing in the ground application system. At the same time, it challenges the design and operation of the IT platform that supports these applications. How to fully understand the resource requirements of the various types of data processing software and make effective use of IT resources has become an urgent problem for meteorological satellite ground application systems.
The various types of meteorological satellite data processing prototype software at the National Satellite Meteorological Center are the result of many scientists' hard work over several years. With the development of remote sensing instruments and remote sensing technology, the ground meteorological software is constantly enriched and renewed. In engineering construction, this prototype software, once engineered, becomes an important component of the Fengyun meteorological satellite ground application system. It is therefore necessary to establish detection and evaluation methods to assess whether the engineered data processing software uses hardware resources reasonably. Because the Fengyun meteorological satellite ground application system contains a large number of data processing software packages, classifying the software by its resource use and operating characteristics is the basis for carrying out this evaluation work.
The experimental data used in this paper were collected from the operation of the Fengyun-3C data processing software. First, we collected the original software running-feature data and performed feature extraction to better express the characteristics of the software. Second, principal component analysis (PCA) was used to analyze the operating characteristics of the collected data; the principal components were extracted and their features described. Then, cluster analysis was carried out on the processed software characteristic data to classify the meteorological software into types such as computing-intensive, memory-intensive, I/O-intensive and network-intensive. Finally, based on the results of PCA, the characteristics of each type of software are described. This provides basic data for further work, such as analysis of the rationality of software resource consumption, evaluation of the rationality of software operation, and optimization of hardware and software systems, and provides scientific decision support for the planning and configuration of hardware and software platforms in new projects.

Software and hardware environment overview.
The objects of study in this paper are the 182 sets of polar-orbiting meteorological satellite data processing software for the 12 categories of instruments in the Fengyun-3C satellite ground application system. The hardware resources include 6 IBM minicomputers; the detailed configuration is given in Table 1.

Maintaining the integrity of the specifications.
The software operating characteristic data were acquired for the 182 sets of polar-orbiting meteorological satellite data processing software for 12 kinds of instruments. Polar-orbiting meteorological satellites carry remote sensing instruments, which collect data on a regular basis and download the collected data to the ground station. The software therefore needs to run multiple times per day (each run is called a track). The data were acquired in a simulation environment over 4 days. To collect the operating characteristics, each meteorological data processing software package was forced to run serially (in the actual environment the software runs in parallel), so that each package could obtain sufficient hardware resources and perform at its full capability. The acquired operating characteristic data included CPU-level, system-level, process-level and job-level data, with CPU-level and system-level acquisition cycles of 1 second. The main job-level fields are the software start time, end time and the server on which the software ran. The system-level fields are CPU system usage, CPU disk-wait usage, CPU idle usage, memory usage, virtual memory usage, disk read and write rates, and network receive and send rates. The CPU-level fields are the per-core CPU system utilization and idle utilization.

Characterization of operational characteristics.
Software feature analysis needs to express the operating characteristics of the software as fully as possible, and ultimately to express the operating characteristics of each software package as a vector [1]. Characterizing the running characteristics requires considering two aspects: (1) the time-series characteristics of software operation; (2) eliminating the differences between platforms and the resource consumption of the system itself (only the resources consumed by the software itself are considered).
The time-series features of a software run are represented by the peak, mean and sum of resource consumption [2]. Eliminating platform differences requires converting absolute resource usage into utilization rates. Obtaining the resource consumption of the software itself requires discarding the resources occupied by the system. To this end, we synthesized new feature parameters from the collected parameters and the information about the server on which the software ran. The specific treatment is as follows:
Software running time: (1)
CPU user calculation: (2)
CPU calculation total: (3)
CPU calculated peak: (4)
Memory usage: (5)
Virtual memory usage: (6)
Disk read: (7)
Disk write: (8)
Network receiving: (9)
Network sending: (10)
Through the above conversion, the operating characteristic data of each track of a software package are converted into a vector. We then calculate the average of the software's multi-track running characteristics, which finally forms a 182×14 data matrix of the original operating characteristics.
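The conversion from per-second samples to one vector per software package can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the input layout and the baseline-subtraction rule are hypothetical and do not reproduce the paper's equations (1) to (10); only a few of the features are shown.

```python
from statistics import fmean


def track_features(cpu_user_pct, mem_used_pct, system_mem_pct, start_s, end_s):
    """Build a small feature vector for one track (one run).

    cpu_user_pct, mem_used_pct: per-second utilization samples in percent.
    system_mem_pct: memory utilization attributed to the system itself
    (hypothetical baseline, subtracted so only the software's share remains).
    """
    run_time = end_s - start_s
    return [
        run_time,                  # software running time
        fmean(cpu_user_pct),       # mean CPU user utilization
        sum(cpu_user_pct),         # summed CPU user utilization
        max(cpu_user_pct),         # peak CPU user utilization
        max(0.0, fmean(mem_used_pct) - system_mem_pct),  # software memory share
    ]


def software_vector(track_vectors):
    """Average the per-track feature vectors component by component."""
    return [fmean(column) for column in zip(*track_vectors)]
```

Stacking the vectors returned by `software_vector` for every software package would give a data matrix with one row per package, analogous to the 182×14 matrix described above.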

Principal Component Analysis
Principal component analysis (PCA) is a statistical method [5]. It uses an orthogonal transformation to convert a group of possibly correlated variables into a group of linearly uncorrelated variables; the transformed variables are called the principal components. The results of principal component analysis depend mainly on the correlation between the indicators. If the correlation is strong, principal component analysis works well; otherwise it works poorly [6].
According to [4,7,8], when the cumulative contribution rate ρ (cumulative %) reaches 0.8 to 0.9, the first five principal components can be used in place of the original 14 operating characteristics while retaining the main information they contain. The first five principal components are called common influence factors.
According to the analysis in Table 3, the cumulative contribution of the first four principal components (1, 2, 3 and 4) is 78.471%, so they can represent the main factors of the original matrix. In the principal component expressions, the variables are not the original variables but the standardized variables, as in the expression for the first principal component. Table 3 shows that the first principal component is mainly related to run time and disk read and write resources. The second is mainly related to network resources and CPI. The third is mainly related to computing resources. The fourth is related to memory and cache.
Table 3. Principal components.
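As a minimal sketch of the transformation described above, the following Python code standardizes a data matrix, forms its correlation matrix and extracts the leading principal component by power iteration. Power iteration is used here only to keep the example dependency-free; it is an assumption of this sketch, not necessarily the procedure used in the paper, and all function names are hypothetical.

```python
import math


def standardize(X):
    """Return X with each column shifted to mean 0 and scaled to unit sample variance."""
    n = len(X)
    columns = []
    for col in zip(*X):
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        columns.append([(v - m) / sd for v in col])
    return [list(row) for row in zip(*columns)]


def correlation_matrix(Z):
    """Correlation matrix of the original data, computed from standardized rows Z."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_component(R, iters=500):
    """Largest eigenvalue and unit eigenvector of symmetric R via power iteration."""
    p = len(R)
    # Asymmetric start vector, to reduce the chance of starting orthogonal
    # to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T R v gives the eigenvalue estimate.
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v
```

When two indicators are strongly correlated, the leading eigenvalue is well above 1 and a single component captures most of their shared variance, which is the situation in which PCA is said above to work well.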
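The selection rule based on the cumulative contribution rate can be sketched as follows. The function name and the default threshold of 0.8 are assumptions for illustration, and any eigenvalues passed in are hypothetical inputs rather than the values reported in the paper.

```python
def components_needed(eigenvalues, threshold=0.8):
    """Return (k, rho): the smallest number of leading components whose
    cumulative contribution rate rho reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Components are ranked by eigenvalue, largest first.
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam / total
        if cumulative >= threshold:
            return k, cumulative
    return len(eigenvalues), cumulative
```

For example, with illustrative eigenvalues 4, 3, 2 and 1, the first two components account for 70% of the total variance and the first three for 90%, so three components are retained at a threshold of 0.8.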
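The interpretation of each principal component by the features that load most heavily on it can be sketched as follows. The function name, feature names and loading values are hypothetical and are not taken from Table 3.

```python
def dominant_features(loadings, names, top=2):
    """For each component, list the `top` feature names with the largest
    absolute loading. loadings[k][j] is the loading of feature j on component k."""
    result = []
    for row in loadings:
        ranked = sorted(zip(names, row), key=lambda pair: abs(pair[1]), reverse=True)
        result.append([name for name, _ in ranked[:top]])
    return result


# Hypothetical loadings for two components over four features.
example_names = ["run_time", "disk_read", "net_recv", "cpu_peak"]
example_loadings = [
    [0.90, 0.80, 0.10, 0.20],
    [0.10, -0.20, 0.85, 0.30],
]
example_result = dominant_features(example_loadings, example_names)
```

The absolute value is used because a large negative loading indicates as strong a relationship with the component as a large positive one.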

Summary
In this paper, principal component analysis is used to classify software in the meteorological field, addressing the problem of software classification in this field. At the same time, the characteristics of each kind of software are analyzed. Using the classification results, the software scheduling algorithm is further analyzed in order to improve hardware utilization and reduce software waiting time.

Table 2. Total variance explained.