Research on Improvement of Sampling Methods for Quantitative Measurement of Safety Culture

Building on the research team's previous studies, this paper improves stratified sampling, the sampling method used in the quantitative measurement of safety culture. Stratified sampling as previously applied was found to have three problems: only a single factor is investigated, the number of people to pre-survey is difficult to determine, and the variance estimates are biased. To solve these problems, a multilayer stratified sampling process was applied in the design of the quantitative safety culture questionnaire. The multilayer stratified sampling was carried out in a large coal mining enterprise, and it was found to be more advantageous than the original Neyman allocation method in both the estimation of variance and the factors investigated.


Introduction
The author's research team has produced the following results in the area of safety culture: a definition of safety culture [1], 32 safety culture elements [2], and a quantitative measurement questionnaire made up of questions designed for each element. Based on these, a Safety Culture Analysis Program was developed in 2006. Stratified sampling was used in the quantitative measurement of safety culture. Building on this research, this paper improves the sampling method used in the quantitative measurement of safety culture.

Sampling Methods in Quantitative Measurement of Safety Culture
Commonly used sampling methods include simple random sampling and stratified sampling [3]. Stratified sampling can allocate the sample among the layers in various ways. The three widely accepted methods are proportional allocation, optimal allocation and Neyman allocation [4].
At present, the quantitative measurement method of safety culture designed by the author's team uses stratified sampling with Neyman allocation [5]. This sampling method does not take into account factors such as the level of education. In addition, a study of how the system determines the sample size found that the pre-survey process also has deficiencies: only a single factor is investigated in the sampling process, the number of people to pre-survey is difficult to determine, and the estimates are biased.
There are several problems in this process. First, the theoretical and empirical basis for pre-surveying 8 units from each layer by simple random sampling is inadequate. According to sampling theory, at least 30 samples should be selected to achieve a certain degree of accuracy [6]. Although the 32 samples drawn in total are enough to estimate the population, only 8 samples are available for estimating the variance of each layer, so the variance estimate of each layer may be biased. Equation (1), for the total sample size, shows that the estimated deviation $s_h$ affects the subsequent determination of the total sample size $n$. Equation (2), for the sample size of each layer, shows that both the estimate $s_h$ and the total sample size $n$ are used to determine the stratified sample size. Because estimates are applied repeatedly in this process, the deviation in the sample size $n_h$ of each layer is amplified. Second, the pre-survey also faces practical difficulties in determining the number of people to survey. For these reasons, multiple stratified sampling can be used to take other influencing factors into account and determine their sample sizes.
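To make the sensitivity to the pre-survey estimate concrete, the following is a minimal sketch of the textbook form of Neyman allocation, in which $n_h$ is proportional to $N_h s_h$. It is assumed here to correspond to the allocation step described above; the function name, layer sizes and standard deviations are hypothetical and are not the enterprise's data.

```python
def neyman_allocation(n_total, stratum_sizes, stratum_sds):
    """Textbook Neyman allocation: n_h is proportional to N_h * s_h."""
    products = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [n_total * p / total for p in products]


# Hypothetical layers (illustrative values only).
sizes = [100, 200, 300, 400]
true_sds = [10.0, 10.0, 10.0, 10.0]
# Noisy estimates of the same deviations, as might arise from 8 units per layer.
presurvey_sds = [10.0, 10.0, 8.0, 12.0]

alloc_true = neyman_allocation(200, sizes, true_sds)
alloc_presurvey = neyman_allocation(200, sizes, presurvey_sds)

for h, (a, b) in enumerate(zip(alloc_true, alloc_presurvey), start=1):
    print(f"layer {h}: true-sd allocation {a:.1f}, pre-survey allocation {b:.1f}")
```

In this sketch an error in a single layer's $s_h$ shifts the allocation of every layer, because each $n_h$ depends on the sum over all layers.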

Sampling design effect
In order to compare the efficiency of different sampling designs and overcome the difficulties encountered in complex sampling designs, the sample survey expert L. Kish proposed the concept of the design effect in his 1965 book Survey Sampling. The design effect (deff) is the ratio of the variance of the estimator under a specific sampling design to the variance of the estimator under simple random sampling without replacement with the same sample size, i.e.
$$\mathrm{deff} = \frac{V(\bar{y})}{V_{srs}(\bar{y})} \qquad (3)$$
Equation (3) shows that the design effect serves two functions: it can be used to compare the efficiency of different sampling designs, and it can be used to calculate the sample size required for other sampling designs.
Variance of estimators under simple random sampling is shown in Equation (4).
The definition of the design effect is shown in Equations (5)-(8).
The right side of the equation can be regarded as the variance of the estimator of a simple random sample of size $n^*$, which may be denoted $V_{srs}^{*}(\bar{y})$. This gives Equation (9).
In Equation (9), the left side is the variance of the estimator under the complex sampling design, and the right side is the variance of the estimator under simple random sampling. The equation means that the two designs have the same precision provided that $n^* = n/\mathrm{deff}$. Therefore, for different sampling methods to achieve the same precision, the actual required sample size $n$ and the sample size $n^*$ required for simple random sampling must satisfy $n^* = n/\mathrm{deff}$, which gives Equation (10).
Therefore, for a given precision, if the design effect deff of the complex sampling design can be estimated and the sample size $n^*$ required for simple random sampling is known, the required sample size is $n = \mathrm{deff} \times n^*$. This paper attempts to estimate the sample size of stratified sampling by means of the design effect.
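The relationship between the two sample sizes can be sketched in a few lines. The function names and the value of $n^*$ are hypothetical; rounding up to a whole respondent is an added assumption, not part of the relationship itself.

```python
import math


def complex_design_sample_size(n_srs, deff):
    """Required size under the complex design: n = deff * n_srs, rounded up."""
    return math.ceil(deff * n_srs)


def equivalent_srs_size(n, deff):
    """Simple-random-sampling size with the same precision: n* = n / deff."""
    return n / deff


n_star = 200  # hypothetical simple-random-sampling requirement
for deff in (0.75, 1.0, 1.5):
    print(deff, complex_design_sample_size(n_star, deff))
```

A design effect below 1 means the complex design needs fewer units than simple random sampling for the same precision; a design effect of 1 leaves the sample size unchanged.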

Determination of sample size for simple random sampling
The formula for the design effect involves the sample size of a simple random sample. This section therefore introduces how the sample size of simple random sampling is determined.
In simple random sampling, the variance (sampling error) of $\bar{y}$ is shown in Equation (11).
The sampling error indicates the precision of the sampling results. The smaller the sampling error, the higher the sampling precision and the better the extracted sample represents the population. Rearranging gives Equation (12). Equation (12) shows that three factors mainly affect the sample size $n$: the population size $N$, the sampling error $V(\bar{y})$ and the population variance $S^2$. In actual calculations, however, generally only the population size $N$ is known, and the population variance is unknown. Moreover, the sampling error depends on the population variance and is not itself used as the indicator of survey accuracy in practice. Instead, the confidence level $1-\alpha$ and the absolute error limit are used in place of the sampling error. Therefore, the sample size $n$ cannot be determined directly from Equation (12).
According to the definition of the two-sided quantile $Z_{\alpha/2}$, Equation (15) holds.
Comparing the two formulas and rearranging gives Equation (16).
Substituting Equation (16) into Equation (12) gives the sample size formula for simple random sampling, Equation (17).
Equation (17) shows that the factors affecting the sample size of simple random sampling are the population size, the confidence level, the absolute error limit and the population variance. The higher the confidence level, the greater the population variance, the larger the population size, and the smaller the absolute error limit, the larger the required sample size. The first three items are known in the calculation; the population variance, however, is unknown and needs to be estimated.
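The dependence on these four factors can be checked numerically. The sketch below uses the standard closed form obtained from $V(\bar{y}) = (1/n - 1/N)S^2$ and an absolute error limit $d = Z_{\alpha/2}\sqrt{V(\bar{y})}$; it is assumed, not confirmed, that this matches Equation (17). The function name and the error limit of 1.2 are hypothetical, while $N = 2218$ and $S^2 = 93.22$ are taken from the application example later in the paper.

```python
from statistics import NormalDist


def srs_sample_size(N, S2, d, confidence=0.95):
    """Sample size for simple random sampling without replacement.

    N: population size, S2: population variance,
    d: absolute error limit, confidence: 1 - alpha.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided quantile Z_{alpha/2}
    n0 = z ** 2 * S2 / d ** 2      # size ignoring the finite population
    return n0 / (1 + n0 / N)       # finite population correction


# d = 1.2 is an illustrative assumption, not a value from the paper.
n = srs_sample_size(N=2218, S2=93.22, d=1.2)
print(round(n, 1))
```

Raising the confidence level, the variance or the population size, or lowering the error limit, each increases the value returned, in line with the relationships stated above.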

Estimation of Design Effect of Stratified Sampling.
In stratified random sampling, the sample can be allocated among the layers in different ways. Here, the most common one, proportional allocation, is used as the representative of stratified sampling.
The variance of the mean estimate of simple random sampling is shown in Equation (18).
The variance in Equation (18) and the variance in Equation (13) both refer to the variance of simple random sampling, but they are expressed differently.
The variance of the mean under stratified random sampling with proportional allocation is shown in Equation (19).
Therefore, the design effect of stratified random sampling is shown in Equation (22).
It can be found that the design effect is less than 1 when the size $N_h$ of each layer is large; that is, the precision of stratified random sampling with proportional allocation is higher than that of simple random sampling. This also explains theoretically why the precision of stratified sampling is higher than that of simple random sampling. In an actual sampling survey, the design effect of stratified sampling can be obtained by the following methods.
(1) Estimate it from similar experience.
(2) Calculate it from the design effect formula using similar historical survey data.
(3) Following the principle of "better too large than too small", take deff as its maximum value of 1.
In the quantitative measurement of safety culture, if plenty of similar survey data are available but personnel are limited, that is, the company finds it difficult to organize many employees for safety culture measurement, method (2) can be chosen. If higher precision of the measurement results is wanted and personnel are easy to arrange, method (3) can be selected. Method (3) is the most commonly used in actual surveys. It not only improves the overall precision of the sample but also eliminates the calculation required by method (2). This gives a theoretical justification for the usual practice in actual surveys: the total sample size is determined as for simple random sampling, and stratified sampling is then used to determine the sample size of each layer.
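Method (2) can be sketched as follows, assuming the usual large-layer approximation in which the population variance splits into a within-layer part $\sum W_h S_h^2$ and a between-layer part $\sum W_h (\bar{Y}_h - \bar{Y})^2$, so that the design effect under proportional allocation is the within-layer share. This is the textbook approximation and is assumed, not confirmed, to agree with Equation (22). The function name and all input values are hypothetical.

```python
def deff_proportional(weights, stratum_means, stratum_vars):
    """Approximate design effect of proportional stratified sampling.

    deff = within-layer variance / (within-layer + between-layer variance),
    valid when every layer is large.
    """
    overall_mean = sum(w * m for w, m in zip(weights, stratum_means))
    within = sum(w * v for w, v in zip(weights, stratum_vars))
    between = sum(w * (m - overall_mean) ** 2
                  for w, m in zip(weights, stratum_means))
    return within / (within + between)


# Hypothetical historical data for two layers.
print(deff_proportional([0.5, 0.5], [70.0, 70.0], [90.0, 90.0]))  # equal layer means
print(deff_proportional([0.5, 0.5], [60.0, 80.0], [90.0, 90.0]))  # different layer means
```

When the layer means coincide the approximation gives a design effect of 1, which is the upper bound that method (3) adopts; any difference between layer means pushes it below 1.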

Multiple Stratified Sampling.
In the quantitative measurement of safety culture, the safety culture level of a company is influenced not only by personnel level but also by level of education and working years. In the questionnaires collected by safety culture measurement systems, participants are usually required to fill in their personnel level, level of education, working years, etc. [7]. In the subsequent analysis of results, however, the sample is stratified and analyzed only by personnel level, and level of education and working years are ignored. This paper attempts to solve this problem through multiple stratified sampling.
When the survey indicator is correlated with two or more auxiliary variables, stratifying by each auxiliary variable improves the benefit of stratification. The general approach is to divide the population into large layers according to the first variable, and then to divide each large layer into sublayers according to the second variable. Stratified sampling with several stratification variables is called multiple stratification [8].
For multiple stratification, the allocation of the sample to each sublayer must be considered once the sub-stratification is complete. The most commonly used scheme allocates the sample in proportion to the size of each sublayer [9]. To illustrate the process in detail, two variables (variable 1 and variable 2) are taken as an example of determining the stratified sample sizes.
Assume that variable 1 divides the population into $M$ layers with layer weights $W_k$ ($k = 1, 2, \ldots, M$), and that variable 2 divides it into $N$ layers with layer weights $W_l$ ($l = 1, 2, \ldots, N$). Then Equations (23) and (24) hold.
With cross stratification by variable 1 and variable 2, the layer weight of each sublayer is $W_{kl} = W_k W_l$, as shown in Equation (25).
If the total sample size is $n$, the sample size of sublayer $kl$ is $n_{kl} = n W_{kl}$, as shown in Equation (26).
In multiple stratified sampling, if the total sample size is not large enough relative to the total number of sublayers $MN$, some sublayers may not be allocated any sample. This situation is likely to occur in practice, and the sample can be increased appropriately to improve the precision of the results [10]. Next, the application of multiple stratified sampling in the enterprise safety culture analysis program is illustrated using educational background as the second stratification variable.
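The two-variable allocation can be sketched as below, using the sublayer weight as the product of the two layer weights and the sublayer sample size as the total sample size times that weight. The function names, the layer weights and the total sample sizes are hypothetical, and rounding each sublayer to the nearest integer is an assumption about how fractional sizes are handled.

```python
def cross_layer_weights(w_k, w_l):
    """Sublayer weights W_kl = W_k * W_l for every pair of layers."""
    return [[wk * wl for wl in w_l] for wk in w_k]


def allocate(n, weights):
    """Sublayer sample sizes n_kl = n * W_kl, rounded to whole units."""
    return [[round(n * w) for w in row] for row in weights]


# Hypothetical weights: 4 layers for variable 1, 5 layers for variable 2.
w_k = [0.1, 0.2, 0.3, 0.4]
w_l = [0.05, 0.25, 0.30, 0.25, 0.15]

weights = cross_layer_weights(w_k, w_l)
cells = allocate(200, weights)
total = sum(sum(row) for row in cells)
print(total)  # may differ slightly from 200 because each sublayer is rounded

# With a small total sample, some sublayers receive no sample at all.
empty = [(k, l)
         for k, row in enumerate(allocate(30, weights))
         for l, count in enumerate(row) if count == 0]
print(empty)
```

The second call illustrates the situation described above: when the total sample is small relative to the $MN$ sublayers, the smallest sublayers round to zero, which is the case in which the sample would be increased.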

Application Examples
The sample size for coal mine company A is to be determined. Company A is a coal mine and a subsidiary of a large coal mine group. The database of the enterprise safety culture online measurement system already contains the original measurement data of company B, which is also a coal mine and a subsidiary of the same group. This paper therefore determines the sample size of company A from the existing measurement data of company B. First, the variance of company B is calculated.
There are 2218 employees in company A, comprising management, professionals, foremen, and front-line employees. Each category includes people with different educational backgrounds. The proportions are shown in Table 1.
Table 1 Level of academic qualifications of personnel at all levels.
According to the multiple stratified sampling formula $W_{kl} = W_k W_l$, the layer weight of each sublayer is calculated. There are 4 × 5 = 20 sublayers in total, and their layer weights are shown in Table 2. To improve the precision, deff is taken as 1. Therefore, according to the formula of the design effect, the sample size of stratified sampling is shown in Equation (31).

According to the layer weights of the sublayers in Table 2, the sample size of each sublayer is calculated with the formula $n_{kl} = n W_{kl}$. According to Table 3, a total of 220 samples need to be drawn. The actual number of samples is two more than the number calculated in the previous step because the sublayer sample sizes are rounded to integers; the overall impact is negligible. By personnel category, the samples for management, professionals, foremen, and front-line staff are 24, 52, 54, and 90, respectively. By educational background, the samples for below junior high school, junior high school, high school, college, and university and above are 12, 62, 65, 48 and 33, respectively. The sample size of each cross-layer is shown in Table 3. For example, 10 professionals with a high school or college education and 8 front-line employees with a junior high school education need to be sampled. In this way, the number to be sampled for each educational background within each personnel category can be determined. The subsequent analysis can then be carried out not only by the original personnel categories but also by educational background.
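The reported marginal totals can be cross-checked with a few lines of arithmetic. The counts below are the ones stated in the text; the dictionary keys are labels chosen for illustration, and the calculated total is inferred only from the statement that the drawn total is two more than the calculated one.

```python
# Sample counts by personnel category, as reported in the text.
by_category = {
    "management": 24,
    "professionals": 52,
    "foremen": 54,
    "front-line staff": 90,
}

# Sample counts by educational background, as reported in the text.
by_education = {
    "below junior high school": 12,
    "junior high school": 62,
    "high school": 65,
    "college": 48,
    "university and above": 33,
}

total_by_category = sum(by_category.values())
total_by_education = sum(by_education.values())

# Inferred: the drawn total exceeds the calculated total by two.
calculated_total = total_by_category - 2

print(total_by_category, total_by_education, calculated_total)
```

Both margins sum to the same total, as they must when every sampled person belongs to exactly one category and one educational background.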
In the original sampling method, 8 units are pre-surveyed in each layer by simple random sampling, and the resulting 32 samples are used to determine the sampling ratio and variance, from which the total sample size is then determined.
According to the sampling ratio formula $f = n/N$, the sampling ratios of all personnel are calculated as follows: 03448. The variance of each layer is then calculated according to the formula.
The data comparison shows that the variances of foremen and front-line staff obtained through the pre-survey are 76.45 and 87.43, respectively, while their actual variances are 109.45 and 125.35, so there is a clear gap. If these pre-survey data are used in the follow-up process to estimate the sample size, the deviation carries over. By contrast, the population variance estimated from company B is 93.22 and the actual variance is 85.82, which are relatively close. Therefore, estimating the population variance from a similar enterprise is more accurate than estimating the variance through a pre-survey.
In addition, the original analysis results of the enterprise safety culture online analysis system determine only the number of personnel at each level and not the number for each educational background. The numbers of people identified in this paper for the different educational backgrounds are 12, 62, 65, 48 and 33. The other advantage of this sampling method is that it considers educational background, so it is more advantageous than the original Neyman allocation method in both the estimation of variance and the factors investigated.

Conclusion
In summary, the following conclusions are drawn.
(1) When stratified sampling is used for the quantitative measurement of safety culture, problems may arise such as a single investigation factor, difficulty in determining the number of people to pre-survey, and deviations in estimates.
(2) The process of applying multi-layer stratified sampling in the design of quantitative questionnaires for safety culture is elaborated.
(3) Multi-layer stratified sampling is carried out for a large coal mine enterprise, and it is found that this sampling method is more advantageous than the original Neyman allocation method in the estimation of variance and the factors investigated.


Table 2
Layer weight of each sublayer.

Table 3
Specific number of sublayers.