Research on urban road travel speed extraction based on mobile phone signaling data

Abstract: Matching mobile phone signaling switching data to roads and extracting road travel speed is the foundation for obtaining traffic information from mobile phone signaling data. However, existing matching methods that extract a single feature sequence have large errors and waste much of the switching data collected in road tests. Using the location characteristics of the places where signaling switches occur, this paper proposes an urban road calibration and matching method based on the longest common subsequence computed by dynamic programming. Mobile phone signaling data measured on the road are first preprocessed, and the correlation threshold of the urban road calibration sequence is calculated based on the longest common subsequence. Using dynamic programming, the degree of association between an unmatched sequence and the road test sequence set is then calculated and compared with the correlation threshold in order to match the cellular signaling data to an urban road, after which the travel speed of each successfully matched sequence is calculated. An example analysis shows that the road matching and travel speed calculation have small errors.


Introduction
With the acceleration of urbanization and economic development in China, traditional questionnaires, manual surveys and other research methods can no longer meet the demands of traffic studies. In recent years, using mobile location technology to determine travel routes and estimate road traffic status has gradually gained attention in the field of traffic research. Cellphone data has many advantages, such as low cost, large data volume, and ecological and economic benefits, and it can be used to track the complete trip of a user, which helps improve the accuracy of road traffic status estimation. When road traffic status is derived from cellphone data, correctly matching the data generated by on-board mobile phones to urban roads and extracting the travel speed is a key problem to be solved.

Literature Review
In some studies, Cell of Origin (COO) location technology is applied. For example, the travel patterns of a user can be identified by analyzing the mobile user's stop points [1], or the locations of the base stations where mobile users stay can be extracted from the COO information provided by the GSM cellular system in order to derive and analyze trip OD data [2,3]. However, COO location is only accurate to within the service radius of the base station, which varies from tens of meters to several hundred meters. The lower the coverage density of the base stations, the greater the service radius and the less accurate the COO positioning result. Therefore, COO positioning has low accuracy in discriminating a specific path, and it can hardly determine the true trip path of mobile users on dense, intersecting urban roads with a complex base station distribution.
In other studies, signaling switching location technology is commonly used. It positions the place where a mobile user switches cells through the change of the location area code (LAC) and the cell identity (CellID) of the base station in the cellphone signaling data when the user passes between the service ranges of two different base stations. Tang Yuhao [4] describes methods for the extraction, cleaning, loading, and secondary processing of cellphone signaling switch data, which yield the trip OD matrix, the expected travel routes of residents, and other core data. Mao Xiaowen [5] describes the preprocessing of cellphone signaling data and studies the regional travel characteristics of residents using that data. Yang Fei [6,7] calibrates roads by superimposing the switch sequences of all tests and taking the continuous switching sequence with the heaviest weight as the feature sequence of a road, which then serves as the road matching basis for the sample to be tested. However, the switch sequence with the maximum weight can neither represent the real signaling switches on the road nor reflect the real traffic state on the road. A large number of signaling switches are excluded from the road calibration, which wastes test data. It can be seen that signaling switch positioning is more accurate than COO positioning, but the data processing and path matching are also more complex, and there is no generally accepted method.

Methodology
Regardless of the user or the time of the switch, a given cellphone signaling switch always occurs in the same overlapping region of the service areas of two base stations. By testing the switching sequence that occurs on a certain road and calculating its degree of correlation with the signaling switches of the users under test, we can determine whether the signaling data of those users matches the road.
To match the signaling switch data generated by mobile phone users to an urban road, many road tests are needed to extract "the signaling switch label" of the road. The signaling data under test must first be integrated into an ordered switch sequence of cellphone signaling, which is then compared with "the signaling switch label" of the road by the matching algorithm to judge whether the match succeeds. For each piece of signaling switch data that is matched successfully, we can calculate the average travel speed of the user from its start and end times and the matched road length. The algorithm flow is shown in Figure 1.

The original cellphone signaling switching data mainly includes MSID (a unique, anonymously encrypted user ID), TIME (timestamp), LAC (location area code), CELLID (cell identity), EVENT (event type: text message, MOC-MTC, Internet access, location update, cell switch), and other fields. When a signaling switch occurs, the LAC and CELLID change because the mobile phone passes into a different location area or cell. We can therefore locate the mobile phone user approximately at that time by positioning the switch area, known as the fusiform region near the edge of the "honeycomb" units.
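As a minimal sketch of how the raw fields listed above might be organized before matching, the following groups records by user and orders them by time. The record type `SignalingRecord`, the function `group_by_user`, and the use of seconds as the time unit are assumptions made for illustration, not part of the original data specification.

```python
from collections import defaultdict
from typing import NamedTuple


class SignalingRecord(NamedTuple):
    """One raw signaling record (hypothetical layout for illustration)."""
    msid: str     # anonymized user ID
    time: float   # timestamp, assumed here to be in seconds
    lac: int      # location area code
    cellid: int   # cell identity
    event: str    # event type, e.g. "cell switch" or "location update"


def group_by_user(records):
    """Return a dict mapping each MSID to its records sorted by time."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.msid].append(rec)
    for recs in grouped.values():
        recs.sort(key=lambda r: r.time)
    return dict(grouped)


records = [
    SignalingRecord("u1", 20.0, 1, 11, "cell switch"),
    SignalingRecord("u2", 5.0, 1, 12, "location update"),
    SignalingRecord("u1", 10.0, 1, 10, "cell switch"),
]
by_user = group_by_user(records)
```

Each per-user list is then a time-ordered trace from which a switch sequence can be derived.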

Basic idea of the algorithm
Assume that there is an unknown test sequence $S = \{s_1, s_2, \dots, s_m\}$ and a signaling switch label sequence $T = \{t_1, t_2, \dots, t_n\}$ obtained on a known road, where each element is a particular switch. We can determine the matching result by extracting the common part of the test sequence S and the signaling switch label sequence T of this road, and calculating the ratio between the length of the common part and the sequence length.
If C is a subsequence of both S and T, then C is a common subsequence of S and T. If C is the longest of the common subsequences of S and T, then C is the longest common subsequence (LCS) of S and T [9]. The simplest way to compare the relevance of two sequences is therefore to compare the length of their longest common subsequence.
Once the longest common subsequence of the two switching sequences is obtained, their relevance can be estimated easily. We calculate the LCS of the switching sequences S and T and compare the resulting relevance with the relevance threshold. When the sequences are matched, the travel speed can be obtained from the length of the urban road and the start and end times of the switches.
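The idea can be illustrated with a short sketch that computes the LCS length of two switch sequences by dynamic programming. The function names and the particular normalization in `lcs_ratio` (LCS length divided by the length of the label sequence) are assumptions for illustration; the relevance measure actually used in the paper may be defined differently.

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of sequences s and t."""
    m, n = len(s), len(t)
    # table[i][j] holds the LCS length of the prefixes s[:i] and t[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def lcs_ratio(s, t):
    """Illustrative relevance: LCS length relative to the label length len(t)."""
    return lcs_length(s, t) / len(t) if t else 0.0


# Hypothetical switches, each written as a label such as "a" for brevity.
test_sequence = ["a", "b", "d", "e"]
label_sequence = ["a", "b", "c", "d", "e"]
common = lcs_length(test_sequence, label_sequence)
ratio = lcs_ratio(test_sequence, label_sequence)
```

In this toy case the test sequence misses one switch of the label, so four of the five label switches are found in order. In practice each element would be a switch between two (LAC, CellID) pairs rather than a single letter.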

Data preprocessing
For road calibration and matching, the valid fields in the signaling data are LAC (location area code), CELLID (cell identity), and TIME (the timestamp at which the event was detected). On the one hand, the mobile phone both reports its position information to the network and records signaling data when crossing a cell boundary, which produces redundant records. On the other hand, the ping-pong effect occurs easily because of switch disturbance caused by an unstable signal, which means that the mobile phone signal switches back and forth among several base stations. These phenomena not only consume storage resources but also dramatically reduce the matching accuracy. Therefore, we preprocess the signaling data in the following ways to extract a simple and effective switching sequence:
1) Compare the LAC and the CELLID of each pair of adjacent records. When either of them changes, the program determines that a switch {(Lac_1, CellID_1), (Lac_2, CellID_2)} has happened.
2) After Step 1 is finished, retrieve the previous switch. If the switch sequence has the form {(Lac_1, CellID_1), (Lac_2, CellID_2), (Lac_1, CellID_1)}, the time interval of the switch is calculated. The program keeps only {(Lac_1, CellID_1), (Lac_2, CellID_2)} when the time interval is too short.
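The two preprocessing steps can be sketched as follows. This is a simplified illustration under stated assumptions: the ping-pong pattern is taken to be a quick return to the previous cell (A, B, A), the return is dropped so that only A, B remains, and the threshold `min_interval` of 10 seconds is a hypothetical value, since no figure is given in the text. The function name `clean_switch_sequence` is likewise introduced here.

```python
def clean_switch_sequence(records, min_interval=10.0):
    """Collapse redundant records and suppress ping-pong returns.

    records: time-ordered list of (time_s, lac, cellid) for one user.
    Returns a list of (time_s, (lac, cellid)), one entry per retained cell.
    """
    kept = []
    for time_s, lac, cellid in records:
        cell = (lac, cellid)
        # Step 1: a switch happens only when LAC or CELLID changes.
        if kept and cell == kept[-1][1]:
            continue
        # Step 2: drop a quick return to the cell visited before the last one.
        if (len(kept) >= 2 and cell == kept[-2][1]
                and time_s - kept[-1][0] < min_interval):
            continue
        kept.append((time_s, cell))
    return kept


raw = [(0, 1, 10), (5, 1, 10), (30, 1, 11), (33, 1, 10), (36, 1, 11), (90, 2, 20)]
cleaned = clean_switch_sequence(raw)
```

Consecutive entries of the cleaned list define the ordered switch sequence used for matching. A production version would likely need to handle ping-pong among more than two base stations, which this sketch does not attempt.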

The relevance calculation of the switching sequence
With the dynamic programming method, we calculate the longest common subsequence of the test sequence $S = \{s_1, s_2, \dots, s_m\}$ and the calibration sequence $T = \{t_1, t_2, \dots, t_n\}$ obtained from the road test. A relevance $\rho$ is then introduced, based on the length of the longest common subsequence and the lengths of the sequences.
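For reference, the dynamic programming computation of the LCS length is usually written as the standard textbook recurrence below. The table notation $c[i,j]$ (the LCS length of the first $i$ switches of S and the first $j$ switches of T) is introduced here for illustration and is not taken from the paper; $s_i$ and $t_j$ denote the $i$-th and $j$-th switches of S and T.

```latex
c[i,j] =
\begin{cases}
0, & i = 0 \text{ or } j = 0,\\
c[i-1,\,j-1] + 1, & i, j > 0 \text{ and } s_i = t_j,\\
\max\bigl(c[i-1,\,j],\; c[i,\,j-1]\bigr), & i, j > 0 \text{ and } s_i \neq t_j.
\end{cases}
```

The length of the longest common subsequence of S and T is then $c[m,n]$, where $m$ and $n$ are the lengths of S and T.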

Calcu
Real field testing is required. The authors designed a field experiment; the test sample is shown in Table 1, and a diagram of the measured data is shown in Fig. 5. The threshold for the weighted relevance judgment in this experiment is 2.5. Given an additional user test sequence, Sequence A, we obtain the maximum relevance between each sequence sample and Sequence A, and then calculate the weighted relevance between Sequence A and all the sequence samples with the algorithm discussed above, which gives 2.7414. Since the weighted relevance is greater than the threshold, the user's signaling data is judged to match the road successfully. From the time $t_1$ when the mobile phone signaling switches to the base station near the road and the time $t_2$ when it leaves that base station, we can calculate the time $t = t_2 - t_1$ during which the vehicle was running on the road. The length of the road section is obtained from the database, from which the travel speed in this experiment is calculated.

Conclusions
The start and end positioning times $t_1$ and $t_2$ of the sequence can be obtained. With the given total length $L$ of the road, the average travel speed of the user is $v = L / (t_2 - t_1)$.
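The average travel speed, road length divided by the time spent on the road, can be computed as in the sketch below. The function name, the use of meters and `datetime` values, the conversion to km/h, and the sample numbers are all hypothetical and are not taken from the experiment.

```python
from datetime import datetime


def average_speed_kmh(t_enter, t_leave, road_length_m):
    """Average speed in km/h from entry and exit times and road length in meters."""
    elapsed_s = (t_leave - t_enter).total_seconds()
    if elapsed_s <= 0:
        raise ValueError("t_leave must be later than t_enter")
    # Meters per second converted to kilometers per hour.
    return road_length_m / elapsed_s * 3.6


# Hypothetical values: a 1500 m road section traversed in three minutes.
t_enter = datetime(2015, 1, 1, 8, 0, 0)
t_leave = datetime(2015, 1, 1, 8, 3, 0)
speed = average_speed_kmh(t_enter, t_leave, 1500.0)
```

With these illustrative inputs the result is about 30 km/h. Note that the switch times locate the phone only to within the switch area, so the computed value is an average over the matched road length rather than an instantaneous speed.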

Table 1
The test sample of mobile phone signaling data in Taiping North Road

Table 2
The result of path matching and travel speed gathering