A Corpus-Based Comparative Study of High-Frequency Preposition Use by Chinese English Learners and Native Speakers

Abstract: This paper uses corpus-based contrastive interlanguage analysis to investigate the similarities and differences in the frequency and collocations of ten high-frequency prepositions in a native English speaker corpus and in Chinese learner corpora. The results show that the overall frequency of these ten prepositions is at a comparable level in each corpus, but individual prepositions are overused or underused by the learners. In addition, Chinese English learners and native speakers differ considerably in their use of prepositional collocations, and these differences appear to be caused by several factors.


Introduction
In recent years, preposition usage by EFL learners has been investigated by many researchers. Prepositions, which are among the most common words in English, have not been well acquired by Chinese English learners, and there is still a large gap between their use of prepositions and that of native speakers. Some scholars attribute this acquisition difficulty to the rich semantic meaning, strong grammatical function and variable use of prepositions (Gui Shichun & Yang Huizhong, 2003) [1]. Many methods have been used to analyze EFL learners' use of prepositions, and among them the emergence of corpora has created especially favorable conditions for the study of prepositions. A corpus makes it convenient to obtain a large amount of information about preposition usage, such as frequency of use and common collocations. Some scholars at home and abroad have already used corpora to study prepositions, but most of them focus on analyzing errors in the use of one particular preposition, while research on the specific discrepancies in the frequency and collocation of prepositions between English learners and native speakers is relatively scarce.
Accordingly, this article will first use quantitative analysis to examine the specific discrepancies in preposition frequency between Chinese English learners at different levels and native speakers. Second, a qualitative discussion will analyze the left and right collocations of the preposition with the greatest usage difference. By comparing the frequencies and collocations of prepositions across the three corpora and examining the reasons for these differences, the study aims to offer suggestions for more authentic preposition teaching.

Contrastive Interlanguage Analysis
Contrastive Interlanguage Analysis (hereafter CIA) is the "contrast and comparison between language use of native speakers and non-native speakers in comparable circumstances" (Pery-Woodley, 1990) [2]. CIA was first proposed by Granger (1996). The method compares a learner corpus with a native-speaker corpus to identify the differences between them and then analyzes those differences. CIA involves two types of comparison. The first compares the language of learners with that of native speakers to describe characteristics of the learners' interlanguage, such as overuse, underuse and misuse, and thereby highlights the differences between learners and native speakers. The second compares learner corpora from different countries or at different proficiency levels [3]. This study is a corpus-based contrastive interlanguage analysis: it not only compares Chinese learners' interlanguage with native-speaker English, but also compares the interlanguage of learners at different proficiency levels with native-speaker English, in order to obtain more information about Chinese English learners' preposition usage.

Corpus-based Studies on the Use of Prepositions Abroad
The use of learner corpora to link corpus linguistics and second language research did not appear until the late 1980s. Learner corpora provide researchers with useful data on how non-native speakers use a second language; whether the usage in the data is correct or not, it represents the language characteristics of particular groups of second language learners. An increasing number of researchers have recently investigated prepositions using non-native speaker corpora.
Mindt and Weber (2010) used the LOB and Brown corpora to compare the distribution of prepositions in American and British English, examining 14 high-frequency English prepositions commonly used by native English speakers [4]. They found that the distribution of the six most commonly used prepositions (representing more than 70% of all preposition occurrences in both corpora) is almost identical in American and British English.
Ang and Tan (2016) compared Malaysian English learners' use of prepositions in written texts with that of British native speakers [5]. Their study was based on three corpora, the BNC, a sub-corpus of EMAS and a sub-corpus of LOCNESS, which were used to compare English preposition usage between non-native and native speakers. Other researchers have investigated the usage of specific prepositions. For instance, Arjan A, Abdullah N H and Roslim N (2013) used the Malaysian Corpus of Students Argumentative Writing (MCSAW) to examine the usage, mastery and developmental pattern of the English prepositions of place in and on across three academic levels. They found that the college students showed positive development in their use of in and on, but also that students confused in and on and had difficulty using them correctly with or without articles [6].
In addition, Tran Tin Nghi and Tran Huu Phuc (2022) examined the frequencies of English preposition usage from the perspective of conceptual transfer [7]. They found a negative relationship between prepositional senses and their collocations with certain Vietnamese linguistic features, and showed that negative conceptual transfer was recurrent and systematic.
Beyond describing preposition usage, some researchers have also studied the factors that influence students' use of prepositions. Nghi Tran Tin, Thang Nguyen Tat and Phuc Tran Huu (2020) conducted a survey to investigate the factors affecting Vietnamese learners' use of English prepositions. The results showed that Vietnamese intralingual interference strongly affected the prepositional senses expressed by Vietnamese EFL learners [8]. Gender, learning level, writing and speaking, and cognitive embodiment also played a significant role in language transfer, affecting EFL learners' use of English prepositions.

Corpus-based Studies on the Use of Prepositions at Home
The development of corpus linguistics has added new perspectives to the study of prepositions. In China, corpus research began in the 1990s and has developed rapidly since then. Based on different corpora, many researchers have analyzed the usage features of EFL learners and compared Chinese EFL learners with native speakers.
In 2003, Gui Shichun and Yang Huizhong published the Chinese Learner English Corpus (CLEC), which illustrates the English learning features of a wide range of Chinese EFL learners, including high school students, non-English major university students and English major university students [1]. Since the CLEC was published and made freely accessible, an increasing number of researchers have studied the language learning features of Chinese learners based on it.
Wang Ying (2007, 2009) used the Brown, LOB and CLEC corpora to compare the similarities and differences of 15 commonly used prepositions across the three corpora. She found that Chinese learners overused some prepositions, such as to and about, and underused others, such as of, as, with and by [9][10].
Using the ICCI corpus as a research tool, Zhang Huiping and Liu Yongbing (2013) investigated the relationship between preposition learning and language transfer among Chinese English learners. They presented many examples of English and Chinese prepositions to identify why Chinese English learners often misuse prepositions in the process of learning English [11].
Fangqiang, Wang Yina and Li Yinmei (2022) examined the use of prepositions in the English titles of domestic and international linguistics journals. They found that the overall frequency of prepositions was higher in the titles of Chinese journals than in those of international journals [12]. The semantic distribution of high-frequency prepositions and the number of prepositional collocations in the titles also differed significantly between the two types of journals.
Although there are many related studies, most of them use college students' English compositions as their corpus and tend to analyze single prepositions. Systematic investigations of the distribution of preposition use are lacking, and relatively few studies compare the use of prepositions by Chinese students at different levels with that of native speakers. It is therefore necessary to analyze systematically the differences in preposition frequency and collocation between Chinese students at different levels and native speakers.

Research Questions
This article aims to identify the specific discrepancies in preposition use between Chinese English learners at different levels and native speakers by analyzing authentic material from COCA (the Corpus of Contemporary American English), the ST6 sub-corpus of CLEC (the Chinese Learner English Corpus) and a self-compiled corpus of writing samples from 12th-grade EFL learners at high schools. The research attempts to answer the following questions:
(1) What are the similarities and differences in the distribution of the top ten high-frequency prepositions in the three corpora? Do Chinese learners overuse or underuse some prepositions?
(2) What are the similarities and differences in the collocations of the preposition with the greatest frequency difference between Chinese English learners and native speakers in the three corpora?

Corpora
Three corpora are used in this study: COCA (the Corpus of Contemporary American English), the ST6 sub-corpus of CLEC (the Chinese Learner English Corpus) and a self-compiled corpus of writing samples from 12th-grade EFL learners at high schools (hereafter the mini corpus). First, the mini corpus contains 200 writing samples by 12th-grade EFL students from several high schools in Sichuan Province, all on the topic of the world and the environment. Students were asked to write a speech of about 100 words stressing the urgency of environmental protection, discussing how to carry out environmental protection and appealing to everyone to participate actively. Second, the Chinese Learner English Corpus is the first English corpus of Chinese learners compiled in China. It consists of five sub-corpora, ST2, ST3, ST4, ST5 and ST6, which contain texts from high school students, non-English major students at different levels and English major students at different levels respectively [1]. In this article, the ST6 sub-corpus of senior English major learners is compared with the mini corpus, the two representing the high and low proficiency levels of Chinese English learners respectively. Lastly, the reference corpus is the Corpus of Contemporary American English. COCA is a large, genre-balanced corpus of American English containing one billion words of text from eight genres, and it is probably the most widely used corpus of English. Its texts represent English used by native speakers across genres and can therefore show how native speakers use English.

AntConc
This study uses AntConc 3.5.9 as its retrieval program. AntConc is a cross-platform corpus analysis toolkit developed by Laurence Anthony, a professor at Waseda University in Japan. It includes seven main tools, among them Concordance, Collocates, Word List and Keyword List. The frequencies of the top ten high-frequency prepositions in COCA are retrieved with COCA's own online search interface, and AntConc's Word List tool is then used to retrieve the frequencies of these ten prepositions in the mini corpus and the ST6 sub-corpus of CLEC. Afterwards, the preposition with the most significant usage discrepancy between Chinese English learners and native speakers is selected, and AntConc's Collocates tool is used to obtain its left and right collocates in each corpus for discussion.
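AntConc's Word List is an interactive tool, but the counting it performs for this study can be approximated in a few lines. The Python sketch below is an illustration under stated assumptions, not AntConc's implementation: the tokenizer (lower-cased runs of letters and apostrophes) is a simplification of AntConc's configurable token definition, and the function names are hypothetical. Like a plain word list, it counts word forms, so infinitival to is counted together with prepositional to.

```python
import re
from collections import Counter

# The ten prepositions examined in this study (the COCA top ten in Table 1).
PREPOSITIONS = ("of", "in", "to", "for", "with", "on", "at", "from", "by", "about")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens.

    Assumption: a token is a run of letters or apostrophes. AntConc lets users
    define tokens differently, so counts may differ slightly from its output.
    """
    return re.findall(r"[a-z']+", text.lower())


def preposition_counts(text: str) -> tuple[Counter, int]:
    """Return raw counts of the ten prepositions and the total number of tokens.

    Word forms are counted without part-of-speech tagging, so every "to"
    (preposition or infinitive marker) is included, as in a plain word list.
    """
    tokens = tokenize(text)
    counts = Counter(tok for tok in tokens if tok in PREPOSITIONS)
    return counts, len(tokens)


sample = "We have to protect the environment. It is time for all of us to act."
counts, total = preposition_counts(sample)
print(counts["to"], total)  # 2 15
```

Reading a whole corpus file into a string (for example with pathlib's `Path.read_text`) and passing it to `preposition_counts` would give the raw FREQ values and the token total for that corpus.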

Research Procedures
This study mainly employs contrastive interlanguage analysis to analyze authentic language material from the three corpora. The specific research steps are as follows. First, 200 English essays on the theme of environmental protection are collected from Chinese 12th-grade students, one essay per student. The students' texts are then typed into the computer and saved as plain-text (TXT) files, which together form the mini corpus.
Second, COCA's online retrieval tool is used to obtain the occurrence frequencies of the top ten high-frequency prepositions in COCA, and the Word List tool of AntConc 3.5.9 is then used to retrieve the frequencies of these ten prepositions in the ST6 sub-corpus of CLEC and in the mini corpus so that the corpora can be compared. Because the total number of word tokens differs across the three corpora, the frequencies must be normalized. This study uses frequency per thousand words as the basis for a detailed discussion of the similarities and differences in the distribution of the ten most frequently used English prepositions across the three corpora.
Finally, to better understand how native English speakers and Chinese English learners at different levels differ in their choice of preposition collocations, this study takes the most representative preposition, to, as an example and retrieves its left and right collocates in the three corpora using the Collocates tool of AntConc 3.5.9 (and, for COCA, its online interface). The collocate span is set to one word on each side (L1 and R1) to determine which words co-occur most often with to. These collocates are then ranked by frequency, and the five most frequent left collocates and the five most frequent right collocates in each corpus are selected for analysis. In addition, this study draws on the relevant literature to discuss the reasons for the differences in high-frequency preposition use between Chinese English learners and native speakers.
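The L1/R1 collocate retrieval described above can be sketched as follows. This is a hypothetical, simplified illustration rather than AntConc's Collocates tool: it ranks collocates by raw co-occurrence frequency only (AntConc can also rank by association statistics), assumes a simple tokenizer (lower-cased runs of letters and apostrophes), and does not stop the span at sentence boundaries. The function name `span1_collocates` is illustrative.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: tokens are lower-cased runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def span1_collocates(tokens: list[str], node: str = "to") -> tuple[Counter, Counter]:
    """Count the word immediately left (L1) and right (R1) of each node occurrence."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1   # L1 collocate
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1  # R1 collocate
    return left, right


text = "I want to go. We have to protect the environment and I want to help."
left, right = span1_collocates(tokenize(text))
print(left.most_common(5))   # [('want', 2), ('have', 1)]
print(right.most_common(5))  # [('go', 1), ('protect', 1), ('help', 1)]
```

Following the procedure in this study, the top five entries of each counter would be taken per corpus and compared across corpora, as in Tables 5 and 6.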

The Frequency of English Prepositions in Three Corpora
To examine how the use of English prepositions differs between Chinese English learners and native speakers, as well as among Chinese English learners at different levels, this research first obtains the ten most frequently used English prepositions in COCA and then retrieves the frequencies of these ten prepositions in the ST6 sub-corpus of CLEC and in the mini corpus for comparison.

The Frequency of English Prepositions in COCA
Because the total number of word tokens differs across the three corpora (24,801 in the mini corpus, 242,929 in the ST6 sub-corpus of CLEC and one billion in COCA), the raw numbers of occurrences cannot be compared directly, and the frequencies must be normalized. This research uses frequency per thousand words as the criterion: in the following tables, FREQ is the raw frequency of a word in a corpus, and FREQ/1000 is its frequency per thousand words, calculated as FREQ / total word tokens × 1000. Table 1 shows that the top ten English prepositions used by native speakers in COCA are of, in, to, for, with, on, at, from, by and about. Of and in rank first and second, with 23.25 and 15.67 occurrences per thousand words respectively. They are followed by to and for, with 9.23 and 8.20 occurrences per thousand words. The remaining six prepositions, with, on, at, from, by and about, each occur fewer than 7 times per thousand words and are therefore used considerably less often than the first four; about has the lowest frequency, at only 2.43 per thousand words.
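The normalization formula can be checked against figures reported in this paper. The sketch below assumes only the token totals given above and two raw counts quoted elsewhere in the text (455 occurrences of of in the mini corpus and 7,256 occurrences of to in the ST6 sub-corpus); the function name is illustrative.

```python
# Token totals reported for the two learner corpora.
MINI_TOKENS = 24_801
ST6_TOKENS = 242_929


def per_thousand(freq: int, total_tokens: int) -> float:
    """Frequency per thousand words: FREQ / total word tokens * 1000."""
    return freq / total_tokens * 1000


# Raw counts quoted in the text: "of" in the mini corpus, "to" in ST6.
print(round(per_thousand(455, MINI_TOKENS), 2))  # 18.35
print(round(per_thousand(7256, ST6_TOKENS), 2))  # 29.87
```

Both results agree with the per-thousand values given for Tables 2 and 3, which is consistent with the reported figures having been computed with this formula.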

The Frequency of English Prepositions in Mini Corpus
Next, AntConc 3.5.9 is used to retrieve the frequencies of COCA's ten most frequent prepositions in the mini corpus. Table 2 shows the raw frequency of each of these ten prepositions and its frequency per thousand words. The preposition to is clearly the one used most frequently by Chinese senior three students, at 40.64 per thousand words. The preposition of, with a total count of 455 and a frequency of 18.35 per thousand words, is also commonly used by these students. In follows with 14.5 per thousand words, and for is the only other preposition with a frequency above 10 per thousand words. All the other prepositions occur fewer than 10 times per thousand words in the students' writing: on and with occur between 5 and 6 times per thousand words, while about, from, by and at occur fewer than 5 times per thousand words.
Having examined the frequencies in these two corpora, we also need to analyze the use of prepositions in the ST6 sub-corpus of CLEC. Table 3 shows the frequencies of COCA's ten most frequent English prepositions in the ST6 sub-corpus of CLEC.

The Frequency of English Prepositions in ST6 sub-corpus of CLEC
Table 3 shows that to is the preposition used most frequently by Chinese senior English majors, with a total count of 7,256 and a frequency of 29.87 per thousand words. Of and in rank second and third, at 26.97 and 19.83 per thousand words respectively. All the remaining prepositions occur fewer than 10 times per thousand words. This shows that Chinese senior English majors tend to favor to, of and in, while using from, at and about less often.

Comparative Analysis of High-frequency English Prepositions in Three Corpora
The analysis of high-frequency prepositions in the three corpora shows the overall situation of, and the differences in, preposition use by Chinese learners at different levels and native English speakers, as well as the overall distribution of prepositions in each corpus. Table 4 presents the raw frequencies and per-thousand-word frequencies of the ten most frequent prepositions for Chinese learners and native English speakers. It shows that the overall frequency of these ten prepositions is at a comparable level in each corpus. In all three corpora, the four most frequently used prepositions are of, in, to and for, which suggests that native speakers and EFL learners share a broadly similar preference in preposition use. However, the distribution of individual prepositions varies considerably across the corpora. Because the three corpora differ in size, raw frequencies cannot be used to decide whether Chinese English learners overuse or underuse a preposition; we therefore compare frequencies per thousand words. The similarities and contrasts among the three corpora are visualized in the line graph in Figure 1.
Figure 1 leads to the following findings:
1) The curve for the mini corpus is similar to that for the ST6 sub-corpus of CLEC, while it differs slightly from the curve for COCA. Compared with English native speakers, Chinese learners consistently overuse some prepositions, such as to and for, and underuse others, such as with, on and at.
2) Compared with the mini corpus, the use of prepositions in the ST6 sub-corpus of CLEC is closer to that in COCA, suggesting that as Chinese learners' English proficiency grows, their use of prepositions gradually comes to resemble that of native English speakers.
3) Even so, some problems in the use of prepositions found in senior high school students' writing persist among senior English majors. For example, both groups overuse to to some extent. There are also differences between the two learner groups: 12th-grade EFL students tend to underuse of and in, whereas senior English majors overuse these two prepositions.
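The overuse and underuse judgments above rest on comparing per-thousand-word frequencies. As a sketch of that comparison, the snippet below divides each learner frequency by the COCA frequency, using only the per-thousand values quoted in the text for of, in and to. A ratio above 1 suggests overuse and below 1 underuse; the ratio and the function name are illustrative assumptions rather than a procedure reported in this study, and a ratio alone says nothing about statistical significance.

```python
# Per-thousand-word frequencies quoted in the text (COCA, mini corpus, ST6).
COCA = {"of": 23.25, "in": 15.67, "to": 9.23}
MINI = {"of": 18.35, "in": 14.5, "to": 40.64}
ST6 = {"of": 26.97, "in": 19.83, "to": 29.87}


def usage_ratio(learner: dict[str, float], native: dict[str, float]) -> dict[str, float]:
    """Learner frequency divided by native frequency, rounded to two decimals.

    Illustrative reading: > 1 suggests overuse, < 1 suggests underuse.
    """
    return {word: round(learner[word] / native[word], 2) for word in native}


print(usage_ratio(MINI, COCA))  # {'of': 0.79, 'in': 0.93, 'to': 4.4}
print(usage_ratio(ST6, COCA))   # {'of': 1.16, 'in': 1.27, 'to': 3.24}
```

The output is consistent with finding 3): 12th-grade students fall below COCA for of and in, senior English majors lie above it, and both groups exceed COCA for to.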
Figure 1 also shows that to is the most frequently used preposition in both the mini corpus and the ST6 sub-corpus of CLEC, and that it is the preposition most overused by Chinese English learners. The author therefore takes to as an example and analyzes its collocations in the three corpora to explore the reasons for the differences in preposition usage between Chinese English learners and native speakers.

Collocation Analysis of the Preposition to
The previous section showed that Chinese learners tend to overuse some prepositions and underuse others at different learning stages. We now turn to a detailed study of the most representative preposition, to, which Chinese English learners consistently overuse. To study how Chinese English learners and native speakers, as well as Chinese English learners at different levels, differ in their use of this word, the author retrieved the collocates of to in each of the three corpora and analyzed its usage characteristics in light of the similarities and differences among its collocates. The frequencies of the left collocates of to in the three corpora were calculated using the Collocates tool of AntConc 3.5.9 and the COCA online interface. The top five left collocates of to and their frequencies are shown in Table 5.

Left Collocates Analysis of to
Table 5 shows the five most frequent left collocates of to in the three corpora, and the left collocates differ markedly between Chinese English learners and native speakers. The left collocations used frequently by both groups are "want to" and "have to". However, "going to", the most frequent left collocation in COCA, is underused by Chinese learners of English. Conversely, "glad to", "how to" and "right to", which are common among Chinese English learners, do not appear among the top five left collocates in COCA. Nevertheless, the high-frequency left collocates of to share one feature across the three corpora: verbs predominate. The table also shows that English major learners use the left collocates of to more consistently than senior three students.

Right Collocates Analysis of to
Based on the frequencies of right collocates in the three corpora, the top five right collocates were obtained, as shown in Table 6. Table 6 shows that the right collocates of to also differ between Chinese English learners and native speakers. The right collocations frequently used by both groups are "to be" and "to do". Among the top five right collocates, all except the in "to the" are verbs, which means that to most often combines with a following verb to form an infinitive; in other words, many of the retrieved tokens of to are the infinitive marker rather than the preposition proper. In addition, the collocations "to take", "to die" and "to make", which do not appear among the top five right collocates in COCA, are common, and overused, in Chinese English learners' writing. It is worth noting that the most frequent right collocate in the mini corpus is "protect", presumably because the theme of the mini corpus essays is environmental protection, so students use the phrase "to protect the environment" very often.
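The suggestion that the essay topic pushes "protect" to the top of the R1 list could be checked by counting the full phrase in the mini corpus, roughly what AntConc's Clusters/N-Grams tool does for a fixed search string. The sketch below is a hypothetical illustration with an invented sample text and an illustrative function name, not a result from this study; it assumes the same simple tokenization (lower-cased runs of letters and apostrophes).

```python
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count positions where the phrase occurs as a contiguous token sequence."""
    # Assumption: tokens are lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


essay = "We must act to protect the environment. To protect the environment, everyone should help."
print(count_phrase(essay, "to protect the environment"))  # 2
```

Dividing this count by the R1 frequency of "protect" after to would show what share of those occurrences belong to the topic phrase.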

Conclusion
Through a comparative study of ten high-frequency prepositions in COCA, CLEC and the self-compiled mini corpus, this study found that the overall frequency of these ten prepositions is at a comparable level in each corpus. In all three corpora, the four most frequently used prepositions are of, in, to and for, which suggests that native speakers and EFL learners share a broadly similar preference in preposition use. However, the distribution of individual prepositions varies considerably across the corpora. For example, Chinese English learners consistently overuse some prepositions, such as to and for, and underuse others, such as with, on and at. These patterns are caused by various factors, one of which is first language transfer. For example, the English preposition by marks the agent in passive constructions, but Chinese has no exactly corresponding structure, which may lead to Chinese learners' weaker mastery of this word. Meanwhile, comparing preposition use in the writing of English majors and senior three students suggests that, as learning deepens, Chinese English learners' use of prepositions gradually comes to resemble that of native English speakers. In addition, the study found that Chinese English learners and native speakers differ greatly in their use of prepositional collocations, which may be due to differences in the topics of the writing materials or to insufficient attention to the diversity of prepositional collocations in teaching. Accordingly, teachers can use the differences found between Chinese English learners and native speakers to identify the key and difficult points of preposition teaching and use corpora to support it, so that students can master prepositions better, become more aware of how prepositions are used and learn more idiomatic collocations.

Table 1: The top 10 frequently used English prepositions in COCA

Table 2: The frequency of the 10 most frequent English prepositions of COCA in the mini corpus

Table 3: The frequency of the 10 most frequent English prepositions of COCA in the ST6 sub-corpus of CLEC

Table 4: The frequency of the 10 most frequent English prepositions in the three corpora

Figure 1: The 10 most frequently used prepositions in the three corpora

Table 5: High-frequency left collocates of to

Table 6: High-frequency right collocates of to