Open Access

Automatic Mining Method for Heterogeneity Features of Prose-Chinese Translation Corpus Based on Artificial Intelligence


DOI: 10.23977/infkm.2023.040105


Yanhua Ma 1


1 Zhejiang Yuexiu University, Shaoxing, Zhejiang, 312000, China

Corresponding Author: Yanhua Ma


With the diversification of culture and the universality of language, prose, as an important form of literary material, has attracted growing scholarly attention. At the same time, as science, technology, and culture become increasingly integrated, the corpus, as a large-scale electronic text library, has become highly significant for the study of related language theories. However, a study of the heterogeneity features of the prose Chinese translation corpus found that current methods for automatically mining these features still have some problems. To address this, this paper proposes a new method based on artificial intelligence (AI) for automatically mining the heterogeneity features of the prose Chinese translation corpus, and conducts an empirical study to verify its effectiveness. The results show that the method increases the weight coefficients of heterogeneity features for datasets 1 through 6 in the corpus by 57, 34, 28, 36, 16, and 13, respectively. It also effectively reduces the offset of dataset nodes and increases the amount of data mined through data node access, thereby improving the effectiveness and practicability of the automatic mining method. In addition, research on the prose translation corpus can enrich the content and broaden the scope of corpus research, promoting its further development.


Chinese Prose Translation Corpus, Artificial Intelligence, Heterogeneity Features, Automatic Mining


Yanhua Ma, Automatic Mining Method for Heterogeneity Features of Prose-Chinese Translation Corpus Based on Artificial Intelligence. Information and Knowledge Management (2023) Vol. 4: 32-44. DOI:



All published work is licensed under a Creative Commons Attribution 4.0 International License.
