Education, Science, Technology, Innovation and Life
Open Access
Research on the Construction of Resource Database of the Yi Dialects for Information Processing in China

DOI: 10.23977/jsoce.2021.030404

Author(s)

Chengping Wang 1, Qingya Zeng 1, Dongyan Sun 2

Affiliation(s)

1 Minzu Languages Information Processing Lab (Provincial Key University Lab of Sichuan Province of China), Southwest Minzu University, Chengdu, Sichuan, 610041, China
2 Chengdu Polytechnic, Chengdu, Sichuan, 610041, China

Corresponding Author

Chengping Wang

ABSTRACT

Basic language-resource research on the Yi language is still at an early stage, which makes it difficult to describe and present the real features of the Yi dialects. A critical problem in Yi language research and related fields is how to build a corpus of Yi dialects using computing, corpus, artificial intelligence, and other modern information processing technologies, so as to faithfully record the current state of the dialects and protect a Yi linguistic and cultural heritage of social and historical value. This paper takes the six major Yi dialects as its main line of research and, considering the characteristics, scope of use, and speaker population of each dialect, determines the language survey and data collection points for each dialect, subdialect, and local language point. On this basis, it studies and constructs a Yi dialect resource database at multiple levels and dimensions, including words, sentences, dialogues, and texts, and creates a high-quality information sharing platform for the Yi dialect corpus. Finally, drawing on the authors' practical experience in researching and developing Yi language information processing technology, the paper analyzes and discusses several problems in the construction and application of the Yi dialect corpus.
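
To make the organization described in the abstract concrete, the following is a minimal sketch of how a corpus record might be keyed by collection point (dialect, subdialect, local language point) and by level (word, sentence, dialogue, text). It is an illustration only and not the authors' actual schema: the class names, field names, the `count_by_dialect_and_level` helper, and the placeholder values such as `dialect_A` are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Level(Enum):
    """The four corpus levels named in the abstract."""
    WORD = "word"
    SENTENCE = "sentence"
    DIALOGUE = "dialogue"
    TEXT = "text"


@dataclass(frozen=True)
class SurveyPoint:
    """A collection point: dialect > subdialect > local language point."""
    dialect: str
    subdialect: str
    local_point: str


@dataclass
class CorpusEntry:
    """One collected item. All field names are hypothetical."""
    point: SurveyPoint
    level: Level
    content: str                                   # the collected material
    tags: List[str] = field(default_factory=list)  # optional annotation tags


def count_by_dialect_and_level(entries):
    """Count entries per (dialect, level) pair, e.g. to check coverage."""
    return Counter((e.point.dialect, e.level) for e in entries)


def demo_entries():
    """Build placeholder entries; the names and contents are not real data."""
    point_a = SurveyPoint("dialect_A", "subdialect_A1", "point_A1a")
    point_b = SurveyPoint("dialect_B", "subdialect_B1", "point_B1a")
    return [
        CorpusEntry(point_a, Level.WORD, "<word 1>", tags=["noun"]),
        CorpusEntry(point_a, Level.WORD, "<word 2>"),
        CorpusEntry(point_a, Level.SENTENCE, "<sentence 1>"),
        CorpusEntry(point_b, Level.TEXT, "<text 1>"),
    ]
```

Counting entries per dialect and level in this way is one simple check on whether each survey point is covered at every level; a production database would more likely express the same hierarchy as relational tables or XML.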

KEYWORDS

Yi dialects, Corpus, Tagging, Sharing platform, Resource database

CITE THIS PAPER

Chengping Wang, Qingya Zeng, Dongyan Sun. Research on the Construction of Resource Database of the Yi Dialects for Information Processing in China. Journal of Sociology and Ethnology (2021) 3: 17-27. DOI: http://dx.doi.org/10.23977/jsoce.2021.030404.

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.