Analysis of Hot Research Topics on Chinese Data Literacy Based on Bibliometrics

Abstract: Data literacy has become an essential literacy for individuals living in the digital age. Reviewing the published literature on data literacy helps clarify the research framework of the field and inform subsequent studies. Academic papers indexed in CNKI from SCI, EI, CSSCI, CSCD, and Peking University core journals were collected, organized, and cleansed, and a bibliometric analysis was conducted to systematically review the research framework of data literacy. Chinese scholars have so far made preliminary explorations of the connotation and value of data literacy, the evaluation of data literacy capabilities, and paths for cultivating data literacy. CiteSpace analysis shows that "data-driven" is the most recent burst term in data literacy research, which may represent the current research frontier of the field.


Introduction
For a long time, decision-making that relies primarily on intuition and personal experience has been criticized for its subjectivity and lack of an objective basis. With the continuous development of digital infrastructure across industries, storing and transmitting information carriers such as text and images in digitally encoded form has become increasingly accessible. The rapid advancement of information technologies such as big data, together with breakthroughs in computing and analysis techniques, has enabled the aggregation, analysis, and application of large datasets containing fragmented information about socio-economic activity, the real world, and managerial decision-making. Interpreting this fragmented information further strengthens the role of data as a factor of production and a driver of productivity, profoundly transforming traditional ways of thinking, production, and life. Data-driven development has gradually become an important trend in the current social and information transformation.
As the new generation of information technology drives profound transformation in various fields, the information environment of all industries is undergoing unprecedented changes. The ability to acquire, store, and develop various types of data has become a fundamental skill for professionals across industries. Data literacy, the comprehensive ability to reasonably acquire, understand, apply, evaluate, and manage data in the context of big data, extends concepts such as information literacy and statistical literacy. It is an essential literacy for individuals to innovatively apply various types of data in social production, life, and other practical activities within the information environment.
To this end, this study employs bibliometric methods to conduct statistical analysis, topic mining, and comparative literature analysis on academic publications about data literacy in prominent Chinese scholarly journals. The objective is to gain insight into the developmental trends of data literacy research in the Chinese context and to provide a reference for scholars seeking to track research dynamics and explore research topics.

Research Methodology
In recent years, the explosive growth of literature resources and continuous improvements in bibliometric research tools have led an increasing number of scholars to work in the field of bibliometrics. Bibliometrics combines mathematical and statistical approaches to collect, organize, analyze, evaluate, and predict from existing literature data. It enables comprehensive analysis along multiple dimensions, including literature topics, scholars, institutions, and journals.
In applying bibliometric methods, most Chinese scholars have focused on the following aspects of research. 1) Using core journals within specific disciplinary fields as data sources, scholars have employed bibliometric methods to map the research framework within the discipline, identify current research hotspots, and predict research directions. For example, Ding, X., Wu, Q., Zhang, P., et al. (2022) used bibliometric methods to analyze the research hotspots in management science and engineering based on data from 46 authoritative international journals published over the past decade [1]. 2) Scholars have used bibliometric methods for academic journal evaluation. Yu, L., & Wang, Z. (2018) proposed an improved composite bibliometric indicator based on the principle of z-scores, replacing citation concentration with the reciprocal of the ratio of low-cited papers, which provides a means of evaluating academic journals [2]. In addition, some scholars have employed bibliometric methods to study the contributions of scholars from different regions in a specific field [3].

Data Sources
To enhance the quality of literature data in data literacy research and to avoid interference from duplicated literature data across Chinese and international databases, the literature resources were restricted to academic papers indexed in the Chinese Knowledge Resource Integrated Database (CNKI) from source categories including SCI, EI, CSSCI, CSCD, Peking University Core, and AMI.
Literature retrieval commonly offers "subject", "title", "keywords", "abstract", and "full-text" searches. Considering the trade-off between the completeness and accuracy of the retrieved data, this study selected the "subject" search. The advanced search function of the Chinese Knowledge Resource Integrated Database (CNKI), a Chinese journal database, was used. The query time range was set to end in 2022, the subject term was set to 'data literacy', and the source categories were limited to SCI, EI, CSSCI, CSCD, and Peking University Core. This preliminary search yielded a total of 565 literature records.
To improve the accuracy of the results, the retrieved records were cleansed before data analysis. This involved filtering out book reviews, conference reports, project introductions, editorial prefaces, entries without authors, and other irrelevant items. As a result, a total of 502 valid articles were obtained.
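The cleansing step described above can be sketched as a simple record filter. This is a minimal illustration rather than the procedure actually used in the study: the field names `doc_type` and `authors`, the category labels, and the sample records are all hypothetical.

```python
# Hypothetical document-type labels for items excluded during cleansing.
EXCLUDED_TYPES = {
    "book review",
    "conference report",
    "project introduction",
    "editorial preface",
}


def clean_records(records):
    """Keep records that have at least one author and a non-excluded type."""
    kept = []
    for rec in records:
        has_author = any(name.strip() for name in rec.get("authors", []))
        excluded = rec.get("doc_type", "").lower() in EXCLUDED_TYPES
        if has_author and not excluded:
            kept.append(rec)
    return kept


# Invented sample records, for illustration only.
sample = [
    {"title": "Paper A", "authors": ["Author 1"], "doc_type": "article"},
    {"title": "Review B", "authors": ["Author 2"], "doc_type": "book review"},
    {"title": "Notice C", "authors": [], "doc_type": "article"},
    {"title": "Paper D", "authors": ["Author 3", "Author 4"], "doc_type": "article"},
]

cleaned = clean_records(sample)
print(f"{len(sample)} retrieved, {len(cleaned)} kept")
```

In practice the excluded categories would have to be matched against whatever labels the exported records actually carry, and borderline items would still need manual inspection.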

Annual Growth Trend of Publication Volume
In general, the growth of scientific knowledge is closely related to changes in the publication volume of literature on a particular topic. When assessing research output and hot topics, academia often considers the publication volume of literature on a specific subject as an important indicator. The level of publication volume reflects, to some extent, the research popularity and maturity of the subject.
By analyzing and comparing the final literature publication data, it can be observed that literature with the theme of "data literacy" first emerged in 2010. Figure 1 depicts the annual trend of publication volume for literature on the topic of "data literacy". It shows that the publication volume began to increase around 2015 but experienced a relatively stable growth trend between 2016 and 2019.

Analysis of Publication Journal Distribution
According to information management theory, analyzing the distribution of information helps reveal the characteristics and patterns of information concentration and dispersion. In their research on journal evaluation in China, Hou Jianhua et al. (2015) found a Matthew effect in journal publications and research directions [4]. A statistical analysis of the distribution of the published literature among journals shows that a total of 130 journals have published research papers related to data literacy. Among them, the top 10 journals by publication volume accounted for 206 papers, representing 41.4% of the total. Journals that each published more than 10 papers accounted for 251 papers, representing 50% of the total. The specific source journals can be found in Table 1. Further analysis reveals that approximately 10% of the journals published about 50% of all papers on "data literacy", indicating that research on the topic is concentrated in a small number of journals. These top-ranking journals are predominantly in the fields of library and information science and educational research.
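The concentration figures above are simple shares of the 502 valid records and can be checked as follows. The counts are those quoted in the text; the helper name `share` is introduced only for illustration. With these inputs the top-ten share evaluates to about 41.0%, marginally below the 41.4% quoted, so the underlying table may rest on slightly different counts.

```python
def share(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


TOTAL_PAPERS = 502   # valid records after cleansing
TOP10_PAPERS = 206   # papers in the ten most productive journals
OVER10_PAPERS = 251  # papers in journals with more than 10 papers each

print("Top 10 journals:", share(TOP10_PAPERS, TOTAL_PAPERS), "%")
print("Journals with >10 papers:", share(OVER10_PAPERS, TOTAL_PAPERS), "%")
```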

Analysis of Author Collaboration Networks
The greater the number of publications by the same scholar on the same topic, the deeper their research in that field is considered to be. It also indicates that the scholar may have a leading role in the field and that their academic achievements could serve as an important channel for ideas and viewpoints in the research area. When analyzing author collaboration networks using CiteSpace, the g-index was selected for data filtering. The unsimplified network contained 269 nodes and 139 connections, which to some extent suggests that research on data literacy in academia is still relatively scattered. Visualizing the network with a display threshold of 3 yields the author collaboration network, which is relatively sparse. In terms of publication volume, only scholars such as Yang Xianmin, Huang Ruhua, Hu Hui, and Deng Lijun have a substantial number of publications in the field of "data literacy". Among them, Yang Xianmin focuses on data literacy research in education and teaching, primarily teacher data literacy. Huang Ruhua, Hu Hui, and Deng Lijun, on the other hand, primarily focus on data literacy research in library science.
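The sparseness of the collaboration network can be made concrete through its density, the fraction of possible author pairs that are actually connected. The sketch below is illustrative only: it builds an undirected co-authorship network from invented author lists and then applies the same density formula to the node and edge counts reported above. It does not reproduce CiteSpace's own processing.

```python
from itertools import combinations


def coauthor_network(author_lists):
    """Build an undirected co-authorship network as (nodes, edges)."""
    nodes, edges = set(), set()
    for authors in author_lists:
        unique = sorted(set(authors))
        nodes.update(unique)
        # Every pair of co-authors on one paper is linked once.
        edges.update(combinations(unique, 2))
    return nodes, edges


def density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Invented author lists, for illustration only.
papers = [["A", "B"], ["A", "B", "C"], ["D"], ["E", "F"]]
nodes, edges = coauthor_network(papers)
print(len(nodes), "authors,", len(edges), "links")

# Density for the 269 nodes and 139 connections reported in the text.
print(round(density(269, 139), 4))
```

A density of roughly 0.004 means that fewer than one in two hundred possible author pairs have co-published, which is consistent with the scattered picture described in the text.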

Analysis of Research Institutions and Collaboration Networks
Similarly, using CiteSpace software and selecting the g-index for data filtering, a research institution collaboration network is obtained with a threshold set at 5. It can be observed that the collaborative relationships among research institutions in this field in China are not strong, although some collaborations exist. The top five research institutions in terms of publication volume are Wuhan University School of Information Management, Chinese Academy of Sciences Document Information Center, University of Chinese Academy of Sciences, the School of Information Management at Sun Yat-sen University, and Sichuan International Studies University Library.
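The g-index filtering mentioned for both networks can be illustrated with a short sketch. It follows the general idea of a g-index with a scaling factor k (the largest rank g such that g squared does not exceed k times the cumulative count of the top g items). The exact rule CiteSpace applies should be checked against its documentation; the parameter name `k` and the sample counts here are assumptions made for illustration.

```python
def g_index(counts, k=1):
    """Largest rank g with g**2 <= k * (sum of the g largest counts).

    The result is capped at the number of items supplied.
    """
    ranked = sorted(counts, reverse=True)
    cumulative, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        cumulative += count
        if rank * rank <= k * cumulative:
            g = rank
    return g


# Invented occurrence counts for five items in one time slice.
sample_counts = [10, 5, 3, 1, 1]
print(g_index(sample_counts))       # unscaled g-index
print(g_index(sample_counts, k=2))  # a larger k admits more items
```

Raising k loosens the filter, so more authors or institutions enter the network; this is why node counts depend on the chosen selection settings as well as on the data.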

Keyword-Based Knowledge Graph
Through the analysis of co-occurring keywords in articles related to the theme of "data literacy", a knowledge graph can be constructed. This knowledge graph helps to understand the knowledge clustering and evolutionary patterns in the field of data literacy research. It also provides insights into the co-citation trajectories and evolving networks of research topics, thus facilitating the extraction of current academic research hotspots and clarifying the trends in the evolution of "data literacy" as a subject.

Analysis of Research Hotspots Based on Keyword Co-occurrence Network
The literature keywords provide highly summarized representations of research content. In CiteSpace, the size of nodes indicates the frequency of keyword occurrences, where high-frequency keywords represent current research hotspots in the field. The connections between nodes indicate that the represented keywords appear in the same literature. Without network simplification, CiteSpace computed 323 nodes and 575 connections.
The top ten keywords by frequency are: Data Literacy, Big Data, Information Literacy, Library, Scientific Data, Artificial Intelligence, Data-Driven, Data Journalism, Data Librarian, and Data Service. The frequency and centrality of each keyword are shown in Table 2.
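Keyword frequencies and co-occurrence links of the kind described above can be counted in a few lines. The sketch below uses invented keyword lists and covers only frequency and link weight. It does not compute the centrality values reported by CiteSpace, which would require a separate graph algorithm.

```python
from collections import Counter
from itertools import combinations


def keyword_stats(keyword_lists):
    """Return (frequency, co-occurrence) counters for per-paper keywords."""
    freq = Counter()
    cooc = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))
        freq.update(unique)  # node size: papers using the keyword
        cooc.update(combinations(unique, 2))  # link weight: shared papers
    return freq, cooc


# Invented keyword lists, for illustration only.
papers = [
    ["Data Literacy", "Big Data", "Library"],
    ["Data Literacy", "Information Literacy"],
    ["Data Literacy", "Big Data", "Data-Driven"],
]

freq, cooc = keyword_stats(papers)
print(freq.most_common(2))
print(len(freq), "nodes,", len(cooc), "links")
print(cooc[("Big Data", "Data Literacy")])
```

Keyword pairs are stored in sorted order, so each link is counted once regardless of the order in which the keywords appear in a record.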

Keyword-based Cluster Analysis
After cluster analysis of the keywords, the modularity of the clustering (Q value) was 0.5819 and the average silhouette value (S value) was 0.907, exceeding the commonly used thresholds of 0.3 and 0.7, respectively. This indicates that the clustering structure in this study is reasonable and effective. In CiteSpace, by default, clusters with fewer than 10 literature items are not displayed. Therefore, only 10 clusters are shown with the following labels in order: #0 Big Data Research; #
After clustering keywords and conducting further analysis of published research papers related to data literacy, the research hotspots can be summarized as follows:

(1) Research on the Connotation and Value of Data Literacy
Regarding the research on the connotation and value of data literacy, the academic community primarily focuses on investigating teaching and research staff, as well as librarians. Data literacy is a core competency for librarians and researchers and is also a key concept in the field of data management services [5]. For teachers, possessing data literacy enables them to adapt better to data-intensive scientific research models, and it also contributes to promoting research output and other aspects [6]. Additionally, data literacy facilitates the transition of teachers' professional development from a "rough experience" paradigm to an "evidence-based" paradigm [7]. As for students, the dimensions of data literacy vary among different educational stages [8][9]. However, being proficient in using big data to acquire necessary information can help students enhance their knowledge base and lay a solid foundation for their future entry into society [9].

(2) Research on the Evaluation of Data Literacy Skills
As the academic community delves deeper into the research on the connotation of data literacy, there has been an increasing focus on evaluating data literacy skills, particularly among university faculty and students. Data literacy represents a new requirement for individuals in the context of the big data era and is an extension and development of traditional information literacy [10]. Among different groups in universities, such as faculty, doctoral students, master's students, and undergraduate students, significant differences exist in their data literacy abilities [11]. Scholars have undertaken various studies to develop evaluation indicators from different perspectives. For instance, Li Qing and Zhao Huanhuan (2018) used multiple research methods to summarize and analyze relevant literature on the elements of data literacy and constructed a teacher data literacy evaluation indicator system encompassing data knowledge, data skills, teaching applications, and awareness of ethics [12]. On the other hand, Ma Teng and Sun Ling (2019), based on the theory of information ecology, established an evaluation indicator system from dimensions such as information, information environment, and information technology to assess students' data literacy [13]. For primary and secondary school students, competence models can include data knowledge and skills, data thinking, data awareness, and adherence to data ethics and norms [14]. In the context of high school mathematics teaching, scholars have designed a data analysis literacy assessment framework based on four dimensions [15]. A review of the existing research on data literacy evaluation shows that most evaluation indicator systems include dimensions such as data awareness, data processing skills, data communication, and data evaluation.

(3) Research on the Cultivation Path of Data Literacy
To harness the value of data in different fields, it is essential to cultivate data literacy skills among professionals in those respective domains. In comparison to the educational practices at Harvard University, data management services in various Chinese universities remain in a disordered state. In the era of big data, universities should first establish policies for data literacy education and management [16]. For teachers, including those in teacher training programs, the cultivation of data literacy involves a broad knowledge base and is a systematic undertaking that requires collaborative efforts from multiple stakeholders [17]. In the context of primary and secondary school teachers, self-regulated learning theory can be applied to develop a teaching model for data literacy based on self-regulated scaffolding [18]. As for researchers, a data literacy cultivation framework based on the data lifecycle and research project lifecycle can, to some extent, help them overcome the challenges posed by insufficient data literacy skills [19].

Frontier Analysis Based on Burst Detection Method
The burst detection method can be used to analyze key nodes in a research field in depth, thereby identifying currently active or frontier topics. Because the dataset exported from China National Knowledge Infrastructure (CNKI) lacks cited-reference data, only keyword burst detection analysis is performed. Keeping the default parameters unchanged, two burst keywords were identified: "Data Journalism" (with a burst strength of 3.3) and "Data-Driven" (with a burst strength of 3.71). The burst period for the keyword "Data Journalism" was from 2014 to 2017, while the burst period for the keyword "Data-Driven" was from 2020 to 2022. The specific source literature for these keywords during the burst periods is provided in Table 3.
Based on the burst detection analysis, it can be inferred that the burst keyword "Data-Driven" mainly originates from journals focused on educational research and library science. Moreover, considering that the burst period for this keyword is from 2020 to 2022, it can be regarded as a current frontier topic in data literacy research.
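To illustrate what keyword burst detection looks for, the following is a minimal sketch of a two-state burst model in the spirit of Kleinberg's algorithm, on which CiteSpace's burst detection is generally understood to be based. It is not CiteSpace's implementation: the parameter names `s` and `gamma`, their values, and the yearly counts are assumptions chosen for illustration and are not the data of this study.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def burst_states(r, d, s=2.0, gamma=1.0):
    """Label each period 0 (baseline) or 1 (burst).

    r[t]: records containing the keyword in period t
    d[t]: all records in period t
    s: ratio of burst rate to baseline rate (assumed value)
    gamma: weight of the cost of entering the burst state (assumed value)
    """
    n = len(r)
    p0 = sum(r) / sum(d)          # baseline share of the keyword
    p1 = min(s * p0, 0.9999)      # elevated share in the burst state
    probs = (p0, p1)
    enter_cost = gamma * math.log(n)

    def fit_cost(state, t):
        # Negative binomial log-likelihood of the observed count.
        p = probs[state]
        return -(log_binom(d[t], r[t])
                 + r[t] * math.log(p)
                 + (d[t] - r[t]) * math.log(1 - p))

    # Viterbi search for the cheapest state sequence, starting from baseline.
    cost = [fit_cost(0, 0), enter_cost + fit_cost(1, 0)]
    back = []
    for t in range(1, n):
        new_cost, pointers = [], []
        for j in (0, 1):
            # Moving up into the burst state is penalised; moving down is free.
            candidates = [cost[i] + (enter_cost if j > i else 0.0)
                          for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(j, t))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Invented yearly counts: the keyword spikes in the fourth and fifth periods.
keyword_counts = [1, 1, 1, 10, 10, 1, 1]
total_counts = [100] * 7
print(burst_states(keyword_counts, total_counts))
```

Periods labelled 1 form the burst interval, analogous to the 2014 to 2017 and 2020 to 2022 windows reported above. The burst strength that CiteSpace reports is a separate quantity and is not computed in this sketch.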

Research on 'Data-Driven' in the Field of Library and Information Science
Since the beginning of the 21st century, big data has continuously been a hot topic in academic research. In different disciplinary fields, research dimensions related to big data have been gradually subdivided. In the study of "Data-Driven" in library and information science, "data" serves as the foundation of development and exerts a strong driving force on the advancement of the discipline [20]. However, implementing data-driven approaches as the future development direction for libraries still requires macro-level efforts. Scholars such as He Yali (2020) studied 15 foreign libraries, including the National Library of Medicine in the United States and the British Library. They suggested that, in the transition to data-driven operations, data should serve as the support for library decision-making and management, the driving force for business reconstruction, and the growth point for user services [21]. At the micro level, new scientific research paradigms have given rise to a new generation of librarians who are characterized by data-driven approaches, rely on big data technology applications, and build services on a foundation of multi-source data. These new librarians mainly focus on providing knowledge services to support scientific decision-making, research management, and the scientific research process [22].

Research on 'Data-Driven' in the Field of Education and Teaching
In the field of education and teaching, the governance transformation brought about by data-driven approaches plays a crucial role in accelerating the modernization of school governance [23]. Compared to other traditional industries, innovation in China's higher education sector is insufficient, leading to drawbacks such as standardized and mass-produced education and talent training models. This has resulted in a serious problem of homogenization in talent cultivation among Chinese universities. Therefore, universities should use data as a foundation and follow the value orientation of personalized learning, refined management, and teaching informatization. It is necessary to explore effective ways of using big data to transform education and teaching methods in universities [24]. Constructing a new relationship between schools, government, and society can facilitate comprehensive and in-depth integration of educational governance and data, enhancing the level of educational governance through the enthusiasm of all parties involved [25]. In recent years, the frequency and scope of the term "precision" have significantly increased in various fields. In the domain of sports training, "precision training" and "data-driven" have become frequent terms and main themes. Data-driven precision training allows athletes to receive personalized stimuli according to individual differences, leading to optimal adaptation and improved training quality [26]. However, if teachers lack high data literacy, data-driven empowerment in teaching may encounter various challenges, such as difficulty in implementing tailored teaching and a lack of precision [27]. To implement data-driven teaching effectively, teachers need to recognize clearly that the value of teaching data should always revolve around the students, integrating cold data into warm teaching practices [28].

Conclusion
Based on a multi-angle analysis of research literature related to data literacy from the databases of SCI, EI, CSSCI, CSCD, and core journals from Peking University in China National Knowledge Infrastructure (CNKI), the following conclusions have been drawn: Firstly, data literacy research has emerged as a prominent and rapidly developing subject in the fields of library science, information science, and education.There is a notable surge of interest in exploring various aspects of data literacy, including big data analysis, information literacy, and the utilization of scientific data.Moreover, the focus of the investigation is shifting from traditional data management approaches towards more data-driven methodologies.These trends indicate the increasing significance and relevance of data literacy in contemporary scholarly discourse.
Secondly, upon analyzing literature that directly addresses data literacy, it becomes evident that early research primarily concentrated on investigating the connotation and significance of data literacy during the era of big data.These initial studies set the groundwork for further research, prompting an expanding array of inquiries into the components, assessment, and educational aspects of data literacy.As a result, there has been a diversification of research efforts aimed at understanding and enhancing data literacy skills.
Thirdly, among the publicly available research literature, the majority of studies primarily focus on students, teachers, and researchers.In the era of big data and digital economy, data literacy has become an essential skill for promoting human development in the 21st century.
This study has certain limitations. On one hand, in terms of data sources, while the selected research papers are from reputable databases such as SCI, EI, CSSCI, CSCD, and core journals from Peking University, the overall coverage may not be comprehensive enough. On the other hand, foreign research on data literacy was not included in the analysis, thus lacking insights into the hot topics and frontier research themes in foreign studies.

Figure 1: Annual Trend of Publication Volume on the Theme of 'Data Literacy'

Table 1: Source Journals with Data Literacy Research Publications Exceeding 10 Papers
Note: The impact factor data in the table is sourced from the 2022 edition of the Comprehensive Impact Factor in the Chinese Knowledge Resource Integrated Database (CNKI).

Table 3: Source Literature for the Keyword "Data-Driven"