Mathematical Analysis of the Relationship between College Entrance Exam Scores and Information and Computing Science Discipline Performance in a Chinese University

Abstract: This article applies multiple regression analysis and canonical correlation analysis to the college entrance examination scores and selected discipline scores from the first three years of university for students majoring in Information and Computing Science at a certain university. We analyze the connection between the college entrance examination scores and the university discipline scores with each method separately, and compare the advantages and disadvantages of the two methods. We aim to quantify, from a mathematical perspective, how subject knowledge is transmitted and extended across the two stages of education, and to promote the practical application of mathematics, offering readers some inspiration and reflection on the subject.


Introduction
Mathematics is a foundation of social development: it is the science of "quantity", and a science that both arises from practical problems and guides their solution [1]. Using the discipline scores from the first three years of college and the college entrance examination scores of students majoring in Information and Computing Science at a certain private university, we apply multiple regression analysis and canonical correlation analysis to analyze the relationship between the two sets of scores [2]. We aim to extend the application of mathematics and hope to offer readers some inspiration and reflection.

Brief Introduction to Multiple Regression Analysis Theory
Suppose $Y$ is a random variable and $X = (x_1, x_2, \ldots, x_m)$ is an $m$-dimensional random variable. There exists a correlation between $X$ and $Y$, and it is assumed that there is no interaction between them [1,3]. The linear regression model is $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \varepsilon$, where $\beta_0, \beta_1, \ldots, \beta_m$ are the regression coefficients and $\varepsilon$ is a random variable. It is also assumed that $\varepsilon$ follows a normal distribution with expectation 0; $\varepsilon$ is called the random error. Through the linear regression equation, we can clearly observe the linear influence of $X$ on $Y$.
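As an illustration of the regression model described above, the following Python sketch fits a multiple linear regression by least squares and computes the coefficient of determination. It is not the authors' procedure (the paper reports using SPSS); the function name `fit_linear_regression` and the synthetic data are assumptions made only for this example.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bm*xm.

    Returns (coefficients, R^2); coefficients[0] is the intercept b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return beta, 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT the paper's data): 23 students, 5 predictors.
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=15.0, size=(23, 5))
y = 20.0 + 0.4 * X[:, 0] + 0.2 * X[:, 3] + rng.normal(scale=8.0, size=23)

beta, r2 = fit_linear_regression(X, y)
print("coefficients:", np.round(beta, 3))
print("R^2:", round(r2, 3))
```

With real data, each row of `X` would hold one student's five entrance examination scores and `y` the score in one university subject.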

Brief Introduction to Canonical Correlation Analysis Theory
Canonical correlation analysis [4] is an important part of multivariate statistics and a major topic in correlation analysis. The concept extends the correlation between two variables to two sets of variables: it simplifies the complex correlation structure between the two sets by summarizing it with a few pairs of variables, while ensuring that different pairs are uncorrelated with each other.
Suppose X and Y are p-dimensional and q-dimensional random variables, respectively. There are then $p \times q$ correlation coefficients between their components, which are complicated to analyze, and the tangled relationships between the components make it difficult to grasp the essence of the matter. To study the relationship between X and Y, we find a linear combination of the components of X and a linear combination of the components of Y such that the two resulting variables have maximum correlation. This correlation is called the canonical correlation, and the pair of new random variables is called a pair of canonical variables. We can then find a second pair of linear combinations of X and Y that is uncorrelated with the first pair and has maximum correlation among such pairs, which gives the second pair of canonical variables. We continue this process until the correlation between X and Y has essentially been extracted, and then analyze the correlation between X and Y through the canonical variables. This statistical method is called canonical correlation analysis.
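The procedure described above can be written as an optimization problem. The following is a standard textbook formulation rather than notation taken from the paper: the weight vectors $a$ and $b$ and the covariance blocks $\Sigma_{XX}$, $\Sigma_{YY}$, $\Sigma_{XY}$ are symbols assumed here for illustration.

```latex
\begin{aligned}
& u = a^{\top} X, \qquad v = b^{\top} Y, \\
& \rho_1 = \max_{a,\,b} \operatorname{corr}(u, v)
  = \max_{a,\,b}
    \frac{a^{\top} \Sigma_{XY}\, b}
         {\sqrt{a^{\top} \Sigma_{XX}\, a}\,\sqrt{b^{\top} \Sigma_{YY}\, b}}, \\
& \text{subject to } a^{\top} \Sigma_{XX}\, a = 1, \quad b^{\top} \Sigma_{YY}\, b = 1 .
\end{aligned}
```

Under this formulation, the squared canonical correlations $\rho_1^2 \ge \rho_2^2 \ge \cdots$ are the nonzero eigenvalues of $\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}$, and there are at most $\min(p, q)$ pairs of canonical variables.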

Data Selection Description
We selected the college entrance examination subject scores of 23 students in the 2008 class of the Information and Computing Science major at a certain university (both the college entrance examination scores and the university subject scores were provided by the academic affairs office of the school). In what follows, $x_1, x_2, x_3, x_4, x_5$ denote the scores of the five college entrance examination subjects, and $y_j$ denotes the score of a university subject (for example, $y_1$ for Advanced Algebra, $y_3$ for College English, and $y_6$ for C++). Based on this small sample, we hope to explore the connection between the college entrance examination scores and the university professional discipline scores through the two methods above [5], and to provide a brief comparison and explanation of the two methods.

The Calculation Based on Multiple Regression Analysis
According to the linear regression model between X and Y [6][7][8][9], which is $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_5 x_5 + \varepsilon$, we use SPSS software to obtain the following results for four specific university disciplines (taking Advanced Algebra, C++, College English, and Mathematical Modeling as examples):
(1) The regression equation and analysis between Advanced Algebra and the scores of the five subjects in the college entrance examination.
For the fitted regression equation, the coefficient of determination is $R^2 = 0.388$, indicating a moderate linear regression relationship between $y_1$ (Advanced Algebra) and $x_1, x_2, x_3, x_4, x_5$ (the scores of the five subjects in the college entrance examination). Further, from the regression equation, we can see that Mathematics and Physics among the college entrance examination subjects have the greater impact on the university variable, namely Advanced Algebra.
(2) The regression equation and analysis between C++ and the scores of the five subjects in the college entrance examination.
For the fitted regression equation, the coefficient of determination is $R^2 = 0.253$, indicating that the linear regression relationship between $y_6$ (C++) and $x_1, x_2, x_3, x_4, x_5$ is weak, so the linear relationship between C++ and the scores of the five subjects in the college entrance examination is not clear. From the regression equation, we can see that Mathematics, Chemistry, and Physics among the college entrance examination subjects have a relatively weak impact on the study of C++, which may also indicate that students in this major have certain difficulties in learning the C++ course.
(3) The regression equation and analysis between College English and the scores of the five subjects in the college entrance examination.
For the fitted regression equation, the coefficient of determination is $R^2 = 0.175$, indicating that the linear regression relationship between $y_3$ (College English) and $x_1, x_2, x_3, x_4, x_5$ is weak, so the linear relationship between College English and the scores of the five subjects in the college entrance examination is poor. This suggests that the syllabus and requirements of College English no longer follow exam-oriented education, and that there is a significant gap from high school English teaching. Relatively speaking, from the regression equation, we can see that the English subject in the college entrance examination has the greatest impact on College English.
(4) The regression equation and analysis between Mathematical Modeling and the scores of the five subjects in the college entrance examination.
For the fitted regression equation, the linear regression relationship between Mathematical Modeling and $x_1, x_2, x_3, x_4, x_5$ has improved compared to the previous equation. From the regression equation, we can see that Mathematics and Physics among the college entrance examination subjects have the greater impact on Mathematical Modeling. This also reflects that Mathematical Modeling places more emphasis on dealing with practical problems and on applying theory to their solution. This process is often not easy for students to accept, and it also reflects certain drawbacks of exam-oriented education.

Conclusion Analysis
Multiple regression analysis studies only the relationship between the score of one specific university subject at a time and the scores of the college entrance examination subjects [10]. Furthermore, the regression coefficients of the equations are relatively small, and the goodness of fit is not ideal. This approach therefore cannot fully demonstrate the connection between the college entrance examination subjects and the subjects of the Information and Computing Science major, and it lacks persuasive power for evaluating the relationship between the college entrance examination and university major subjects.
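One way to make the weak fit concrete is to adjust the coefficient of determination for the small sample size. The sketch below is an illustrative check, not a calculation from the paper; it assumes that all 23 students and all five predictors enter each regression, and the helper name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R^2 values as reported in the text; n = 23 and p = 5 are assumed
# to apply to every regression.
reported = {"Advanced Algebra": 0.388, "C++": 0.253, "College English": 0.175}
for subject, r2 in reported.items():
    adj = adjusted_r2(r2, n_obs=23, n_predictors=5)
    print(f"{subject}: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Under these assumptions the adjusted values are about 0.208, 0.033, and -0.068, which is consistent with the observation that the regression equations fit poorly.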

Correlation Analysis Calculation of College Entrance Examination Subjects and University Subjects Scores
The following information is obtained by processing the data in Matlab: the correlation coefficient matrix (Tan et al., 2020) between the scores of the college entrance examination subjects, and the correlation matrix between the scores of the college entrance examination subjects and the university subjects.
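For readers who want to reproduce this kind of calculation, the following Python sketch computes the within-set and between-set correlation matrices. The paper reports using Matlab; the function name `correlation_blocks`, the synthetic data, and the choice of three university subjects are assumptions made only for this example.

```python
import numpy as np

def correlation_blocks(X, Y):
    """Return (R_xx, R_yy, R_xy) for data with samples in rows.

    R_xx: correlations among the columns of X,
    R_yy: correlations among the columns of Y,
    R_xy: correlations between columns of X and columns of Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p = X.shape[1]
    # rowvar=False tells corrcoef that each column is one variable.
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    return R[:p, :p], R[p:, p:], R[:p, p:]

# Synthetic stand-in data (NOT the paper's data): 23 students,
# 5 entrance examination subjects, 3 hypothetical university subjects.
rng = np.random.default_rng(1)
X = rng.normal(size=(23, 5))
Y = 0.5 * X[:, :3] + rng.normal(size=(23, 3))

R_xx, R_yy, R_xy = correlation_blocks(X, Y)
print(R_xx.shape, R_yy.shape, R_xy.shape)
```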

Canonical Variable Correlation Analysis Calculation of College Entrance Examination Subjects and University Subjects Scores
The eigenvalues of the canonical correlation matrix D are calculated, and three pairs of canonical variables are determined from the size of the eigenvalues.
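The eigenvalue step can be sketched in Python as follows. This assumes the common textbook definition in which the matrix is $R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}$ and its eigenvalues are the squared canonical correlations; whether the paper's matrix D is defined in exactly this way is not stated in the text. The function name `canonical_correlations` and the synthetic data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return (eigenvalues, canonical correlations), largest first.

    Assumes the eigenvalues of R_xx^{-1} R_xy R_yy^{-1} R_yx are the
    squared canonical correlations (standard textbook definition).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p, q = X.shape[1], Y.shape[1]
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    R_xx, R_yy, R_xy = R[:p, :p], R[p:, p:], R[:p, p:]
    # Solve linear systems instead of forming explicit inverses.
    M = np.linalg.solve(R_xx, R_xy) @ np.linalg.solve(R_yy, R_xy.T)
    # Eigenvalues are real in theory; drop rounding-level imaginary parts.
    eigenvalues = np.sort(np.linalg.eigvals(M).real)[::-1]
    # At most min(p, q) eigenvalues are nonzero; clip rounding noise.
    eigenvalues = np.clip(eigenvalues[: min(p, q)], 0.0, 1.0)
    return eigenvalues, np.sqrt(eigenvalues)

# Synthetic stand-in data (NOT the paper's data).
rng = np.random.default_rng(2)
X = rng.normal(size=(23, 5))
Y = 0.6 * X[:, :3] + rng.normal(size=(23, 3))

eigenvalues, rho = canonical_correlations(X, Y)
print("eigenvalues:", np.round(eigenvalues, 3))
print("canonical correlations:", np.round(rho, 3))
```

If the paper's matrix follows this definition, canonical correlation coefficients of 0.69, 0.53, and 0.24 would correspond to eigenvalues of roughly 0.48, 0.28, and 0.06.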

Canonical Variable Results Analysis of College Entrance Examination Subjects and University Subjects Scores
(1) The analysis of the first pair of canonical variables is as follows:
The canonical correlation coefficient between $w_1$ and $z_1$ is 0.69, indicating a close relationship between them. From the coefficients of each equation, we find that computer graphics, mathematical analysis, C++ programming, and mathematical modeling carry the largest weights in the university subject canonical variable equation. Considering the characteristics of these subjects, we find the following: computer graphics combines mathematics and graphics, and is easier for students to understand than the relatively dry mathematical analysis; mathematical analysis, compared with high school mathematics, is the foundation of university mathematics, and its importance is self-evident; C++ programming builds on mathematics; and mathematical modeling reflects the ability to apply the subjects of the major comprehensively. The analysis also reflects the character of this major: grounded in mathematics, it trains students to use mathematical ability comprehensively and proficiently to solve practical problems, and to improve their ability to apply mathematics. Correspondingly, the main factors in the college entrance examination canonical variable equation are mathematics, physics, and chemistry. Mathematics carries the largest weight, reaching 0.64, so mathematics scores can directly affect the learning of related subjects in university; at the same time, knowledge of elementary physics and chemistry helps students understand other university subjects such as mathematical modeling.
(2) The analysis of the second pair of canonical variables is as follows:
The canonical correlation coefficient between $w_2$ and $z_2$ is 0.53, which is slightly smaller than the previous coefficient. From the coefficients of each equation, we find that optimization methods carries the largest weight in the university subject canonical variable equation. Considering the characteristics of this subject: optimization methods mainly studies the management of various organized systems and their production and operation activities, using mathematical methods to find the optimal path and plan for such systems and providing decision-makers with a basis for scientific decisions. At the same time, good Chinese scores indicate that students read and understand written text well; with a certain mathematical foundation, they can understand the mathematical background described in the optimization methods course and then choose an appropriate method for processing and decision-making.
(3) The analysis of the third pair of canonical variables is as follows:
The canonical correlation coefficient between $w_3$ and $z_3$ is 0.24, which is smaller than that of the second pair. From the coefficients of each equation, we find that discrete mathematics carries the largest weight in the university subject canonical variable equation. Considering the characteristics of this subject: the discrete mathematics course mainly introduces the basic concepts, theories, and methods of the various branches of discrete mathematics, which are widely used in professional courses such as digital circuits and compiler principles. At the same time, the main factors in the college entrance examination canonical variable equation are physics and chemistry. Good scores in physics and chemistry indicate that students have mastered the principles behind physical and chemical phenomena; combined with a certain foundation in mathematical logic, they can understand the mathematical principles described in discrete mathematics and then master the subject.

Analysis
Through canonical correlation analysis of students' college entrance examination scores and their university subject scores in the first three years, the connection between the college entrance examination subjects and the university subjects has been analyzed in depth [4,11]. The results suggest that study for the college entrance examination strongly supports the in-depth study of related professional knowledge at university. In particular, professional subjects such as advanced algebra, mathematical analysis, numerical analysis, discrete mathematics, and mathematical modeling are greatly influenced by mathematics, physics, and chemistry in the college entrance examination, and they occupy a very important position in higher education. Therefore, when reforming the high school curriculum system and implementing the 3+X college entrance examination model, enough class hours should be reserved for mathematics, physics, and chemistry to ensure teaching quality and to enable high schools to send more students with solid basic knowledge to higher education institutions.

Conclusion
Multiple linear regression analysis of the college entrance examination scores and the university subject scores yields equations with small coefficients and an unsatisfactory fit, indicating that this method does not provide strong support for quantifying the connection between the two. Canonical correlation analysis, by contrast, constructs canonical variables, determines three pairs of canonical variables from the size of the eigenvalues, and explains the transmission and extension of subject knowledge across the two stages relatively well, which indirectly illustrates the importance of choosing an appropriate mathematical method. Because the data in this paper are limited, more effective mathematical methods still need to be explored to achieve better quantitative support; we also hope this work encourages more people to think about mathematics.