Corpus-based English Pronunciation Learning System in English Teaching

Abstract: Oral English teaching has become an indispensable part of English education in China, and the contradiction between traditional teaching methods and the needs of oral English teaching has become increasingly prominent. The development of corpus technology can help resolve this contradiction. The purpose of this paper is to study the design of a corpus-based spoken English pronunciation learning system. The paper introduces the main technologies used in the system design, including corpus technology, the Visual Studio 2008 development environment and the C# language, and describes the environment used to implement the system. It proposes an intelligent oral English pronunciation teaching and learning solution for both teachers and students that combines speech recognition, image retrieval, oral evaluation and other functions with oral English teaching, and uses the system to help resolve the conflict between traditional teaching methods and the needs of modern oral English teaching. The system is designed according to the application of corpora in English teaching, and participants are organized to conduct oral evaluation experiments. The reasonable rate of the model's scoring can reach 98%.


Introduction
In China's basic education development strategy, English education has been given a prominent position. According to the requirements of the new curriculum standard, English has become a compulsory course. English is a highly practical subject, yet many teachers continue to use traditional teaching methods: the teacher explains and demonstrates the pronunciation, and students read along [1][2]. Because of the drawbacks of traditional teaching methods, the limits of learners' own abilities, the uneven distribution of educational resources and the lack of a suitable learning environment, the accuracy of students' English pronunciation and the fluency of their oral expression are severely limited [3][4].
With the development of technology, a large number of spoken English products have emerged [5]. Kummin S demonstrates how, by combining multimodal materials, students can improve not only their English language skills but also their analytical and creative thinking skills. The primary focus of that study is on techniques and tools to improve students' English as a Second Language (ESL) academic performance and learning satisfaction in a classroom setting. More specifically, the study uses a variety of multimodal texts as teaching materials, covering auditory and visual modalities such as video recordings, YouTube videos, songs, and Adobe speech [6]. Salins A examines whether incidental orthography promotes oral vocabulary learning in children with hearing loss and whether the benefit is greater than for hearing children. Word learning was assessed using behavioral and eye-tracking data from picture naming and picture-word matching tasks. Results showed an orthographic boost to oral vocabulary learning in children with hearing loss, and this benefit was maintained throughout the week [7]. Therefore, it is important to study corpus-based English pronunciation learning systems [8].
This paper develops an English pronunciation learning system based on corpus technology. The system provides functions such as speech recognition, pronunciation assessment, speech transmission, and oral dialogue, which significantly improve the efficiency of students' oral language learning and free students from constraints of time and place. This convenience, together with rich interactive functions, not only helps improve learning outcomes but also enhances students' interest and enthusiasm.

Corpus
A corpus, as the term is used here, is a system consisting of a large body of authentic language material and corpus retrieval software stored on a computer or server; that is, the combination of a corpus and retrieval software that supports retrieval, counting, analysis and other functions. According to their structure and development, corpora can be roughly divided into two stages. Non-electronic stage: the entire corpus was in non-electronic form, basically texts and cards, and computer technology was very immature. In application, there were no indexing or control tools, and all work relied on manual methods. Electronic stage: computer technology has been applied, and computers have become an integral part of corpus storage, retrieval and indexing [9][10].
In traditional language learning, most approaches are deductive: teachers first state the rules and then apply them. Data-driven learning instead emphasizes learners' own cognitive engagement with language data, and corpus software is developed to support this data-driven approach [11][12].
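The retrieval and counting functions of corpus software described above can be illustrated with a keyword-in-context (KWIC) search, a common concordancing technique. The following is a minimal sketch in Python, not the retrieval software used in this paper; the function names `kwic` and `frequency` and the sample sentence are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` words on each side of a hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append(f"{left} [{tok}] {right}".strip())
    return hits

def frequency(text, keyword):
    """Count case-insensitive occurrences of `keyword` in `text`."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return tokens.count(keyword.lower())

# Hypothetical transcript fragment, used only to demonstrate the search.
sample = "I think the test was hard. I think that speaking is harder than writing."
for line in kwic(sample, "think", width=2):
    print(line)
print(frequency(sample, "think"))
```

In a data-driven learning setting, a learner would read such concordance lines to observe how a word is actually used rather than being given the rule in advance.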

Scientific principles
The system design must be scientific. It should be based on the teaching content and follow the cognitive laws of students, using multimedia technology to simplify complex knowledge points, arouse students' interest, attract their attention, stimulate their curiosity, and give them opportunities to acquire knowledge on their own initiative [13][14].

Student-centered principle
The design and use of the oral English pronunciation learning system should be student-centered, focusing on students' autonomous learning and enhancing their ability to apply the language, so that students gradually master English and learn to use it flexibly with appropriate methods under the guidance of teachers [15].

Situational principle
Language is a communication tool, and the use of language always occurs in a specific situation, so the situation created must be as real and natural as possible. Multimedia technology should be used to provide a large amount of material such as audio, video, images and animation, creating language situations related to the learning content so that students can immerse themselves in the situation and language environment and abstract content becomes concrete [16][17].

Visual Studio 2008
Visual Studio has long been a popular and versatile application development environment on the Windows platform. Visual Studio 2008 adds some features over older versions of VS.NET, for example the Visual Designer. The .NET Framework makes it possible to build object-oriented applications quickly because it provides a variety of prefabricated building blocks for common system development tasks. Structuring applications around the business process in the .NET Framework model also makes it easier to integrate the system into other complex development environments [18].

C# Language
C# is an object-oriented programming language that evolved from C and C++. It is safe and stable, retains the power of C and C++, and does away with complex features such as macros and C++ multiple inheritance. In addition, C# combines the ease of use of VB with the high performance of C++. With its powerful functions, fluent syntax, innovative language features and convenient support for component-oriented programming, it has become the preferred language for .NET development.

System Environment
Development tools: the languages used are C# and JSP. The IDEs used are Visual Studio .NET 2008 with the DirectX SDK, and Eclipse. The operating system is mainly Windows, and a streaming server must be installed so that clients can access video. The software required on the client computer includes .NET Framework 1.1 and the IE browser.

Corpus Description
The total size of the corpus is 100,000 words, of which the spoken sub-corpus and the written sub-corpus each account for half. In addition, it is equipped with a variety of research tools designed by the research group. Since this study uses the spoken sub-corpus, the written sub-corpus is not introduced here. The entire spoken sub-corpus comes from the National English Major Oral Test (TEM4-Oral) between 2020 and 2022 and the National English Major Oral Test (TEM8-Oral) between 2020 and 2022, with about 1 million words. The spoken sub-corpus includes two folders, Zhuan 4 and Zhuan 8. To facilitate searching and matching, each folder contains two subfolders with parallel structure, AUDIO and TEXTS, which store the audio files and the texts transcribed from those audio files, respectively. Each subfolder further contains the audio files and transcribed texts for each year, organized in the same structure. All audio files are mono, with a sampling rate of 44,100 Hz and a bit depth of 16 bits.
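The parallel AUDIO and TEXTS layout described above makes it straightforward to pair each recording with its transcript and to check the stated audio format. The following Python sketch shows one possible way to do this; it assumes that recordings are stored as `.wav` files, that transcripts are `.txt` files with the same relative path, and the function names `wav_format` and `pair_corpus` are hypothetical. None of these details are specified in the corpus description.

```python
import wave
from pathlib import Path

# Format stated in the corpus description: mono, 44,100 Hz, 16-bit (2 bytes).
EXPECTED = {"channels": 1, "sample_rate": 44100, "sample_width_bytes": 2}

def wav_format(path):
    """Read channel count, sampling rate and sample width from a WAV header."""
    with wave.open(str(path), "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
        }

def pair_corpus(root):
    """Return (audio_path, transcript_path_or_None, format_ok) for each recording."""
    root = Path(root)
    pairs = []
    for level in ("Zhuan 4", "Zhuan 8"):
        audio_dir = root / level / "AUDIO"
        text_dir = root / level / "TEXTS"
        if not audio_dir.is_dir():
            continue
        for audio in sorted(audio_dir.rglob("*.wav")):
            # Assumed convention: same relative path, .txt instead of .wav.
            rel = audio.relative_to(audio_dir).with_suffix(".txt")
            text = text_dir / rel
            pairs.append((audio,
                          text if text.is_file() else None,
                          wav_format(audio) == EXPECTED))
    return pairs
```

A recording whose transcript is `None` or whose `format_ok` flag is `False` can then be reviewed by hand before it is used for evaluation.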

Completeness
Completeness measures the ratio of the number of words actually pronounced by the user that match the text to the number of words in the text. Words that the user skips, or whose pronunciation accuracy score is very low, are treated as missing relative to the text. Based on the status assigned to each phoneme by the alignment model, the completeness score is the sum of the correctly pronounced and incorrectly pronounced phonemes (or words) divided by the total number of phonemes (or words), as shown in formula (1):

$$\text{Completeness} = \frac{N_{\text{correct}} + N_{\text{incorrect}}}{N_{\text{total}}} \tag{1}$$

where $N_{\text{correct}}$ and $N_{\text{incorrect}}$ denote the numbers of correctly and incorrectly pronounced phonemes (or words), and $N_{\text{total}}$ denotes the total number of phonemes (or words) in the text, including missing ones.
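The completeness calculation can be sketched in Python as follows. The status labels "correct", "incorrect" and "missing" are assumed names for the per-unit output of the alignment model; the paper does not specify how the statuses are encoded.

```python
def completeness(statuses):
    """Share of text units (phonemes or words) that were pronounced at all.

    `statuses` holds one label per unit in the reference text:
    "correct", "incorrect" or "missing" (assumed label names).
    """
    total = len(statuses)
    if total == 0:
        return 0.0
    pronounced = sum(1 for s in statuses if s in ("correct", "incorrect"))
    return pronounced / total

# Four-word text: two words read correctly, one mispronounced, one skipped.
print(completeness(["correct", "correct", "incorrect", "missing"]))  # 0.75
```

Note that mispronounced units still count toward completeness; their quality is reflected separately in the accuracy score.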

Fluency
Fluency is computed from the duration of each phoneme in a word. A duration model is built for each phoneme from statistics over a large amount of speech data, and the fluency of a word is the mean of the duration probabilities of the phonemes it contains, as shown in formula (2):

$$F(W) = \frac{1}{N} \sum_{p \in W} P(t_p) \tag{2}$$

where $t_p$ denotes the duration of phoneme $p$, $P(t_p)$ denotes the probability of that duration under the duration model for $p$, and $N$ denotes the number of phonemes contained in word $W$.
For the fluency of a sentence, in addition to the fluency of each word in the sentence, the pauses between words must also be considered. If a pause between words exceeds a certain duration, the pronunciation is not considered fluent enough, and the overall fluency score needs to be reduced.
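The word-level and sentence-level fluency calculations can be sketched in Python as follows. Several details here are assumptions, since the paper does not give them: the duration model is taken to be a Gaussian per phoneme, the "duration probability" is the Gaussian density scaled so that the mean duration scores 1, the table `DURATION_MODEL` contains made-up values, long pauses are assumed to lower the sentence score, and the pause threshold and penalty are placeholder parameters.

```python
import math

# Hypothetical per-phoneme duration statistics: (mean, std) in seconds.
DURATION_MODEL = {
    "HH": (0.06, 0.02),
    "AH": (0.08, 0.03),
    "L": (0.07, 0.02),
    "OW": (0.12, 0.04),
}

def duration_score(phoneme, t):
    """Score in (0, 1]: 1 at the mean duration, falling off as t deviates."""
    mean, std = DURATION_MODEL[phoneme]
    z = (t - mean) / std
    return math.exp(-0.5 * z * z)

def word_fluency(phones):
    """Mean duration score over the (phoneme, duration) pairs of one word."""
    return sum(duration_score(p, t) for p, t in phones) / len(phones)

def sentence_fluency(words, pauses, max_pause=0.5, penalty=0.1):
    """Mean word fluency, reduced by `penalty` for each pause over `max_pause`."""
    base = sum(word_fluency(w) for w in words) / len(words)
    long_pauses = sum(1 for p in pauses if p > max_pause)
    return max(0.0, base - penalty * long_pauses)

hello = [("HH", 0.06), ("AH", 0.08), ("L", 0.07), ("OW", 0.12)]
print(word_fluency(hello))                      # every duration at its mean
print(sentence_fluency([hello, hello], [0.8]))  # one long pause between words
```

A subtractive penalty is only one option; a multiplicative factor or a pause-duration model would serve the same purpose of lowering the score for hesitant speech.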

System Function Modules
The functional modules of the corpus-based spoken English pronunciation learning system are shown in Figure 1.
Login: This module implements the functions related to user login. A user can log in to the system after entering a valid user name and password, and a password reset function is provided for users who forget their password.
Speech recognition: This module converts speech to text. It includes sub-modules for speech acquisition, speech data preprocessing, speech feature extraction, codebook generation, model training and model recognition.
Voice evaluation: The user reads an English word or sentence aloud after a prompt, and the system evaluates and scores the recording. The system implements its own recorder and uses a volume algorithm to ensure that the user's voice input can be recognized reliably, and the evaluation results are fed back to the user.
Voice broadcast: The user inputs a piece of text, and the system converts the text to speech and plays it back in a natural, accurate and fluent voice.
Oral dialogue: The core function of this module is human-computer dialogue.
Figure 1: Functional module diagram of spoken English pronunciation learning system
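The voice evaluation module relies on a volume algorithm to make sure a recording is usable before it is scored. The paper does not describe that algorithm, so the following Python sketch substitutes a simple root-mean-square (RMS) level check on 16-bit mono PCM data, matching the audio format of the corpus. The function names and the threshold value are assumptions.

```python
import math
import struct

def rms_level(pcm_bytes):
    """RMS amplitude of 16-bit little-endian mono PCM, normalized to 0..1."""
    n = len(pcm_bytes) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm_bytes[:2 * n])
    return math.sqrt(sum(s * s for s in samples) / n) / 32768.0

def loud_enough(pcm_bytes, threshold=0.02):
    """True if the recording is above an assumed minimum level for recognition."""
    return rms_level(pcm_bytes) >= threshold

silence = b"\x00\x00" * 1000
tone = struct.pack("<4h", 16384, -16384, 16384, -16384)
print(loud_enough(silence), loud_enough(tone))
```

In practice the recorder could apply such a check to each captured buffer and prompt the user to speak louder or re-record when the level is too low.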

The Effect of the Spoken Language Evaluation Model
After the evaluation model scores an audio sample, the score must be checked manually for reasonableness. The acceptance reviewers are all English language experts with English proficiency of CET-6 or above. The scores are divided into three intervals: 0-40 (poor pronunciation), 41-70 (average pronunciation), and 71-100 (good pronunciation). Based on the model score and their own assessment of the audio, the reviewers judge whether the model score falls in the same interval as their assessment or in an adjacent one. If the evaluation of the audio is in the range 41-100, it is in the reasonable range; if it is in the range 0-40, it is in the unreasonable range. Pronunciation quality is judged according to completeness and accuracy.
The final results of the spoken language evaluation model are shown in Table 1. On the standard American pronunciation evaluation set, the reasonable rate of the model's scoring is above 95%, and it can even reach 98% on the online sentence pronunciation evaluation set (nonnative-sen). Performance on the word evaluation set (nonnative-word) is relatively weak and leaves room for optimization, as shown in Figure 2.
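The reasonable rate reported above can be computed from pairs of model and reviewer scores. The Python sketch below follows one reading of the acceptance rule, namely that a model score is reasonable when it falls in the same interval as the reviewer's assessment or in an adjacent interval. This reading, the function names and the sample scores are assumptions and are not taken from the experiment.

```python
def interval_index(score):
    """Map a 0-100 score to 0 (poor, 0-40), 1 (average, 41-70) or 2 (good, 71-100)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score <= 40:
        return 0
    if score <= 70:
        return 1
    return 2

def is_reasonable(model_score, reviewer_score):
    """Assumed rule: same or adjacent interval counts as reasonable."""
    return abs(interval_index(model_score) - interval_index(reviewer_score)) <= 1

def reasonable_rate(pairs):
    """Fraction of (model_score, reviewer_score) pairs judged reasonable."""
    if not pairs:
        return 0.0
    return sum(is_reasonable(m, r) for m, r in pairs) / len(pairs)

# Made-up scores for illustration only.
pairs = [(85, 90), (30, 80), (55, 35), (72, 60)]
print(reasonable_rate(pairs))  # 0.75
```

Under this rule only a disagreement between the poor and good intervals counts as unreasonable; a stricter rule requiring the same interval would give a lower rate on the same data.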

Conclusions
Learning English well is not only a means of understanding and learning foreign culture and knowledge, but also an essential tool for future work, as more and more foreign companies are established in China. However, in everyday English learning, speaking is the biggest problem. Based on the characteristics of language learning, this paper compiles a corpus of spoken English pronunciation, and designs and implements a spoken English pronunciation learning aid that integrates pictures, videos of real speakers' standard pronunciation, and comparison between a real speaker's pronunciation and the standard pronunciation. The software focuses on learners' English pronunciation, mouth shape and pronunciation fundamentals, and on the timely detection of learners' pronunciation problems. The system is stable, practical and effective.

Table 1: The effect of the spoken language evaluation model