A Teaching Supervision Platform Based on Deep Learning

Abstract: Integrating artificial intelligence with modern network communication technology in an educational quantification system is important for improving the quality of students' classroom learning. In many vocational school education systems, teachers act mainly as knowledge transmitters. In traditional classrooms, it is difficult for teachers to track the learning progress of each student efficiently. Because of the structure of the curriculum, students' classroom learning typically has to be assessed through a combination of assignments and end-of-term exams, which makes it hard for teachers to correct students' flawed learning methods promptly. As a result, many students trained through vocational education are less adaptable to modern production. This article takes the Shanghai Science and Technology Management School as a typical case and, based on classroom teaching theory, proposes a design and implementation method for an instructional platform that integrates artificial intelligence and network communication technology. The system uses artificial intelligence to supervise classroom teaching through behavior and facial expression recognition and combines this with an automated assignment grading system to generate analytical reports on students' classroom learning. Our study indicates that the system accurately analyzes students' learning during assignment completion, improves teachers' understanding of students' learning quality, and reduces teachers' burden in classroom teaching.


Introduction
The rapid development of artificial intelligence technology has opened broad prospects for improving traditional classroom teaching. Traditional classroom teaching requires educators to have a firm grasp of instructional pace and of each student's learning progress. Acquiring this expertise, however, takes years of accumulated teaching experience. Balancing attention to students' in-class learning against the progress of the lesson is demanding, particularly in vocational education settings with large student cohorts. Meanwhile, cloud-based deep learning platforms have followed a different development path.[1] These platforms let users run model training and inference on powerful cloud servers. They expose deep learning inference through network API interfaces, using cloud computing to substantially simplify the design of systems that rely on artificial intelligence.[2] With these capabilities, integrating artificial intelligence into conventional classroom monitoring applications becomes practical.[3]
An educational platform that combines artificial intelligence with network communication technology serves as a versatile instrument for examining students' learning quality in the classroom. It gives educators and educational institutions an objective analytical tool for gauging students' learning progress and quantifying otherwise subjective notions of learning. Based on an analysis of platform requirements in the real classroom environment, we allocate functionality across modules and design each one in detail. This covers static design, post-processing of artificial intelligence model outputs, database analysis, and the design of visual interfaces. The function of each module is described to establish a complete blueprint and system design, together with its requirements. The platform makes it easier for teachers to improve teaching quality and to summarize teaching situations. Integrating artificial intelligence into the classroom environment raises the quality of traditional classroom instruction.
The first section of this article discusses the challenges faced in traditional classroom teaching and provides an overview of the current status of artificial intelligence technology development. The second section delves into the design of an educational quality analysis platform that integrates artificial intelligence technology. Finally, the third section highlights the advantages of the educational quality analysis platform designed in conjunction with artificial intelligence technology.

Related Works
Our work focuses on the design of the teaching platform, the security and reliability of multi-module interconnection, and the operational logic of the student behavior and action recognition module. In the following sections, we discuss related work in these fields and its relevance to the approach we propose.

Online education platform
In recent years, digital classroom education has developed continuously and has become a prominent direction in modern education. Improving the precision and logical coherence of the assessment mechanisms built into digital classroom teaching systems has become a central concern.[4] The groundwork for these efforts can be traced back to the early 20th century, notably to the pioneering work of the American educator S. I. Pressey. As early as 1925, Pressey began exploring programmed instruction and teaching machines. He created an automatic teaching apparatus capable of not only conducting assessments but also grading them instantly. This innovation laid the cornerstone for what would eventually become Computer-Based Education (CBE).[5] The initial concepts of teaching machines and programmed instruction took shape around the 1950s, growing out of Pressey's foundational ideas. Over the following decades, these concepts were refined through continued experimentation and research, and they eventually reached a broad audience.[6] As the principles and functions of teaching machines and programmed instruction were progressively validated, they gained wide acceptance and drove significant shifts in educational methodology.
The path from Pressey's early ideas to modern digital classroom education systems reflects a combination of innovative thinking, practical implementation, and a sustained commitment to improving educational outcomes. The contributions of pioneers such as S. I. Pressey continue to influence educational practice.[7]

Digital distance education platform
In 2014, Zhang Wenjun introduced the concept of digital new media, highlighting its potential to transform education by incorporating distance learning and high-quality curriculum resources, notably video content.[8] Building on this foundation, Guo Xiangyong proposed constructing a teaching resource repository comprising material databases, courseware archives, case repositories, exercise collections, a cohesive curriculum framework, and an integrated video system. A feature retrieval table was designed to unify the various digital resource databases at the retrieval level.
In the context of campus networking, Li Taifeng emphasized the importance of integrating digital resources for shared access.[9] Central to this concept was a user-friendly, copyright-conscious resource management system. The system supported the input of a range of digital resources, from text and images to courseware, videos, and e-books. It also provided a publishing function for text and image content, allowing users to share their own material.
Wang Zhihua and Yan Yazhen brought the user experience to the forefront. Their proposal stressed that user engagement and interaction within the system are indispensable.[10] By giving users ways to participate in and interact with the system, they envisaged a dynamic and inclusive learning environment.

Problem Introduction
Existing online educational platforms primarily emphasize the exchange of information, the distribution and storage of course materials, course playback, and interactive teaching between educators and students. However, they cannot quantify and analyze the learning progress of a large cohort of students, and they cannot give instructors real-time feedback on students' in-class performance. Conventional offline classroom instruction faces a similar problem: educators must invest substantial time and attention in monitoring students' learning progress, which creates a trade-off between instructional quality and the assessment of student performance. Consequently, precise evaluation of each student's learning progress becomes difficult. Our objective is to reduce the teaching burden in the classroom by using neural network models and mathematical frameworks, so that instructors can devote more attention to delivering course content.
Contemporary classroom supervision platforms are closely tied to deep learning. Our approach uses deep learning techniques to capture and analyze students' body movements and facial expressions. We then apply mathematical modeling to analyze students' performance on classroom assignments. Finally, we evaluate students' overall learning progress by combining facial expressions, movements, and assignment scores.
The deep learning-driven platform for analyzing the quality of classroom instruction proposed in this paper comprises three components: the student interface, the server module, and the instructor interface. The implementation of these three components is described below:
1) The student interface is responsible for data acquisition and upload. High-definition cameras capture classroom scenes for each individual student, while students' responses to classroom exercises are transmitted through embedded terminal devices.
2) The instructor interface lets educators either retrieve information from the student interface manually or gather data automatically at predetermined intervals, so that information retrieval can be tailored to the instructor's needs.
3) The server module analyzes students' learning data using both deep learning models and mathematical frameworks. It then returns feedback to the instructor interface.
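The division of responsibilities among the three components can be sketched as follows. This is only an illustration under stated assumptions, not the platform's actual implementation: the class names (`StudentUpload`, `ServerModule`, `InstructorClient`), the field names, and the placeholder analysis are all hypothetical and introduced here for clarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudentUpload:
    """Data sent by the student interface (hypothetical structure)."""
    student_id: str
    frame: bytes            # camera image captured for this student
    exercise_answer: str    # response entered on the terminal device


@dataclass
class ServerModule:
    """Stores uploads and returns per-student feedback (placeholder logic)."""
    uploads: Dict[str, List[StudentUpload]] = field(default_factory=dict)

    def receive(self, upload: StudentUpload) -> None:
        self.uploads.setdefault(upload.student_id, []).append(upload)

    def analyze(self, student_id: str) -> dict:
        # Placeholder: a real server would run the recognition models
        # and the scoring step here instead of counting uploads.
        history = self.uploads.get(student_id, [])
        return {"student_id": student_id, "uploads_seen": len(history)}


class InstructorClient:
    """Pulls feedback on demand; a timer could call `poll` at fixed intervals."""

    def __init__(self, server: ServerModule) -> None:
        self.server = server

    def poll(self, student_ids: List[str]) -> List[dict]:
        return [self.server.analyze(sid) for sid in student_ids]


server = ServerModule()
server.receive(StudentUpload("s01", b"\x00", "B"))
server.receive(StudentUpload("s01", b"\x01", "C"))
report = InstructorClient(server).poll(["s01", "s02"])
print(report)
```

Keeping all analysis on the server means that the student and instructor interfaces only exchange data, which matches the roles described in the list above.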
Regarding visual design, the layout of font sizes and spacing in the image capture interface and the classroom exercise upload interface of the student platform is specified in Table 1 and shown in Figure 1. Similarly, the components of the data analysis and real-time observation interface on the instructor's side are itemized in Table 2 and illustrated in Figure 2.
The Convolutional Neural Network (CNN) is a well-established architecture in the field of neural networks. Its parameter efficiency, accuracy, and real-time capability make it well suited to classification tasks. To recognize students' facial expressions and motions, we use the PP-LCNetV2 model as the backbone of our approach. PP-LCNetV2 combines a reparameterization strategy with convolutional layers that use several convolutional kernel sizes, which lets the model extract patterns from the visual data at different scales. It further incorporates optimizations such as pointwise convolutions and shortcut connections.
These optimizations are not cosmetic: they improve the model's performance, particularly during the inference phase. Together they make PP-LCNetV2 an effective tool for recognizing the subtleties of students' facial expressions and motions.
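The reparameterization strategy mentioned above can be illustrated with a small numerical check. The sketch below shows the general idea of structural reparameterization, in which parallel convolution branches with different kernel sizes are folded into a single kernel for inference. It is a simplified single-channel example without batch normalization, written for illustration only; it is not taken from the PP-LCNetV2 code, and the helper names `conv2d_same` and `merge_branches` are hypothetical.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Single-channel 2-D cross-correlation with zero padding ('same' size)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def merge_branches(k3: np.ndarray, k1: np.ndarray) -> np.ndarray:
    """Fold a 1x1 branch into a 3x3 kernel by zero-padding it and adding."""
    return k3 + np.pad(k1, ((1, 1), (1, 1)))


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))
k3 = rng.normal(size=(3, 3))
k1 = rng.normal(size=(1, 1))

two_branch = conv2d_same(x, k3) + conv2d_same(x, k1)   # training-time form
one_branch = conv2d_same(x, merge_branches(k3, k1))    # inference-time form
print(np.allclose(two_branch, one_branch))
```

Because convolution is linear, the two forms give the same output, so the merged kernel needs only one convolution at inference time. This is one reason such a strategy can help inference speed.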
The structure of this backbone network is shown in Figure 3, which depicts how reparameterization, convolutional layers, and the optimization techniques are combined within the PP-LCNetV2 model. Considering the characteristics of real classroom teaching environments, we believe that the SCBD dataset covers a representative range of the student behaviors exhibited in the classroom.
The evaluation of students' classroom learning progress also takes into account their scores on classroom exercises. To support this assessment, we present classroom exercise questions to students through terminal devices. As students work on these exercises, the student interface captures their facial expressions and behaviors. The model described above is then used to analyze the facial expressions and behaviors exhibited by each student.
The outcomes of this process are termed "behavior supervision results," while the solutions to the classroom exercise questions are termed "quality supervision results." Both are recorded by the server module for every individual student. These records are then processed to produce what we define as "classroom checkpoint scores." Each student has multiple classroom checkpoint scores, which together describe their ongoing classroom learning progress. At the end of the class, these scores are integrated into the overall picture of students' classroom learning progress, which is accessible through the instructor interface.
After analyzing students' facial expressions and behaviors, we denote the facial expression information as Xa and the behavior information as Xm. The combination of these two sets of information is denoted Hm, whose formulation is given in Equation 1.
According to the research conducted by Murtala, students' classroom exercise scores correlate with their exhibited behaviors and facial expressions.[11][12] We therefore define the relationship between students' facial expressions, behaviors, and classroom exercise scores as Lm, which is given in Equation 2.
Each student's score at each classroom checkpoint is denoted S and is given in Equation 3. A higher S value indicates a stronger correlation between the student's exercise score and their classroom behavior. Conversely, if a student obtains a high score on classroom exercises but a low S value, the student may have used improper means to arrive at the correct answers. This quantification establishes the relationship between students' classroom performance and their classroom exercise scores. In a conventional classroom, discerning such relationships typically requires educators to accumulate years of experience.
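Equations 1 to 3 are not reproduced in this text, so the sketch below does not implement them. It only illustrates the kind of consistency check the paragraph describes, using stand-in formulas that are assumptions made here: Xa and Xm are taken to be scalars in [0, 1], Hm is taken to be their weighted mean, and S is taken to be one minus the gap between Hm and the normalized exercise score `q`. The function names and thresholds are likewise hypothetical.

```python
def fuse(x_a: float, x_m: float, w: float = 0.5) -> float:
    """Assumed stand-in for Hm: weighted mean of expression and behavior."""
    return w * x_a + (1.0 - w) * x_m


def checkpoint_score(x_a: float, x_m: float, q: float) -> float:
    """Assumed stand-in for S: 1 minus the gap between Hm and exercise score q."""
    return 1.0 - abs(fuse(x_a, x_m) - q)


def needs_review(s: float, q: float, s_low: float = 0.5, q_high: float = 0.8) -> bool:
    """Flag a high exercise score that the observed behavior does not support."""
    return q >= q_high and s < s_low


# An attentive student with a matching exercise score.
s_consistent = checkpoint_score(x_a=0.9, x_m=0.8, q=0.9)
# A disengaged student who nevertheless scores very high on the exercise.
s_suspicious = checkpoint_score(x_a=0.2, x_m=0.1, q=0.95)

print(round(s_consistent, 2), needs_review(s_consistent, 0.9))
print(round(s_suspicious, 2), needs_review(s_suspicious, 0.95))
```

Under these assumptions, the second student is flagged because the high exercise score is not supported by the observed behavior, which mirrors the interpretation of a low S value given above.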
In a classroom with multiple students, we consolidate the scores of all students at a specific checkpoint and denote this collective score as St, which is expressed in Equation 4. From Equation 4, it can be observed that the aggregate score St is always less than n, and that the distribution of checkpoint scores for most students is not ideal. Therefore, we use Equation 5 to reprocess the scores and make them easier to interpret.
We denote the scores of all classroom checkpoints for each student in a class as Stt.Using Equation 5, we map Stt into more comprehensible indicators that evaluate students' classroom learning progress.This approach enables effective detection of students' classroom learning quality, provides real-time feedback to teachers about students' learning status and efficiency, and consequently enhances the overall quality of classroom teaching.
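The aggregation and remapping steps can be sketched as follows. Equations 4 and 5 are not reproduced in this text, so both functions are assumptions: `aggregate` treats St as the sum of one checkpoint's scores over all students, and `to_percent` is a hypothetical stand-in for Equation 5 that maps a student's checkpoint scores to a 0 to 100 scale. The student identifiers and score values are invented for illustration.

```python
from typing import Dict, List


def aggregate(scores: List[float]) -> float:
    """Assumed form of St: the sum of one checkpoint's scores over all students."""
    return sum(scores)


def to_percent(checkpoint_scores: List[float]) -> float:
    """Hypothetical stand-in for Equation 5: mean checkpoint score on a 0-100 scale."""
    if not checkpoint_scores:
        return 0.0
    return 100.0 * sum(checkpoint_scores) / len(checkpoint_scores)


# Invented checkpoint scores for two students over three checkpoints.
class_scores: Dict[str, List[float]] = {
    "s01": [0.9, 0.8, 1.0],
    "s02": [0.4, 0.5, 0.3],
}
first_checkpoint_total = aggregate([v[0] for v in class_scores.values()])
progress = {sid: round(to_percent(v), 1) for sid, v in class_scores.items()}
print(first_checkpoint_total, progress)
```

A per-student indicator on a fixed scale is easier for an instructor to read at a glance than the raw checkpoint values, which is the motivation the text gives for the remapping step.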

Experiments
In order to apply deep learning methods to the teaching supervision platform, we have designed and experimented with each computational node within the platform. The integration of deep learning and teaching supervision enables more efficient and accurate recognition and analysis of student classroom states. With the assistance of this system, teachers can easily and quickly obtain important information about each student's listening status, learning performance, and other relevant aspects. This gives the platform significant practical value, especially within the context of classrooms and online educational systems. Additionally, the system can help enhance the classroom teaching and management capabilities of young teachers.
During the process of designing the system architecture, considering practical business requirements and drawing from past project development experiences, a standardized layered architecture pattern was chosen based on principles of flexibility and security. While this framework allows for clear implementation of deep learning models, it requires post-processing and integration of model results during execution.
The teaching supervision platform designed in this paper is primarily composed of two parts on the server side. The first part is information collection, mainly achieved through deep learning to recognize student behaviors and facial expressions. The second part is information processing, which utilizes statistical algorithms to analyze and process the information collected in the first part, along with the scores from classroom assignments.
To validate the effectiveness of our proposed methods, we conducted a standard experiment. The experiment was divided into two parts: the first part focused on facial expression classification models, while the second part concentrated on behavior classification models.
In the first part, we evaluated the student facial expression recognition model using the ILSVRC2012 dataset. This dataset comprises 1000 categories of images, each containing roughly 1000 images. We divided the dataset into two parts, with 85% of the images allocated to Dtrain and 15% to Dval, and used 1000 distinct categories in the evaluation.
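An 85%/15% split of this kind can be sketched as below. The text does not say whether the split was random or performed per category, so the shuffled split, the fixed seed, and the file names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], train_frac: float = 0.85,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of `items` and split it into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


image_paths = [f"img_{i:04d}.jpg" for i in range(1000)]  # hypothetical file names
d_train, d_val = split_dataset(image_paths)
print(len(d_train), len(d_val))
```

Fixing the seed keeps the split reproducible, so that different backbone networks are compared on the same Dtrain and Dval.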
To demonstrate the reliability of the backbone network we selected, we conducted experiments using four different backbone network architectures on the ILSVRC2012 dataset. The comparison results for Top1 performance are given in Table 4, and the Top5 results are given in Table 5. The data in both tables show a clear pattern: the PP-LCNetV2 backbone network achieves a Top1 accuracy of 77.04% and a Top5 accuracy of 93.27% on the ILSVRC2012 dataset, the highest inference accuracy among the compared models.
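For reference, Top1 and Top5 accuracy are both instances of top-k accuracy: a prediction counts as correct when the true label is among the k classes with the highest scores. A minimal sketch follows, using a small invented score matrix with three classes (so k is 1 and 2 rather than 1 and 5); the function name is hypothetical.

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Invented class scores for four samples and three classes.
scores = np.array([
    [0.10, 0.60, 0.30],
    [0.50, 0.20, 0.30],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])
labels = np.array([1, 2, 2, 1])
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

Top-k accuracy can never decrease as k grows, which is why the Top5 figures reported here are higher than the Top1 figures.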
These metrics reflect the careful engineering behind the PP-LCNetV2 model. Its architecture allows it to produce classifications that agree closely with the ground truth labels, and this advantage holds for both Top1 and Top5 accuracy.
The accuracy gap between the PP-LCNetV2 model and the other backbone networks indicates that the model handles the complexity of the ILSVRC2012 dataset effectively and can classify a diverse range of visual data.
In practical application scenarios, the model's inference speed is also a crucial metric. We measured the inference speed of the four backbone network architectures on the ILSVRC2012 dataset. Our experimental setup used an Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz computer platform with 64GB of memory, and inference was performed using the OpenVINO platform.
The prediction speeds for the different models are provided in Table 6.
Table 6 shows that the PP-LCNetV2 backbone network is the fastest of the compared models, with an inference time of 4.32 milliseconds per image on the ILSVRC2012 dataset, and that it outpaces the other backbone networks by a considerable margin.
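A per-image latency figure of this kind is typically obtained by timing repeated inference calls and averaging. The sketch below shows one way to do this; it is not the measurement procedure used in the experiments, and `dummy_model` is a placeholder standing in for a real inference call (for example, one made through an inference runtime such as OpenVINO).

```python
import time
from typing import Callable, Iterable


def mean_latency_ms(infer: Callable[[object], object],
                    inputs: Iterable[object], warmup: int = 3) -> float:
    """Average wall-clock time per call to `infer`, in milliseconds."""
    items = list(inputs)
    for x in items[:warmup]:       # warm-up calls are not timed
        infer(x)
    start = time.perf_counter()
    for x in items:
        infer(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(items)


def dummy_model(x: object) -> int:
    """Placeholder for a real inference call; it only burns a little CPU time."""
    return sum(range(10_000))


latency = mean_latency_ms(dummy_model, range(50))
print(f"{latency:.3f} ms per input")
```

Warm-up calls are excluded because the first few invocations of a model often include one-time costs such as memory allocation, which would otherwise inflate the average.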
This inference speed matters in practice. It meets the demand for real-time responsiveness in practical application scenarios, where timely insights and decisions are essential.
The combination of high inference speed and high accuracy is the main strength of the PP-LCNetV2 backbone network, and it makes the model well suited to fields that require fast and accurate analysis.
To demonstrate the reliability of the selected backbone network for behavior classification, we conducted experiments using four different backbone network architectures on the SCBD dataset. The comparison results for Top1 performance are given in Table 7, and the Top5 results are given in Table 8. The data in both tables show a clear trend: the PP-LCNetV2 backbone network attains a Top1 accuracy of 92.06% and a Top5 accuracy of 98.79% on the SCBD dataset, the highest inference accuracy among the compared models.
We hypothesize that the characteristics of the SCBD dataset contribute significantly to this outcome. The dataset has a relatively small number of distinct categories but a large number of images per category, a distribution that matches the practical application scenarios the model is designed for. This match between dataset attributes and real-world application conditions is likely the main reason for the model's strong performance. It is plausible that the architecture benefits from the larger number of images per category, resulting in higher accuracy for both Top1 and Top5 classification.
In conclusion, the classification models built on the PP-LCNetV2 backbone network perform well. Their accuracy in both the general (ILSVRC2012) and scenario-specific (SCBD) evaluations supports their suitability for the classroom supervision task, and the results indicate that the model captures the distinctions between the categories in each dataset.

Conclusion
This paper presents a teaching supervision platform that integrates deep learning with network communication technology. The platform consists of a student interface, a server module, and an instructor interface. It recognizes students' facial expressions and classroom behaviors with classification models built on the PP-LCNetV2 backbone network and combines these results with classroom exercise scores to produce classroom checkpoint scores for each student.
In our experiments, the PP-LCNetV2 backbone network reached a Top1 accuracy of 77.04% and a Top5 accuracy of 93.27% on the ILSVRC2012 dataset, with an inference time of 4.32 milliseconds per image, and a Top1 accuracy of 92.06% and a Top5 accuracy of 98.79% on the SCBD dataset. These results suggest that the platform can give teachers timely feedback on students' classroom learning and reduce the burden of classroom supervision.

Figure 1: Layout of font sizes and spacing in both the image capture interface and the classroom exercise upload interface.

Figure 2: The interface for data analysis and real-time observation.

Figure 3: The schematic outline of this foundational network.

Building on PP-LCNetV2, we instantiated models for both facial expression classification and classroom pose classification. To recognize students' behaviors in the classroom, we compiled a dataset of student classroom behaviors, named SCBD. The dataset contains a total of 29,732 images covering 5 distinct categories of classroom behaviors. The distribution of images across the training and testing subsets of this dataset is summarized in Table 3 and Figure 4.

Figure 4: The training and testing subsets of this dataset.

Table 1: Common settings of font size and spacing

Table 2: Data analysis and real-time observation

Table 3: Training and testing subsets of dataset

Table 4: Comparison results for top1 performance

Table 5: Comparison results for top5 performance

Table 6: Prediction speeds for different models

Table 7: Comparison results for top1 performance

Table 8: Comparison results for top5 performance