Open Access

Problems in the Optimization Work of Speech-Text Auto-Recognition and Relevant Possible Solutions

DOI: 10.23977/jaip.2024.070310

Author(s)

Xiwen Qin 1

Affiliation(s)

1 University of Shanghai for Science and Technology, Shanghai, 200093, China

Corresponding Author

Xiwen Qin

ABSTRACT

This paper examines problems that arise in the annotation of speech audio and proposes possible solutions. Part One introduces the industry and Part Two sets out the research methods. Part Three analyzes real-world investigation data to identify the issues encountered in automatic speech recognition (ASR) optimization work and their impact on ASR effectiveness. Part Four examines solutions to these problems, one of which, the Cohen's Kappa metric, was successfully applied in an experiment. Part Five studies the practical application of the methods: it first explores the generation and optimization problems from a psycholinguistic perspective, then identifies methods and plans that could improve the accuracy and efficiency of the annotation process. By analyzing current challenges and exploring potential advancements, the paper aims to give readers a thorough understanding of the current state of audio annotation and its future direction. It offers insights that can support more robust and efficient audio annotation practices and, ultimately, improved ASR performance.
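As an illustration of the Cohen's Kappa metric mentioned above, the following minimal sketch computes inter-annotator agreement for two annotators who labeled the same audio segments. The function name `cohens_kappa`, the label values, and the sample data are hypothetical and are not taken from the paper's experiment; the sketch only shows the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")

    n = len(labels_a)

    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, multiply the two annotators'
    # marginal frequencies, then sum over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators always used the same single label.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Made-up labels for six audio segments (assumed data, for illustration only).
annotator_1 = ["correct", "correct", "error", "error", "correct", "error"]
annotator_2 = ["correct", "error", "error", "error", "correct", "correct"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A value near 1 indicates strong agreement between annotators, a value near 0 indicates agreement no better than chance, and negative values indicate agreement below chance.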

KEYWORDS

ASR; audio annotation; speech-to-text; psycholinguistics; Artificial Neural Network

CITE THIS PAPER

Xiwen Qin, Problems in the Optimization Work of Speech-Text Auto-Recognition and Relevant Possible Solutions. Journal of Artificial Intelligence Practice (2024) Vol. 7: 83-94. DOI: http://dx.doi.org/10.23977/jaip.2024.070310.

REFERENCES

[1] Text annotation. Papers with Code. (n.d.). https://paperswithcode.com/task/text-annotation 
[2] What is audio annotation, what are the applications and benefits. clickworker.com. (2023, January 16). https://www.clickworker.com/ai-glossary/audio-annotation/
[3] Four key metrics for ensuring data annotation accuracy. TELUS International. (n.d.). https://www.telusinternational.com/insights/ai-data/article/data-annotation-metrics
[4] Surge AI. (2021, December 15). Inter-annotator agreement: An introduction to Cohen's kappa statistic. Medium. https://surge-ai.medium.com/inter-annotator-agreement-an-introduction-to-cohens-kappa-statistic-dcc15ffa5ac4
[5] Ahlsén, E. (2006). Introduction to neurolinguistics. John Benjamins.
[6] Sussex Publishers. (n.d.). How the brain's mirror neurons affect empathy. Psychology Today. https://www.psychologytoday.com/intl/blog/emotional-freedom/202206/how-the-brains-mirror-neurons-affect-empathy
[7] Trentin, E., et al. (2001). A survey of hybrid ANN/HMM models for automatic speech recognition. Neurocomputing. https://www.sciencedirect.com/science/article/abs/pii/S0925231200003088
[8] Zacarias-Morales, N., Pancardo, P., Hernández-Nolasco, J. A., & Garcia-Constantino, M. (2021, January 28). Attention-inspired artificial neural networks for Speech Processing: A Systematic Review. MDPI. https://www.mdpi.com/2073-8994/13/2/214


All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.