Applying Transfer Learning for Syllable-Based Speech Recognition in Tibetan Language
DOI: 10.23977/fcvpr.2023.010101
Author(s)
Senyan Li 1, Guanyu Li 1, Sirui Li 1
Affiliation(s)
1 Northwest Minzu University, Lanzhou, Gansu, 730000, China
Corresponding Author
Guanyu Li
ABSTRACT
This article explores Tibetan speech recognition and reviews its development history. In recent years, end-to-end methods have been applied to Tibetan speech recognition, but because training data is scarce, their performance remains unsatisfactory. This article therefore introduces a transfer learning method: a model is first pre-trained on Mandarin, a language in the same family as Tibetan, and the pre-trained model is then used to initialize the Tibetan speech recognition model. On the xbmu-amdo31 Tibetan public dataset, our method achieved an 11.8% relative reduction in phoneme error rate compared to the baseline system. The method improves speech recognition for a low-resource language and has the potential to be extended to other languages in the same family. Overall, the results highlight the value of transfer learning for improving speech recognition systems in low-resource languages.
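The initialization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which layers are transferred, so the sketch assumes that every parameter whose name and shape match is copied from the Mandarin model, and that the output layer, whose size depends on each language's unit inventory, keeps its random initialization. The function `init_from_pretrained`, the parameter names, and all numbers are hypothetical; flat Python lists stand in for weight tensors.

```python
import random


def init_from_pretrained(target, source):
    """Copy each source parameter whose name and shape match the target.

    Returns the names of the parameters that were copied. Parameters that
    are missing from the target or differ in size are left untouched.
    """
    copied = []
    for name, values in source.items():
        if name in target and len(target[name]) == len(values):
            target[name] = list(values)  # copy, so the models do not share storage
            copied.append(name)
    return copied


random.seed(0)

# Hypothetical pre-trained Mandarin model (assumption: illustrative values only).
mandarin = {
    "encoder.layer0": [0.1, 0.2, 0.3, 0.4],
    "encoder.layer1": [0.5, 0.6, 0.7, 0.8],
    "output": [0.0] * 6,  # sized for a hypothetical Mandarin unit inventory
}

# Hypothetical Tibetan model, randomly initialized before transfer.
tibetan = {
    "encoder.layer0": [random.gauss(0, 1) for _ in range(4)],
    "encoder.layer1": [random.gauss(0, 1) for _ in range(4)],
    "output": [random.gauss(0, 1) for _ in range(3)],  # different inventory size
}

output_before = list(tibetan["output"])
copied = init_from_pretrained(tibetan, mandarin)
print("copied:", copied)
print("output layer unchanged:", tibetan["output"] == output_before)
```

After this step the Tibetan model would be fine-tuned on Tibetan data as usual; only the starting point of training differs from the baseline.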
KEYWORDS
Speech recognition, Tibetan language, low-resource, transfer learning, Amdo dialect
CITE THIS PAPER
Senyan Li, Guanyu Li, Sirui Li, Applying Transfer Learning for Syllable-Based Speech Recognition in Tibetan Language. Frontiers in Computer Vision and Pattern Recognition (2023) Vol. 1: 1-8. DOI: http://dx.doi.org/10.23977/fcvpr.2023.010101.