Tibetan Lhasa Dialect Speech Synthesis Method Based on End-to-End Model
DOI: 10.23977/jaip.2023.060109
Author(s)
Zhihao Song 1, Guanyu Li 1, Guangming Li 1
Affiliation(s)
1 Northwest Minzu University, Lanzhou, Gansu, 730000, China
Corresponding Author
Guanyu Li
ABSTRACT
End-to-end speech synthesis is now the most popular technique for Tibetan speech synthesis. This paper studies speech synthesis for the Lhasa dialect of Tibetan using two end-to-end frameworks, Tacotron2 and VITS. Because the phoneme dictionary is inaccurate and its coverage is incomplete, Tibetan characters are used directly as the input sequence instead of phonemes. Different input sequence representations are compared in the Tacotron2 model, and the experimental data indicate that Tibetan character input performs well for Lhasa speech synthesis in this model. Finally, the character-based input method is applied in the VITS framework, yielding excellent synthesis results. Because Tibetan characters have great application value in Tibetan speech synthesis, their use as text input in the fully end-to-end VITS framework merits further study and promotion.
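The character-level input described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' front end, which this page does not describe; the function names `build_vocab` and `text_to_ids`, the `<pad>` and `<unk>` symbols, and the choice to treat each Unicode code point (including the tsheg and shad marks) as one input symbol are all assumptions made for illustration.

```python
# Hypothetical character-level front end: each Unicode code point of the
# Tibetan text becomes one input symbol, so no phoneme dictionary is needed.
PAD = "<pad>"  # assumed padding symbol
UNK = "<unk>"  # assumed symbol for characters not seen in the corpus


def build_vocab(texts):
    """Map every distinct code point found in the corpus to an integer ID."""
    symbols = sorted({ch for text in texts for ch in text})
    return {sym: idx for idx, sym in enumerate([PAD, UNK] + symbols)}


def text_to_ids(text, vocab):
    """Convert a string to the ID sequence a TTS text encoder would consume."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


# Sample phrase "bod skad" followed by a shad, written with escapes so the
# code points are explicit: U+0F56 U+0F7C U+0F51 U+0F0B U+0F66 U+0F90 U+0F51 U+0F0D
sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0d"
vocab = build_vocab([sample])
ids = text_to_ids(sample, vocab)
print(len(vocab), ids)
```

Under these assumptions the vocabulary is built from the training transcripts alone, so coverage depends only on which characters appear in the corpus rather than on a pronunciation lexicon.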
KEYWORDS
Tibetan Lhasa, end-to-end, speech synthesis
CITE THIS PAPER
Zhihao Song, Guanyu Li, Guangming Li, Tibetan Lhasa Dialect Speech Synthesis Method Based on End-to-End Model. Journal of Artificial Intelligence Practice (2023) Vol. 6: 59-65. DOI: http://dx.doi.org/10.23977/jaip.2023.060109.