The First Azeri (Azerbaijani) Language Next Word Predictor
DOI: 10.23977/isspj.2020.51001
Ali Pourmohammad 1, Mensur Gulami 2, Javid Mahmudov 2, Yusif Aliyev 2, Rovshan Akberov 2
1 Process Automation Engineering Department, Baku Higher Oil School, Baku, Azerbaijan
2 Department of Computer Science, Khazar University, Baku, Azerbaijan
Corresponding Author: Ali Pourmohammad
Azeri (Azerbaijani) is one of more than 50 Turkic languages and has been little studied with modern signal processing algorithms. This paper tackles Hidden Markov Model (HMM) based next word prediction for this language, following Natural Language Processing (NLP) principles and implemented in the Python high-level programming language. The software comprises a small Azeri vocabulary database, various Python libraries, an HMM model and a Web-based interface. In this research, the database was constructed by a predictor parser, implemented here for the first time for the Azeri language. The database was populated with the most common Azeri words in order to generate HMM-based word pairs. The model was trained on 90% of the database; predicting the next 5 words on the test data yielded 54% accuracy.
KEYWORDS: Azeri (Azerbaijani) Language; Next Word Predictor; Hidden Markov Model (HMM); Natural Language Processing (NLP)
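The word-pair idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain first-order Markov (bigram) model over observed word pairs rather than a full HMM with hidden states, and the function names (`train_bigrams`, `predict_next`, `top_k_accuracy`) and the ten-sentence toy corpus are hypothetical stand-ins for the paper's parser and database. It mirrors only the 90% training split and the "next 5 words" evaluation described above.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count word pairs (current -> following) over tokenized sentences."""
    pairs = defaultdict(Counter)
    for words in sentences:
        for current, following in zip(words, words[1:]):
            pairs[current][following] += 1
    return pairs


def predict_next(pairs, word, k=5):
    """Return up to k most frequent successors of `word` (empty if unseen)."""
    return [w for w, _ in pairs.get(word, Counter()).most_common(k)]


def top_k_accuracy(pairs, sentences, k=5):
    """Fraction of test word pairs whose true next word is in the top k."""
    hits = total = 0
    for words in sentences:
        for current, following in zip(words, words[1:]):
            total += 1
            hits += following in predict_next(pairs, current, k)
    return hits / total if total else 0.0


# Toy corpus: hypothetical sentences, NOT the paper's vocabulary database.
corpus = [
    "mən məktəbə gedirəm".split(),
    "sən kitab oxuyursan".split(),
    "o məktəbə gedir".split(),
    "mən çay içirəm".split(),
    "sən çay içirsən".split(),
    "o kitab oxuyur".split(),
    "biz məktəbə gedirik".split(),
    "mən kitab alıram".split(),
    "mən kitab oxuyuram".split(),
    "mən çay alıram".split(),
]

# 90% of the sentences for training, the remainder for testing.
split = int(len(corpus) * 0.9)
train, test = corpus[:split], corpus[split:]

model = train_bigrams(train)
print(predict_next(model, "mən"))
print(top_k_accuracy(model, test, k=5))
```

Counting pairs keeps the example short and inspectable. A model closer to the one the paper names would add hidden states with separate transition and emission probabilities, which this sketch deliberately leaves out.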
CITE THIS PAPER
Ali Pourmohammad, Mensur Gulami, Javid Mahmudov, Yusif Aliyev, Rovshan Akberov, The First Azeri (Azerbaijani) Language Next Word Predictor. Information Systems and Signal Processing Journal (2020) 5: 1-4. DOI: http://dx.doi.org/10.23977/isspj.2020.51001.