Open Access

SVM-Based Prediction of Protein Methylation Sites: A Comprehensive Analysis of 553 Properties from the AAindex Database


DOI: 10.23977/medbm.2024.020218

Author(s)

Xi Su¹, Mingjun Tang¹, Zipin Zhao¹, Ning Zhang¹

Affiliation(s)

¹ Tianjin Key Laboratory of Brain Science and Neuroengineering, Department of Biomedical Engineering, Medical School of Tianjin University, Tianjin, 300072, China

Corresponding Author

Ning Zhang

ABSTRACT

Identifying protein methylation sites experimentally is challenging and costly, so computational predictors based on machine learning are increasingly used to improve efficiency. This study aims to improve such predictors through a comprehensive analysis of 553 physicochemical properties from the AAindex database. We trained support vector machine (SVM) models and evaluated them with 10-fold cross-validation to identify the feature combinations that best predict lysine and arginine methylation sites. For lysine methylation, the feature set "RACS820104+FUKS010109" yielded the highest performance, with a Recall (Re) of 71.11%, Precision (Pre) of 75.68%, Accuracy (Acc) of 74.12%, and a Matthews Correlation Coefficient (MCC) of 0.48. For arginine methylation, the feature set "BAEK050101+CHAM810101" achieved an Re of 74.60%, Pre of 81.08%, Acc of 78.60%, and MCC of 0.57. This study also explores hydrophobicity as a potentially useful property for distinguishing methylation sites from malonylation sites. This systematic analysis of the available physicochemical properties could guide the development of more accurate and reliable prediction models.
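The four metrics reported above follow the standard confusion-matrix definitions. The minimal Python sketch below shows how they are computed; the function name `classification_metrics` is hypothetical and the counts are made up for illustration, not taken from the study. MCC ranges from -1 (total disagreement) to 1 (perfect prediction), with 0 corresponding to chance-level prediction.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Recall, precision, accuracy and Matthews correlation from confusion counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0          # Re = TP / (TP + FN)
    precision = tp / (tp + fp) if tp + fp else 0.0       # Pre = TP / (TP + FP)
    accuracy = (tp + tn) / (tp + fp + tn + fn)           # Acc = (TP + TN) / all samples
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC lies in [-1, 1]
    return {"Re": recall, "Pre": precision, "Acc": accuracy, "MCC": mcc}


# Hypothetical counts for illustration only (not the paper's data).
m = classification_metrics(tp=70, fp=20, tn=80, fn=30)
print({name: round(value, 3) for name, value in m.items()})
# → {'Re': 0.7, 'Pre': 0.778, 'Acc': 0.75, 'MCC': 0.503}
```

Returning 0.0 when a denominator is zero is an assumed convention for degenerate cases, such as a model that never predicts the positive class.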

KEYWORDS

Protein Methylation; Support Vector Machine (SVM); AAindex Database; Feature Selection
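The keywords summarize the pipeline described in the abstract. Each candidate site is encoded with AAindex property values, an SVM is evaluated by 10-fold cross-validation, and feature combinations are compared. This page does not state the window size, kernel, solver, or search strategy, so the following is a hedged, dependency-free sketch built on explicit assumptions:

- a linear SVM trained with the Pegasos sub-gradient method, a swapped-in solver that is not necessarily the authors';
- a hypothetical half-window of 3 residues with zero padding;
- stratified folds with predictions pooled across folds;
- an exhaustive search over pairs of indices, since the reported feature sets are pairs.

Only KYTJ820101 (Kyte-Doolittle hydropathy) is a real AAindex entry here. `TOY_CHARGE`, `TOY_AROMATIC`, the synthetic windows, and all function names are hypothetical.

```python
import itertools
import random

# Kyte-Doolittle hydropathy scale (AAindex accession KYTJ820101). Used only as an
# illustration; it is not one of the feature sets reported in the abstract.
KYTJ820101 = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(seq, center, half_window, indices):
    """Concatenate each index's value per window residue; gaps/unknowns -> 0.0 (assumed)."""
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        residue = seq[pos] if 0 <= pos < len(seq) else None
        features.extend(index.get(residue, 0.0) for index in indices)
    return features


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos solver for a linear SVM (hinge loss + L2); a constant 1.0 feature acts as bias."""
    rng = random.Random(seed)
    data = [x + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss active: move w toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def cross_validated_mcc(X, y, k=10, seed=0):
    """Stratified k-fold CV; predictions from all folds are pooled into one MCC."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (1, -1):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)  # round-robin keeps class ratios per fold
    tp = fp = tn = fn = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        for i in test:
            score = sum(wj * xj for wj, xj in zip(w, X[i] + [1.0]))
            pred = 1 if score >= 0 else -1
            if y[i] == 1:
                tp, fn = tp + (pred == 1), fn + (pred == -1)
            else:
                tn, fp = tn + (pred == -1), fp + (pred == 1)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_index_pair(windows, y, tables, half_window, k=10):
    """Exhaustively score every pair of index tables and keep the highest MCC."""
    best_pair, best_mcc = None, float("-inf")
    for a, b in itertools.combinations(tables, 2):
        X = [encode_window(s, c, half_window, [tables[a], tables[b]]) for s, c in windows]
        mcc = cross_validated_mcc(X, y, k)
        if mcc > best_mcc:  # strict ">" keeps the first pair on ties
            best_pair, best_mcc = (a, b), mcc
    return best_pair, best_mcc


# Synthetic demo data: 7-residue windows centred on K; hydrophobic flanks are
# labelled positive and charged/polar flanks negative (purely hypothetical).
rng = random.Random(1)
windows, labels = [], []
for label, pool in ((1, "AILMFV"), (-1, "DEKRNQ")):
    for _ in range(20):
        seq = "".join(rng.choices(pool, k=3)) + "K" + "".join(rng.choices(pool, k=3))
        windows.append((seq, 3))
        labels.append(label)

tables = {
    "KYTJ820101": KYTJ820101,
    "TOY_CHARGE": {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0},  # hypothetical toy index
    "TOY_AROMATIC": {"F": 1.0, "W": 1.0, "Y": 1.0},            # hypothetical toy index
}
pair, mcc = best_index_pair(windows, labels, tables, half_window=3)
print(pair, round(mcc, 3))  # → ('KYTJ820101', 'TOY_CHARGE') 1.0
```

On this synthetic data, both pairs that contain KYTJ820101 separate the classes perfectly. Because ties keep the first pair found, the sketch reports `('KYTJ820101', 'TOY_CHARGE')`. A real run would load all 553 AAindex entries and experimentally annotated sites, and would likely standardize features and tune the SVM kernel and regularization.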

CITE THIS PAPER

Xi Su, Mingjun Tang, Zipin Zhao, Ning Zhang, SVM-Based Prediction of Protein Methylation Sites: A Comprehensive Analysis of 553 Properties from the AAindex Database. MEDS Basic Medicine (2024) Vol. 2: 126-135. DOI: http://dx.doi.org/10.23977/medbm.2024.020218.




All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.